Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?

Rainer Winnenburg; Thomas Wächter; Conrad Plake; Andreas Doms; Michael Schroeder

doi:10.1093/bib/bbn043

Research output: Contribution to journal › Review article › Contributed › peer-review

Contributors

Rainer Winnenburg, Biotechnology Center (BIOTEC) (Author)
Thomas Wächter (Author)
Conrad Plake (Author)
Andreas Doms (Author)
Michael Schroeder, Chair of Bioinformatics (Author)

Abstract

The biomedical literature can be seen as a large integrated, but unstructured data repository. Extracting facts from literature and making them accessible is approached from two directions: manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using information retrieval and natural language processing techniques. Manual curation is highly accurate but time-consuming, and does not scale with the ever-increasing growth of literature. Text mining as a high-throughput computational technique scales well, but is error-prone due to the complexity of natural language. How can both be married to combine scalability and accuracy? Here, we review the state-of-the-art text mining approaches that are relevant to annotation and discuss available online services analysing biomedical literature by means of text mining techniques, which could also be utilised by annotation projects. We then examine how far text mining has already been utilised in existing annotation projects and conclude how these techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale-up high-quality manual curation.

Details

Original language	English
Pages (from-to)	466-78
Number of pages	13
Journal	Briefings in bioinformatics
Volume	9
Issue number	6
Publication status	Published - Nov 2008
Peer-reviewed	Yes

External IDs

Scopus	58149375832
ORCID	/0000-0003-2848-6949/work/141543396

Keywords

Abstracting and Indexing; Animals; Computational Biology/methods; Databases, Bibliographic; Databases, Genetic; Genes; Humans; Information Storage and Retrieval/methods; Knowledge; Semantics

Library keywords

570 Biology

Research Portal of the TU Dresden
