Word Sense Disambiguation in biomedical ontologies with term co-occurrence analysis and document clustering

Bill Andreopoulos; Dimitra Alexopoulou; Michael Schroeder

doi:10.1504/ijdmb.2008.020522

Research output: Contribution to journal, research article (contributed, peer-reviewed)

Contributors

Bill Andreopoulos, Biotechnology Center (BIOTEC) (Author)
Dimitra Alexopoulou, Chair of Molecular Developmental Genetics (Author)
Michael Schroeder, Chair of Bioinformatics (Author)

Abstract

With more and more genomes being sequenced, considerable effort is devoted to annotating them with terms from controlled vocabularies such as the Gene Ontology. Manual annotation based on the relevant literature is tedious, but automating this process is difficult. One particularly challenging problem is word sense disambiguation: a term such as 'development' can refer either to developmental biology or to its more general sense. Here, we present two approaches to this problem, based on term co-occurrences and on document clustering. To evaluate our methods, we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an F-measure of 77%, and additionally applying document clustering improves precision to 82%. We applied the same approach to disambiguate 'nucleus', 'transport', and 'spindle' and achieved consistent results. Thus, our method is a viable approach towards automating literature-based genome annotation.
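The general idea of co-occurrence-based disambiguation, and of the F-measure used to score it, can be illustrated with a minimal sketch. It is not the authors' implementation: the indicator term sets BIO_TERMS and GENERAL_TERMS, the 10-token window, and the function names are hypothetical choices made for illustration, and the document clustering step is not shown. Each occurrence of the ambiguous term counts nearby indicator terms, the document is assigned the sense with more supporting evidence, and precision, recall, and F-measure (the harmonic mean of precision and recall) are computed against hand-labelled documents.

```python
import re
from collections import Counter

# Hypothetical indicator terms for the two senses of 'development'.
# These are illustrative assumptions, not the term lists used in the paper.
BIO_TERMS = {"embryo", "embryonic", "morphogenesis", "gastrulation",
             "differentiation", "zebrafish", "drosophila"}
GENERAL_TERMS = {"software", "tool", "method", "product", "economic", "approach"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_counts(text, target, window=10):
    """Count tokens appearing within `window` tokens of each occurrence of `target`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


def disambiguate(text, target="development", window=10):
    """Assign the sense whose indicator terms co-occur more often (ties go to 'general')."""
    counts = cooccurrence_counts(text, target, window)
    bio = sum(counts[t] for t in BIO_TERMS)
    general = sum(counts[t] for t in GENERAL_TERMS)
    return "developmental biology" if bio > general else "general"


def precision_recall_f(predicted, gold, positive):
    """Precision, recall and F-measure for the `positive` label."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == positive and g == positive)
    fp = sum(1 for p, g in pairs if p == positive and g != positive)
    fn = sum(1 for p, g in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    docs = [
        "Gene expression during embryonic development of the zebrafish embryo.",
        "The development of a new software tool for sequence analysis.",
    ]
    for doc in docs:
        print(disambiguate(doc), "<-", doc)
```

With these toy inputs, the first sentence is labelled "developmental biology" (three biological indicator terms near 'development') and the second "general" (two general indicator terms). A fixed window and hand-picked seed terms keep the sketch short; a real system would derive indicator terms from the corpus or the ontology rather than hard-coding them.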

Details

Original language	English
Pages (from-to)	193-215
Number of pages	23
Journal	International Journal of Data Mining and Bioinformatics
Volume	2
Issue number	3
Publication status	Published - 2008
Peer-reviewed	Yes

External IDs

Scopus	53349143997
ORCID	/0000-0003-2848-6949/work/141543397

Keywords

Artificial Intelligence, Cluster Analysis, Documentation/methods, Information Storage and Retrieval/methods, Natural Language Processing, Semantics, Terminology as Topic

Library keywords

570 Biology

Research Portal of the TU Dresden
