Emati: A recommender system for biomedical literature based on supervised learning

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Özge Kart, TUD Dresden University of Technology, Dokuz Eylul University (Author)
  • Alexandre Mestiashvili, TUD Dresden University of Technology (Author)
  • Kurt Lachmann, TUD Dresden University of Technology (Author)
  • Richard Kwasnicki, TUD Dresden University of Technology (Author)
  • Michael Schroeder, Chair of Bioinformatics (Author)

Abstract

The scientific literature continues to grow at an ever-increasing rate. Considering that thousands of new articles are published every week, it is obvious how challenging it is to keep up with newly published literature on a regular basis. Using a recommender system that improves the user experience in the online environment can be a solution to this problem. In the present study, we aimed to develop a web-based article recommender service, called Emati. Since the data are text-based by nature and we wanted our system to be independent of the number of users, a content-based approach has been adopted in this study. A supervised machine learning model has been proposed to generate article recommendations. Two different supervised learning approaches, namely the naïve Bayes model with Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer and the state-of-the-art language model bidirectional encoder representations from transformers (BERT), have been implemented. In the first one, a list of documents is converted into TF-IDF-weighted features and fed into a classifier to distinguish relevant articles from irrelevant ones. Multinomial naïve Bayes algorithm is used as a classifier since, along with the class label, it also gives the probability that the input belongs to this class. The second approach is based on fine-tuning the pretrained state-of-the-art language model BERT for the text classification task. Emati provides a weekly updated list of article recommendations and presents it to the user, sorted by probability scores. New article recommendations are also sent to users' email addresses on a weekly basis. Additionally, Emati has a personalized search feature to search online services' (such as PubMed and arXiv) content and have the results sorted by the user's classifier. Database URL: https://emati.biotec.tu-dresden.de
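The first approach in the abstract (TF-IDF features fed to a multinomial naïve Bayes classifier, with candidates sorted by class probability) can be illustrated with a minimal sketch. This is not Emati's code: the class `TfidfNaiveBayes`, the helper `rank_articles`, the labels "relevant" and "irrelevant", and the toy titles are all hypothetical, and the sketch uses only the Python standard library as a simplified stand-in for a full vectorizer and classifier (for example, it skips vector normalization and stop-word handling).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TfidfNaiveBayes:
    """Multinomial naive Bayes over TF-IDF-weighted term counts (illustrative only)."""

    def fit(self, docs, labels, alpha=1.0):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        # Document frequency per term, turned into a smoothed IDF weight.
        df = Counter(term for toks in tokenized for term in set(toks))
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        self.classes = sorted(set(labels))
        # Accumulate TF-IDF mass per class and term.
        weights = {c: defaultdict(float) for c in self.classes}
        for toks, label in zip(tokenized, labels):
            for term, w in self._tfidf(toks).items():
                weights[label][term] += w
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        self.log_likelihood = {}
        for c in self.classes:
            # Laplace smoothing (alpha) keeps unseen terms from zeroing a class.
            total = sum(weights[c].values()) + alpha * len(self.idf)
            self.log_likelihood[c] = {
                t: math.log((weights[c][t] + alpha) / total) for t in self.idf
            }
        return self

    def _tfidf(self, tokens):
        # Terms outside the training vocabulary are ignored.
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: n * self.idf[t] for t, n in counts.items()}

    def predict_proba(self, doc):
        feats = self._tfidf(tokenize(doc))
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in feats.items())
            for c in self.classes
        }
        # Convert log scores to probabilities, subtracting the max for stability.
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def rank_articles(model, candidates, relevant_label="relevant"):
    """Return (probability, text) pairs sorted by probability of relevance."""
    scored = [(model.predict_proba(text)[relevant_label], text) for text in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy training data (hypothetical titles, not taken from Emati).
train_docs = [
    "protein structure prediction with deep learning",
    "drug target interaction prediction for kinase proteins",
    "gene expression analysis in cancer cells",
    "football league results and transfer news",
    "stock market prices fall after bank report",
    "recipe for tomato soup with basil",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

model = TfidfNaiveBayes().fit(train_docs, train_labels)
candidates = [
    "transfer news from the football league",
    "protein kinase drug discovery in cancer",
]
ranking = rank_articles(model, candidates)
for probability, title in ranking:
    print(f"{probability:.3f}  {title}")
```

In this sketch the classifier returns a probability alongside the class label, which is the property the abstract gives as the reason for choosing multinomial naïve Bayes: the probability doubles as the sort key for the recommendation list.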

Details

Original language: English
Article number: baac104
Number of pages: 10
Journal: Database
Volume: 2022 (2022)
Publication status: Published - 9 Dec 2022
Peer-reviewed: Yes

External IDs

PubMed 36484479
ORCID /0000-0003-2848-6949/work/147140364