SetQuence & SetOmic: Deep set transformers for whole genome and exome tumour analysis

Research output: Contribution to journal › Research article › Contributed › peer-review

Abstract

In oncology, Deep Learning has shown great potential for personalising tasks such as tumour type classification based on per-patient omics datasets. Because such data are high-dimensional, incorporating them into a single model is challenging, which often leads to one-dimensional studies and, therefore, to information loss. Instead, we first propose relying on variable-size sets of whole genome or whole exome variant-associated sequences, which our Set Transformer based Deep Neural Network, SetQuence, can use for supervised learning of oncology-relevant tasks. We optimise this architecture to improve its efficiency, which allows us to explore not only coding but also non-coding variants from large datasets. Second, we extend the model with SetOmic, which flexibly combines these representations with multiple other sources of omics data. Evaluation using these representations shows improved robustness and reduced information loss compared with previous approaches, while the models remain computationally tractable. Using Explainable Artificial Intelligence methods, we show that our models recapitulate the biological contribution of highly attributed features in the tumours studied. This validation opens the door to new directions in multi-faceted, genome- and exome-wide biomarker discovery and personalised treatment, among other currently relevant clinical tasks.
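A central idea in the abstract is that each patient is represented by a set of variant-associated sequences whose size varies from patient to patient, and that a Set Transformer style network reduces such a set to a fixed-size representation independent of element order. The sketch below illustrates only that property with one attention-pooling step. It is not the SetQuence implementation: the `AttentionPool` class, the single attention head, the random (untrained) weights, the embedding dimension and the random vectors standing in for per-variant sequence embeddings are all hypothetical assumptions for illustration.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (d_out, d_in)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionPool:
    """Hypothetical single-head attention pooling with one seed query.

    Maps a set of any size to one fixed-length vector; the weights are
    random (untrained) and only illustrate the mechanics.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_k = self._random_matrix(rng, dim)  # key projection
        self.w_v = self._random_matrix(rng, dim)  # value projection
        self.query = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # seed vector
        self.scale = 1.0 / math.sqrt(dim)

    @staticmethod
    def _random_matrix(rng: random.Random, dim: int) -> Matrix:
        std = 1.0 / math.sqrt(dim)
        return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, elements: List[Vector]) -> Vector:
        if not elements:
            raise ValueError("set must contain at least one element")
        keys = [matvec(self.w_k, x) for x in elements]
        values = [matvec(self.w_v, x) for x in elements]
        # Scaled dot-product score of the seed query against every element.
        scores = [self.scale * sum(q * k for q, k in zip(self.query, key)) for key in keys]
        weights = softmax(scores)
        # Weighted sum of values: a sum over elements, hence order-invariant.
        return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(self.query))]


# Usage: two hypothetical patients with different numbers of variant embeddings.
rng = random.Random(42)
dim = 8
pool = AttentionPool(dim)
patient_a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
patient_b = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(12)]
emb_a = pool(patient_a)
emb_b = pool(patient_b)
emb_a_shuffled = pool(list(reversed(patient_a)))
print(len(emb_a), len(emb_b))  # 8 8
print(all(abs(x - y) < 1e-9 for x, y in zip(emb_a, emb_a_shuffled)))  # True
```

Because the pooled vector has the same length for 5 or 12 elements and does not depend on their order, sets of different sizes can feed one downstream classifier. A full Set Transformer additionally stacks self-attention blocks over the set elements before pooling, so elements can interact rather than being scored independently as here.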

Details

Original language: English
Article number: 105095
Number of pages: 17
Journal: BioSystems
Volume: 235
Early online date: 6 Dec 2023
Publication status: Published - Jan 2024
Peer-reviewed: Yes

External IDs

ORCID: /0000-0001-9756-6390/work/148606970
Unpaywall: 10.1016/j.biosystems.2023.105095
Scopus: 85182501564

Keywords

  • Artificial Intelligence, Biomedical Research, Exome/genetics, Humans, Medical Oncology, Neoplasms/genetics

Library keywords