SetQuence & SetOmic: Deep set transformers for whole genome and exome tumour analysis

Publication: Contribution to journal > Research article > Contributed > Peer-reviewed


In oncology, Deep Learning has shown great potential to personalise tasks such as tumour type classification based on per-patient omics datasets. Because such data are high-dimensional, incorporating them into a single model is challenging, which often leads to one-dimensional studies and, therefore, information loss. Instead, we first propose relying on non-fixed sets of whole genome or whole exome variant-associated sequences, which can be used for supervised learning of oncology-relevant tasks by our Set Transformer based Deep Neural Network, SetQuence. We optimise this architecture to improve its efficiency, which allows exploration of not just coding but also non-coding variants from large datasets. Second, we extend the model with SetOmic to flexibly incorporate these representations together with multiple other sources of omics data. Evaluation using these representations shows improved robustness and reduced information loss compared to previous approaches, while remaining computationally tractable. By means of Explainable Artificial Intelligence methods, our models are able to recapitulate the biological contribution of highly attributed features in the tumours studied. This validation opens the door to novel directions in multi-faceted genome- and exome-wide biomarker discovery and personalised treatment, among other currently clinically relevant tasks.


Publication status: E-pub ahead of print - 6 Dec 2023

External IDs

ORCID /0000-0001-9756-6390/work/148606970
unpaywall 10.1016/j.biosystems.2023.105095
Scopus 85182501564