Whole exome sequencing and machine learning germline analysis of individuals presenting with extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

  • Ana Patiño-García, Universidad de Navarra (Author)
  • Elizabeth Guruceaga, Universidad de Navarra (Author)
  • Maria Pilar Andueza, Clinica Universidad de Navarra (CUN) (Author)
  • Marimar Ocón, Clinica Universidad de Navarra (CUN) (Author)
  • Jafait Junior Fodop Sokoudjou, Universidad de Navarra (Author)
  • Nicolás de Villalonga Zornoza, Universidad de Navarra (Author)
  • Gorka Alkorta-Aranburu, Universidad de Navarra (Author)
  • Ibon Tamayo Uria, Universidad de Navarra (Author)
  • Alfonso Gurpide, Clinica Universidad de Navarra (CUN) (Author)
  • Carlos Camps, Hospital General Universitario de Valencia (Author)
  • Eloísa Jantus-Lewintre, Polytechnic University of Valencia (Author)
  • Maria Navamuel-Andueza, Clinica Universidad de Navarra (CUN) (Author)
  • Miguel F. Sanmamed, Clinica Universidad de Navarra (CUN) (Author)
  • Ignacio Melero, Clinica Universidad de Navarra (CUN) (Author)
  • Mohamed Elgendy, Institut für Klinische Chemie und Laboratoriumsmedizin, Nationales Centrum für Tumorerkrankungen Dresden, Medizinische Klinik und Poliklinik I, Institute of Molecular Genetics of the Czech Academy of Sciences (Author)
  • Juan Pablo Fusco, Quirónsalud (Author)
  • Javier J. Zulueta, Mount Sinai Hospital NY (Author)
  • Juan P. de-Torres, Clinica Universidad de Navarra (CUN) (Author)
  • Gorka Bastarrika, Clinica Universidad de Navarra (CUN) (Author)
  • Luis Seijo, Clinica Universidad de Navarra (CUN) (Author)
  • Ruben Pio, Universidad de Navarra (Author)
  • Luis M. Montuenga, Universidad de Navarra (Author)
  • Mikel Hernáez, Universidad de Navarra (Author)
  • Idoia Ochoa, Universidad de Navarra (Author)
  • Jose Luis Perez-Gracia, Clinica Universidad de Navarra (CUN) (Author)

Abstract

Background: Tobacco is the main risk factor for developing lung cancer. Yet, while some heavy smokers develop lung cancer at a young age, other heavy smokers never develop it, even at an advanced age, suggesting remarkable variability in individual susceptibility to the carcinogenic effects of tobacco. We characterized the germline profile of subjects presenting these extreme phenotypes using Whole Exome Sequencing (WES) and Machine Learning (ML).

Methods: We sequenced germline DNA from heavy smokers who either developed lung adenocarcinoma at an early age (extreme cases) or did not develop lung cancer at an advanced age (extreme controls), selected from databases including over 6600 subjects. We selected individual coding genetic variants and variant-rich genes showing a significantly different distribution between extreme cases and controls. We validated the results from our discovery cohort in an independent validation cohort, in which we analysed by WES extreme cases and controls presenting similar phenotypes. We developed ML models using both cohorts.

Findings: Mean age for extreme cases and controls was 50.7 and 79.1 years, respectively, and mean tobacco consumption was 34.6 and 62.3 pack-years, respectively. We validated 16 individual variants and 33 variant-rich genes. The gene harbouring the most validated variants was HLA-A in extreme controls (4 variants in the discovery cohort, p = 3.46E-07; 4 in the validation cohort, p = 1.67E-06). We trained ML models in the discovery cohort using the 16 individual variants as input and tested them on the validation cohort, obtaining an accuracy of 76.5% and an AUC-ROC of 83.6%. Validated genes included candidate oncogenes and tumour suppressors, as well as genes involved in DNA repair, HLA-mediated antigen presentation, and the regulation of proliferation, apoptosis, inflammation and immune response.

Interpretation: Individuals presenting extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma show different germline profiles. Our strategy may allow the identification of high-risk subjects and the development of new therapeutic approaches.

Funding: See a detailed list of funding bodies in the Acknowledgements section at the end of the manuscript.
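The abstract does not state which statistical test or which ML algorithm was used. As a hedged illustration only, the sketch below shows two generic building blocks of a case-control design like the one described: a two-sided Fisher's exact test comparing carrier counts of a single variant between extreme cases and extreme controls, and accuracy and AUC-ROC computed on a held-out validation cohort. The function names, the 2x2 table layout and the toy numbers are hypothetical assumptions; they are not the study's code or data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not from the study): row 1 = extreme
    cases, row 2 = extreme controls; column 1 = carriers of the variant,
    column 2 = non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals (null of no association).
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the
    # observed one; the small relative tolerance keeps floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def auc_roc(scores, labels):
    """AUC-ROC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(preds, labels):
    """Fraction of predicted labels that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Toy numbers only, not data from the study.
    print(fisher_exact_two_sided(3, 1, 1, 3))  # ≈ 0.486
    print(auc_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

In a discovery/validation design of this kind, variant selection and model training would be run on the discovery cohort only, and accuracy and AUC-ROC would be computed on the validation cohort, so that the reported metrics are not inflated by reusing the same samples.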

Details

Original language: English
Article number: 105048
Journal: EBioMedicine
Volume: 102
Publication status: Published, Apr 2024
Peer-review status: Yes

External IDs

PubMed: 38484556

Keywords

  • Cancer risk, Extreme phenotypes, Lung adenocarcinoma, Tobacco, Whole exome sequencing