SyMetrics: an integrated machine learning model for evaluating the pathogenicity of synonymous variants in the human genome

Publication: Contribution to journal, Research article, Contributed, Peer-reviewed

Contributors

  • Linnaeus Bundalian, Institut für Klinische Genetik, Nationales Centrum für Tumorerkrankungen Dresden, Universitätsklinikum Leipzig (Author)
  • Martina Schmidt Strnadová, Universität Leipzig (Author)
  • Felix Garten, Universitätsklinikum Leipzig (Author)
  • Susanne Horn, Universität Leipzig (Author)
  • Udo Stenzel, Universität Leipzig (Author)
  • Denny Popp, Universitätsklinikum Leipzig (Author)
  • Johannes R Lemke, Universitätsklinikum Leipzig (Author)
  • Saskia Biskup, CeGaT GmbH (Author)
  • Björn Schulte, CeGaT GmbH (Author)
  • Patrick May, University of Luxembourg (Author)
  • Frank Bösebeck, Agaplesion Diakonieklinikum Rotenburg (Author)
  • Antje Garten, Universitätsklinikum Leipzig (Author)
  • Doreen Thor, Universität Leipzig (Author)
  • Angela Schulz, Universität Leipzig (Author)
  • Julia Hentschel, Universitätsklinikum Leipzig (Author)
  • Janet Kelso, Max Planck Institute for Evolutionary Anthropology (Author)
  • Torsten Schöneberg, University of Global Health Equity (Author)
  • Diana Le Duc, Institut für Klinische Genetik, Nationales Centrum für Tumorerkrankungen Dresden, Universitätsklinikum Leipzig, Max Planck Institute for Evolutionary Anthropology, Klinikum Chemnitz gGmbH (Author)

Abstract

Synonymous single nucleotide variants (sSNVs), traditionally seen as neutral, are now recognized for their biological impact. To assess their relevance, we developed SyMetrics, a framework that integrates predictors of splicing, RNA stability, evolutionary conservation, codon usage, synonymous variation effects, sequence properties, and allele frequency. We analyzed all possible sSNVs across the human genome, and our machine-learning model achieved 97% accuracy in distinguishing deleterious from benign variants, with a ROC-AUC of 0.89, outperforming individual predictors. Our estimates indicate that about 1.98 ± 0.17% of sSNVs absent from population databases are damaging (roughly 900 000 sSNVs), with an odds ratio of 3.87 for deleteriousness compared to common sSNVs (P < 0.05). To validate predictions, we performed functional assays on selected sSNVs in the AVPR2 gene and additionally used available large-scale mutagenesis screens of RAD51C and BAP1 variants. In a clinical cohort, we identified 15 predicted deleterious sSNVs in genes linked to patient phenotypes; 9 were classified as (likely) pathogenic while 6 were variants of uncertain significance (VUS) per American College of Medical Genetics guidelines. For three VUS, segregation data supported their suspected inheritance patterns (de novo, X-linked). Our findings underscore the functional importance of sSNVs. To support further research and clinical applications, we provide a Python package and web application (https://symetrics.org/) for evaluating these variants comprehensively.
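
For readers unfamiliar with the statistics quoted in the abstract, the following is a minimal sketch of how an odds ratio is computed from a 2x2 table, together with a back-of-the-envelope check of the figures given. It is not the SyMetrics code. The function names and the example counts are hypothetical and chosen only for illustration; the only numbers taken from the abstract are the 1.98% fraction and the roughly 900 000 variant count.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio for a 2x2 table.

    Here 'exposed' could mean sSNVs absent from population databases,
    'unexposed' common sSNVs, and 'cases' variants predicted deleterious.
    That mapping is an assumption made for illustration.
    """
    counts = (exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for a finite odds ratio")
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed


def implied_total(count, fraction):
    """Size of the full set implied by a count and the fraction it represents."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return count / fraction


# Hypothetical counts, NOT data from the study.
example_or = odds_ratio(20, 980, 5, 995)
print(f"odds ratio for the hypothetical table: {example_or:.2f}")

# Figures from the abstract: about 1.98% of absent sSNVs, roughly 900 000 variants.
total_absent = implied_total(900_000, 0.0198)
print(f"implied number of sSNVs absent from population databases: {total_absent:,.0f}")
```

If the percentage and the count in the abstract refer to the same set of variants, the second calculation implies on the order of 45 million sSNVs absent from population databases. This is only a consistency check on the reported figures, not a number stated by the authors.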

Details

Original language: English
Article number: lqaf211
Journal: NAR Genomics and Bioinformatics
Volume: 8
Issue number: 1
Publication status: Published - March 2026
Peer-review status: Yes

External IDs

PubMed Central: PMC12805901
Scopus: 105027957751

Keywords

  • Genome, Human; Humans; Machine Learning; Polymorphism, Single Nucleotide; Silent Mutation; Software