SyMetrics: an integrated machine learning model for evaluating the pathogenicity of synonymous variants in the human genome

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Linnaeus Bundalian - Institute of Clinical Genetics, National Center for Tumor Diseases Dresden, University Hospital Leipzig (Author)
  • Martina Schmidt Strnadová - Leipzig University (Author)
  • Felix Garten - University Hospital Leipzig (Author)
  • Susanne Horn - Leipzig University (Author)
  • Udo Stenzel - Leipzig University (Author)
  • Denny Popp - University Hospital Leipzig (Author)
  • Johannes R Lemke - University Hospital Leipzig (Author)
  • Saskia Biskup - CeGaT GmbH (Author)
  • Björn Schulte - CeGaT GmbH (Author)
  • Patrick May - University of Luxembourg (Author)
  • Frank Bösebeck - Agaplesion Diakonieklinikum Rotenburg (Author)
  • Antje Garten - University Hospital Leipzig (Author)
  • Doreen Thor - Leipzig University (Author)
  • Angela Schulz - Leipzig University (Author)
  • Julia Hentschel - University Hospital Leipzig (Author)
  • Janet Kelso - Max Planck Institute for Evolutionary Anthropology (Author)
  • Torsten Schöneberg - University of Global Health Equity (Author)
  • Diana Le Duc - Institute of Clinical Genetics, National Center for Tumor Diseases Dresden, University Hospital Leipzig, Max Planck Institute for Evolutionary Anthropology, Klinikum Chemnitz gGmbH (Author)

Abstract

Synonymous single nucleotide variants (sSNVs), traditionally regarded as neutral, are now recognized as having biological impact. To assess their relevance, we developed SyMetrics, a framework that integrates predictors of splicing, RNA stability, evolutionary conservation, codon usage, synonymous variation effects, sequence properties, and allele frequency. We analyzed all possible sSNVs across the human genome; our machine learning model distinguished deleterious from benign variants with 97% accuracy and a ROC-AUC of 0.89, outperforming the individual predictors. We estimate that 1.98 ± 0.17% of sSNVs absent from population databases (roughly 900 000 sSNVs) are damaging, corresponding to an odds ratio of 3.87 for deleteriousness relative to common sSNVs (P < 0.05). To validate the predictions, we performed functional assays on selected sSNVs in the AVPR2 gene and additionally used available large-scale mutagenesis screens of RAD51C and BAP1 variants. In a clinical cohort, we identified 15 predicted deleterious sSNVs in genes linked to the patients' phenotypes; 9 were classified as (likely) pathogenic and 6 as variants of uncertain significance (VUS) according to the American College of Medical Genetics guidelines. For three VUS, segregation data supported the suspected inheritance patterns (de novo, X-linked). Our findings underscore the functional importance of sSNVs. To support further research and clinical applications, we provide a Python package and a web application (https://symetrics.org/) for comprehensive evaluation of these variants.
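The abstract reports three kinds of summary statistic: accuracy, ROC-AUC and an odds ratio. As a hedged illustration of how these are conventionally defined, and not of how SyMetrics itself computes them, the stdlib-only Python sketch below implements each from scratch. The helper names `accuracy`, `roc_auc` and `odds_ratio`, the 0.5 decision threshold and the toy scores are all hypothetical; they are not taken from the SyMetrics package or from the study's data.

```python
"""Hypothetical sketch of the summary statistics quoted in the abstract.

None of these helpers come from the SyMetrics package; they only show the
textbook definitions of accuracy, ROC-AUC and a 2x2 odds ratio.
"""
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of variants whose thresholded score matches the label.

    Labels use 1 = deleterious, 0 = benign. The 0.5 threshold is an assumption.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free ROC-AUC via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen deleterious variant scores
    higher than a randomly chosen benign one; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table (no zero-cell correction is applied).

    a = deleterious in group 1, b = benign in group 1,
    c = deleterious in group 2, d = benign in group 2.
    """
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Toy data invented for illustration only (not from the study).
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    s = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.3, 0.2]
    print(accuracy(y, s))             # 0.8
    print(roc_auc(y, s))              # 0.9375
    print(odds_ratio(20, 80, 5, 95))  # 4.75
```

Accuracy depends on the chosen threshold and on the ratio of deleterious to benign variants in the evaluation set, whereas ROC-AUC is threshold-free, so the two figures summarize different aspects of a classifier and need not move together.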

Details

Original language: English
Article number: lqaf211
Journal: NAR Genomics and Bioinformatics
Volume: 8
Issue number: 1
Publication status: Published - Mar 2026
Peer-reviewed: Yes

External IDs

PubMed Central: PMC12805901
Scopus: 105027957751

Keywords

  • Genome, Human
  • Humans
  • Machine Learning
  • Polymorphism, Single Nucleotide
  • Silent Mutation
  • Software