NCBench: providing an open, reproducible, transparent, adaptable, and continuous benchmark approach for DNA-sequencing-based variant calling

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Friederike Hanssen, University of Tübingen (Author)
  • Gisela Gabernet, University of Tübingen (Author)
  • Famke Bäuerle, University of Tübingen, University Hospital Tübingen (Author)
  • Bianca Stöcker, University of Duisburg-Essen (Author)
  • Felix Wiegand, University of Duisburg-Essen (Author)
  • Nicholas H. Smith, Technical University of Munich (Author)
  • Christian Mertes, Technical University of Munich (Author)
  • Avirup Guha Neogi, University of Cologne (Author)
  • Leon Brandhoff, University of Cologne (Author)
  • Anna Ossowski, University of Cologne (Author)
  • Janine Altmueller, University of Cologne, Berlin Institute of Health at Charité, Max Delbrück Center for Molecular Medicine (MDC) (Author)
  • Kerstin Becker, University of Cologne (Author)
  • Andreas Petzold, DRESDEN-concept Genome Center (CMCB Core Facility), TUD Dresden University of Technology (Author)
  • Marc Sturm, University Hospital Tübingen (Author)
  • Tyll Stöcker, University of Bonn (Author)
  • Sugirthan Sivalingam, University Hospital Duesseldorf (Author)
  • Fabian Brand, University of Bonn (Author)
  • Axel Schmidt, University of Bonn (Author)
  • Andreas Buness, University of Bonn (Author)
  • Alexander J. Probst, University of Duisburg-Essen (Author)
  • Susanne Motameny, University of Cologne (Author)
  • Johannes Köster, University of Duisburg-Essen, German Cancer Research Center (DKFZ) (Author)

Abstract

We present the results of the human genomic small variant calling benchmarking initiative of the German Research Foundation (DFG) funded Next Generation Sequencing Competence Network (NGS-CN) and the German Human Genome-Phenome Archive (GHGA). In this effort, we developed NCBench, a continuous benchmarking platform for the evaluation of small genomic variant callsets in terms of recall, precision, and false positive/negative error patterns. NCBench is implemented as a continuously re-evaluated open-source repository. We show that it is possible to entirely rely on public free infrastructure (GitHub, GitHub Actions, Zenodo) in combination with established open-source tools. NCBench is agnostic of the used dataset and can evaluate an arbitrary number of given callsets, while reporting the results in a visual and interactive way. We used NCBench to evaluate over 40 callsets generated by various variant calling pipelines available in the participating groups that were run on three exome datasets from different enrichment kits and at different coverages. While all pipelines achieve high overall quality, subtle systematic differences between callers and datasets exist and are made apparent by NCBench. These insights are useful to improve existing pipelines and develop new workflows. NCBench is meant to be open for the contribution of any given callset. Most importantly, for authors, it will enable the omission of repeated re-implementation of paper-specific variant calling benchmarks for the publication of new tools or pipelines, while readers will benefit from being able to (continuously) observe the performance of tools and pipelines at the time of reading instead of at the time of writing.

Details

Original language: English
Article number: 1125
Journal: F1000Research
Volume: 12
Publication status: Published - 2024
Peer-reviewed: Yes

External IDs

PubMed 39345270
ORCID /0000-0001-9599-8632/work/174428924