Interoperable, Domain-Specific Extensions for the German Corona Consensus (GECCO) COVID-19 Research Data Set Using an Interdisciplinary, Consensus-Based Workflow: Data Set Development Study

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Gregor Lichtner, Charité – Universitätsmedizin Berlin (Author)
  • Thomas Haese, Charité – Universitätsmedizin Berlin (Author)
  • Sally Brose, Charité – Universitätsmedizin Berlin (Author)
  • Larissa Röhrig, Charité – Universitätsmedizin Berlin (Author)
  • Liudmila Lysyakova, Charité – Universitätsmedizin Berlin (Author)
  • Stefanie Rudolph, Charité – Universitätsmedizin Berlin (Author)
  • Maria Uebe, Charité – Universitätsmedizin Berlin (Author)
  • Julian Sass, Charité – Universitätsmedizin Berlin (Author)
  • Alexander Bartschke, Charité – Universitätsmedizin Berlin (Author)
  • David Hillus, Charité – Universitätsmedizin Berlin (Author)
  • Florian Kurth, Charité – Universitätsmedizin Berlin (Author)
  • Leif Erik Sander, Charité – Universitätsmedizin Berlin (Author)
  • Falk Eckart, Department of Paediatrics (Author)
  • Nicole Toepfner, Department of Paediatrics (Author)
  • Reinhard Berner, Department of Paediatrics (Author)
  • Anna Frey, University Hospital of Würzburg (Author)
  • Marcus Dörr, Erasmus University Medical Center (Author)
  • Jörg Janne Vehreschild, German Center for Infection Research, Partner Site Bonn-Cologne (Author)
  • Christof von Kalle, Charité – Universitätsmedizin Berlin (Author)
  • Sylvia Thun, Charité – Universitätsmedizin Berlin (Author)

Abstract

Background: The COVID-19 pandemic has spurred large-scale, interinstitutional research efforts. To enable these efforts, researchers must agree on data set definitions that not only cover all elements relevant to the respective medical specialty but also are syntactically and semantically interoperable. Therefore, the German Corona Consensus (GECCO) data set was developed as a harmonized, interoperable collection of the most relevant data elements for COVID-19-related patient research. As the GECCO data set is a compact core data set comprising data across all medical fields, the focused research within particular medical domains demands the definition of extension modules that include data elements that are the most relevant to the research performed in those individual medical specialties.

Objective: We aimed to (1) specify a workflow for the development of interoperable data set definitions that involves close collaboration between medical experts and information scientists and (2) apply the workflow to develop data set definitions that include data elements that are the most relevant to COVID-19-related patient research regarding immunization, pediatrics, and cardiology.

Methods: We developed a workflow to create data set definitions that were (1) content-wise as relevant as possible to a specific field of study and (2) universally usable across computer systems, institutions, and countries (ie, interoperable). We then gathered medical experts from 3 specialties, namely infectious diseases (with a focus on immunization), pediatrics, and cardiology, to select data elements that were the most relevant to COVID-19-related patient research in the respective specialty. We mapped the data elements to international standardized vocabularies and created data exchange specifications, using Health Level Seven International (HL7) Fast Healthcare Interoperability Resources (FHIR). All steps were performed in close interdisciplinary collaboration with medical domain experts and medical information specialists. Profiles and vocabulary mappings were syntactically and semantically validated in a 2-stage process.

Results: We created GECCO extension modules for the immunization, pediatrics, and cardiology domains according to pandemic-related requests. The data elements included in each module were selected, according to the developed consensus-based workflow, by medical experts from these specialties to ensure that the contents aligned with their research needs. We defined data set specifications for 48 immunization, 150 pediatrics, and 52 cardiology data elements that complement the GECCO core data set. We created and published implementation guides, example implementations, and data set annotations for each extension module.

Conclusions: The GECCO extension modules, which contain data elements that are the most relevant to COVID-19-related patient research on infectious diseases (with a focus on immunization), pediatrics, and cardiology, were defined in an interdisciplinary, iterative, consensus-based workflow that may serve as a blueprint for developing further data set definitions. The GECCO extension modules provide standardized and harmonized definitions of specialty-related data sets that can help enable interinstitutional and cross-country COVID-19 research in these specialties.
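
As an illustration of the mapping step described in the Methods, the following minimal Python sketch shows how a single data element might be bound to a standardized vocabulary code and expressed as a FHIR-style Observation. It is not taken from the GECCO profiles: the helper names (`to_observation`, `check_syntax`, `check_semantics`), the choice of heart rate (LOINC 8867-4) as the example element, and the two simple checks are assumptions made for illustration only and do not reproduce the authors' 2-stage validation.

```python
import json

LOINC = "http://loinc.org"            # code system URI for LOINC
UCUM = "http://unitsofmeasure.org"    # code system URI for UCUM units

# Hypothetical allow-list of vocabularies accepted by this sketch.
ALLOWED_SYSTEMS = {LOINC}


def to_observation(code, display, value, unit, ucum_code):
    """Build a minimal FHIR-style Observation (plain dict) for one data element."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": LOINC, "code": code, "display": display}]},
        "valueQuantity": {
            "value": value, "unit": unit, "system": UCUM, "code": ucum_code,
        },
    }


def check_syntax(resource):
    """Illustrative stage 1: return the names of missing required fields."""
    required = ("resourceType", "status", "code")
    return [key for key in required if key not in resource]


def check_semantics(resource):
    """Illustrative stage 2: return codings whose code system is not allowed."""
    codings = resource.get("code", {}).get("coding", [])
    return [c for c in codings if c.get("system") not in ALLOWED_SYSTEMS]


if __name__ == "__main__":
    obs = to_observation("8867-4", "Heart rate", 72, "beats/minute", "/min")
    print(json.dumps(obs, indent=2))
    print("syntax issues:", check_syntax(obs))
    print("semantic issues:", check_semantics(obs))
```

Binding each element to a shared code system is what lets different institutions interpret the same value identically. A real FHIR profile would typically add further constraints, for example on the subject and the effective time of the observation.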

Details

Original language: English
Pages (from-to): e45496
Journal: JMIR medical informatics
Volume: 11
Publication status: Published - 18 Jul 2023
Peer-reviewed: Yes

External IDs

PubMedCentral: PMC10368099
Scopus: 85165950642
