Conceptual design of a generic data harmonization process for OMOP common data model

Publication: Contribution to journal > Research article > Contributed > Peer-reviewed



BACKGROUND: To gain insight into the real-life care of patients in the healthcare system, data from hospital information systems and insurance systems are required. Consequently, linking clinical data with claims data is necessary. To ensure their syntactic and semantic interoperability, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) from the Observational Health Data Sciences and Informatics (OHDSI) community was chosen. However, there is no detailed guide that would allow researchers to follow a generic process for data harmonization, i.e. the transformation of local source data into the standardized OMOP CDM format. Thus, the aim of this paper is to conceptualize a generic data harmonization process for OMOP CDM.

METHODS: For this purpose, we conducted a literature review focusing on publications that address the harmonization of clinical or claims data in OMOP CDM. Subsequently, the process steps used and their chronological order as well as applied OHDSI tools were extracted for each included publication. The results were then compared to derive a generic sequence of the process steps.

RESULTS: From the 23 publications included, a generic data harmonization process for OMOP CDM was conceptualized, consisting of nine process steps: dataset specification, data profiling, vocabulary identification, coverage analysis of vocabularies, semantic mapping, structural mapping, the extract-transform-load (ETL) process, qualitative data quality analysis, and quantitative data quality analysis. Furthermore, we identified seven OHDSI tools that supported five of the process steps.

CONCLUSIONS: The generic data harmonization process can be used as a step-by-step guide to assist other researchers in harmonizing source data in OMOP CDM.


Pages (from - to): 58
Journal: BMC Medical Informatics and Decision Making
Publication status: Published - 26 Feb. 2024

External IDs

PubMedCentral PMC10895818
Scopus 85185955930
ORCID /0000-0003-0154-2867/work/160047259
ORCID /0000-0002-5577-7760/work/160048645
ORCID /0000-0002-9888-8460/work/160050004
Mendeley 306f657f-e222-37b4-a8d9-66c47601345e



  • Humans, Databases, Factual, Vocabulary, Data Science, Semantics, Medical Informatics, Electronic Health Records