Automatic Information Extraction from Scientific Publications Based on the Use Case of Additive Manufacturing

Publication: Contribution to journal, Research article, Contributed, Peer-reviewed

Contributors

Abstract

A systematic literature review is fundamental to building a robust research foundation, informing experimental methodology, and ensuring the quality of future scientific output. However, manual extraction of targeted information from scientific publications is often laborious and error-prone, especially when researchers require rapid access to relevant findings without specialized hardware. This paper introduces an automated workflow for information extraction from scientific publications in the engineering domain. The proposed workflow consists of two primary stages: data preparation and information extraction. During data preparation, PDF files are converted to plain text and segmented into logical sections using a rule-based block detection and classification algorithm that preserves semantics. Information extraction is then performed by applying regular expressions to both keys and values within the same sentence to identify and extract relevant process and material data from the segmented text. The approach was evaluated on a dataset of 18 open-access scientific publications from various journals and conference proceedings in the additive manufacturing (AM) domain. The results of the automated extraction were compared with manual extraction and with a modern large language model (LLM)-based approach. The findings demonstrate that the proposed workflow can accurately and efficiently extract relevant process and material data, achieving competitive performance relative to the LLM-based method. The workflow significantly reduces the time and potential errors associated with manual extraction: automated processing averages 15 s per document, compared to one hour for manual extraction, and achieves a 76% match rate. This efficiency enables researchers to extract data rapidly and effectively. The methodology is readily transferable to other scientific fields where systematic literature reviews and structured data extraction are required.
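
The key/value matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the parameter names, the regular expressions, the sentence splitter, and the `extract` function are all hypothetical and chosen only for illustration. The sketch assumes that a value is accepted only when a key pattern and the corresponding value pattern both match within the same sentence.

```python
import re

# Hypothetical key patterns: phrases that name a process parameter.
KEY_PATTERNS = {
    "laser_power": re.compile(r"\blaser power\b", re.IGNORECASE),
    "layer_thickness": re.compile(r"\blayer (?:thickness|height)\b", re.IGNORECASE),
}

# Hypothetical value patterns: a number followed by a plausible unit.
VALUE_PATTERNS = {
    "laser_power": re.compile(r"(\d+(?:\.\d+)?)\s*(W|kW)\b"),
    "layer_thickness": re.compile(r"(\d+(?:\.\d+)?)\s*(µm|um|mm)\b"),
}

# Naive sentence splitter (an assumption): break after ., ! or ?
# when followed by whitespace and an uppercase letter.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def extract(text):
    """Return parameters whose key and value occur in the same sentence."""
    results = []
    for sentence in SENTENCE_SPLIT.split(text):
        for name, key_re in KEY_PATTERNS.items():
            # Skip sentences that do not mention the parameter at all.
            if not key_re.search(sentence):
                continue
            match = VALUE_PATTERNS[name].search(sentence)
            if match:
                results.append({
                    "parameter": name,
                    "value": float(match.group(1)),
                    "unit": match.group(2),
                })
    return results
```

Calling `extract` on segmented plain text returns one dictionary per matched parameter. Requiring the key and the value to share a sentence trades some recall for fewer false matches, since numbers with a fitting unit in neighbouring sentences are ignored.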

Details

Original language: English
Article number: 9331
Number of pages: 41
Journal: Applied Sciences : open access journal
Volume: 15
Issue number: 17
Early online date: 25 Aug. 2025
Publication status: Published - 1 Sept. 2025
Peer-review status: Yes

External IDs

ORCID /0000-0001-7540-4235/work/190571589
Mendeley a61e7d95-1508-3abe-b63b-59847ca068bf
ORCID /0009-0009-9342-629X/work/193863857

Keywords

Research priority areas of TU Dresden

Subject groups, research areas, subject areas according to Destatis

Keywords

  • automatic extraction, literature research, scientific publications, information extraction, text mining, PDF format, additive manufacturing