Automatic Information Extraction from Scientific Publications Based on the Use Case of Additive Manufacturing
Publication: Contribution to journal › Research article › Contributed › Peer-reviewed
Abstract
A systematic literature review is fundamental to building a robust research foundation, informing experimental methodology, and ensuring the quality of future scientific output. However, manual extraction of targeted information from scientific publications is often laborious and error-prone, especially when researchers need rapid access to relevant findings without specialized hardware. This paper introduces an automated workflow for information extraction from scientific publications in the engineering domain. The proposed workflow consists of two primary stages: data preparation and information extraction. During data preparation, PDF files are converted to plain text and segmented into logical sections by a rule-based block detection and classification algorithm that preserves the semantics of the text. Information extraction is then performed by applying regular expressions to both keys and values within the same sentence, which identifies and extracts relevant process and material data from the segmented text. The approach was evaluated on a dataset of 18 open-access scientific publications from various journals and conference proceedings in the additive manufacturing (AM) domain. The results of the automated extraction were compared with manual extraction and with a modern large language model (LLM)-based approach. The findings demonstrate that the proposed workflow can accurately and efficiently extract relevant process and material data, achieving competitive performance relative to the LLM-based method. The workflow substantially reduces the time and potential for error associated with manual extraction: automated processing averages 15 s per document, compared with one hour for manual extraction, and achieves a 76% match rate. This efficiency lets researchers extract data rapidly and effectively. The methodology is readily transferable to other scientific fields where systematic literature reviews and structured data extraction are required.
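The abstract describes the extraction step only in outline, so the following is a minimal sketch of what matching keys and values within the same sentence could look like, not the authors' implementation. The parameter names, key patterns, units, naive sentence splitter, and the `extract_parameters` function are all assumptions chosen for illustration.

```python
import re

# Hypothetical keys for AM process parameters, each paired with the unit
# its value is expected to carry. The paper's real expressions are not
# reproduced here.
KEYS = {
    "laser_power": (re.compile(r"\blaser\s+power\b", re.IGNORECASE), "W"),
    "scan_speed": (re.compile(r"\bscan(?:ning)?\s+speed\b", re.IGNORECASE), "mm/s"),
    "layer_thickness": (re.compile(r"\blayer\s+thickness\b", re.IGNORECASE), "mm"),
}

# A value is a number followed by one of the known units.
# "mm/s" is listed before "mm" so the longer unit wins.
VALUE = re.compile(r"(?P<number>\d+(?:\.\d+)?)\s*(?P<unit>mm/s|mm|W)(?![A-Za-z])")

# Naive sentence boundary: end punctuation, whitespace, then a capital letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def extract_parameters(text):
    """Return key/value records where key and value share a sentence."""
    records = []
    for sentence in SENTENCE_END.split(text):
        values = [
            (float(m.group("number")), m.group("unit"))
            for m in VALUE.finditer(sentence)
        ]
        for key, (key_pattern, expected_unit) in KEYS.items():
            if not key_pattern.search(sentence):
                continue
            # Take the first value in this sentence with the expected unit.
            for number, unit in values:
                if unit == expected_unit:
                    records.append({"key": key, "value": number, "unit": unit})
                    break
    return records


sample = (
    "Samples were built with a laser power of 200 W and a scan speed of "
    "800 mm/s. The layer thickness was 0.03 mm."
)
for record in extract_parameters(sample):
    print(record)
```

Requiring the key and the value to occur in the same sentence keeps the sketch from pairing a parameter name with an unrelated number elsewhere in the section. Pairing by expected unit is one simple way to resolve sentences that mention several parameters at once.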
Details
| Original language | English |
|---|---|
| Article number | 9331 |
| Number of pages | 41 |
| Journal | Applied Sciences : open access journal |
| Volume | 15 |
| Issue number | 17 |
| Early online date | 25 Aug 2025 |
| Publication status | Published - 1 Sept 2025 |
| Peer-review status | Yes |
External IDs
| ORCID | /0000-0001-7540-4235/work/190571589 |
|---|---|
| Mendeley | a61e7d95-1508-3abe-b63b-59847ca068bf |
| ORCID | /0009-0009-9342-629X/work/193863857 |
Keywords
- automatic extraction, literature research, scientific publications, information extraction, text mining, PDF format, additive manufacturing