Clustering techniques and keyword extraction with large language models for knowledge discovery in building defects data

Publication: Journal contribution, Research article, Contributed, Peer-reviewed

Contributors

  • Linda Cusumano, Chalmers University of Technology (Author)
  • Nilla Olsson, Nordic Construction Company (Author)
  • Mats Granath, University of Gothenburg (Author)
  • Robert Jockwer, Chalmers University of Technology (Author)
  • Rasmus Rempling, Chalmers University of Technology, Nordic Construction Company (Author)

Abstract

Purpose: The construction industry is undergoing a digital transformation and now holds large volumes of digital building defects data collected during inspections. This study aims to propose an artificial intelligence-based method for analysing such building defects data to provide insights and knowledge faster than traditional manual methods.

Design/methodology/approach: This research explores a data set containing over 34,000 defects from hospital projects carried out in Sweden from 2018 to 2021. The data mining uses keyword extraction based on both TF-IDF vectorisation and k-means clustering, the Mistral 7B model and KeyLLM. The results are compared with a content analysis using the GPT-3.5 Turbo model. The analysis is performed at both an organisational and a project level.

Findings: The paper presents a combination of methods for analysing building defects data. The results show that the most common problems reported during the inspections concern missing fire sealing, jointing and subceiling problems. Using k-means clustering gives fast insights into the main defect categories of the data set but requires domain knowledge. Keyword extraction using an LLM requires longer computational time but gives a deeper understanding of subcategories of defects. Finally, GPT-based content analysis complements these methods by providing project-specific insights and allowing user-specific requests.

Research limitations/implications: The study is performed using data digitally collected in Swedish hospital projects. However, the results and methodology can be applied to other project data, such as safety inspections and warranty data. The analysis focused solely on text data.

Originality/value: The method suggested in this paper uses clustering techniques and large language models for analysing building defect data. The value of the proposed method is a faster process for leveraging knowledge from large amounts of unstructured text data, such as building defect reports, safety and moisture inspections and warranty issues.
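The abstract names TF-IDF vectorisation and k-means clustering among the keyword-extraction techniques but does not spell out the pipeline. The following is a minimal, dependency-free Python sketch of one common way to combine them, assuming that defect descriptions are clustered on their TF-IDF vectors and that the highest-weighted terms of each cluster centroid serve as that cluster's keywords. It is not the authors' implementation: the sample defect texts, the stop-word list and the function names (`tfidf_vectors`, `kmeans`, `cluster_keywords`) are hypothetical, and a production pipeline would more likely rely on a library such as scikit-learn together with preprocessing suited to the language of the data.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline needs one suited to the data's language.
STOP_WORDS = {"a", "an", "and", "around", "at", "in", "near", "of", "on", "the"}


def tokenize(text):
    """Lower-case, keep alphabetic runs, drop stop words (a deliberately crude tokenizer)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (sorted vocabulary, L2-normalised TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, the same formula as scikit-learn's default.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; the first k vectors seed the centroids for reproducibility."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members (unchanged if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_keywords(docs, k, top_n=3):
    """Cluster docs on TF-IDF vectors; return labels and the top-weighted terms per centroid."""
    vocab, vectors = tfidf_vectors(docs)
    labels, centroids = kmeans(vectors, k)
    keywords = {}
    for c, centroid in enumerate(centroids):
        # Stable sort: equally weighted terms keep their alphabetical vocabulary order.
        ranked = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
        keywords[c] = [vocab[i] for i in ranked[:top_n]]
    return labels, keywords


# Hypothetical defect notes, invented for illustration only.
SAMPLE_DEFECTS = [
    "Missing fire sealing around pipe penetration",
    "Subceiling panel loose in corridor",
    "Fire sealing missing at cable tray",
    "Subceiling tile damaged near corridor light",
]

labels, keywords = cluster_keywords(SAMPLE_DEFECTS, k=2)
print(labels)
print(keywords)
```

On these four toy notes the two fire-sealing entries and the two subceiling entries fall into separate clusters, and the terms they share rank highest in each centroid. The fixed seeding only keeps the toy example reproducible; practical k-means runs typically use k-means++ initialisation with several restarts, and choosing k and naming the resulting clusters still calls for the domain knowledge the findings mention.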

Details

Original language: English
Pages (from - to): 76-97
Number of pages: 22
Journal: Construction Innovation
Volume: 25
Issue number: 7
Publication status: Published - 2025
Peer-review status: Yes
Externally published: Yes

External IDs

ORCID /0000-0003-0767-684X/work/188439641

Keywords

  • Defects, Inspections, Knowledge generation, LLM