Automatic classification of building footprints – A contribution to the small-scale description of the settlement structure

Research output: Types of Thesis › Doctoral thesis

Contributors

  • Robert Hecht - (Author)

Abstract

Small-scale information on functional, morphological and socio-economic structure is necessary to provide answers for research and planning issues in urban and rural areas. Buildings are of key importance here, since they determine the physical structure of a settlement. Moreover, their types of occupancy create the distribution pattern of housing, workplaces and infrastructure. Despite their significance for both researchers and planners, such data is often not up to date, strongly spatially aggregated or only very locally available.

Building footprints are registered and maintained in the real estate cadastre and in topographic-cartographic information systems by the ordnance survey. They are available to the public in the form of geospatial data, maps and online services. However, semantic information on building function, type of housing, building age or number of floors is sparse. The Working Committee of the Surveying Authorities of the States of the Federal Republic of Germany (AdV) specifies only a mandatory differentiation between “residential”, “public” and “industrial/commercial” buildings in the cadastre. Some official building data, like the nationwide product “Amtliche Hausumringe” (official building footprints), as well as buildings extracted from maps, laser scanner data, aerial photographs or satellite imagery, initially contain no attributes at all. Therefore, users are often faced with pure building geometry, from which only limited knowledge about the settlement structure can be obtained.
However, appropriately classified building footprints would allow users to obtain additional indicators of the settlement structure, such as building density, floor area, and number of dwellings and residents, all of which can be derived and visualized by means of GIS technology.

In this thesis, methods for the automatic classification of building footprints are developed, analyzed and assessed, with the aim of using them for a small-scale description of the settlement structure. The procedure presented follows a data-driven pattern recognition approach using training samples with known classes and features of buildings. The work addresses issues of data integration, data processing, feature extraction and feature selection, and investigates the accuracy of various classification methods.

Currently, as is shown, there are only a few scientific studies that pursue the use of pattern recognition and machine learning for building classification. Many approaches rely on knowledge-based models, which are not very flexible when the data input or the desired target classes change. Moreover, many approaches lack a critical accuracy assessment by means of independent test data. Therefore, when developing a procedure for automatic building classification, particular attention has been paid to flexibility, automation and reliable validation. The developed approach makes use of basic topographic objects only: building footprints, official building coordinates and urban blocks. This ensures that the method can be applied nationwide in Germany and in countries with comparable data.

Out of the stock of spatial base data on buildings available in Germany, five different input data types have been identified, which differ in structure (raster or vector), geometric modelling (individual buildings or building regions), and semantic information content (with or without information on building use).
For each input data type, an extensive set of features has been developed that describes objects and their relations at various spatial levels (e.g., single building, building complex, urban block, or a defined neighbourhood). The features are calculated using methods of digital image processing and spatial analysis within a GIS environment. Highly correlated features are removed from these sets by using a filter-based feature selection. Since it can be assumed that the building typology will be known in the context of the settlement structure analyses, a supervised learning strategy has been preferred for training the classifier. At the same time, supervised machine learning procedures, unlike unsupervised ones, allow an immediate assessment of the prediction error, without the need for any “sophisticated” data interpretation.

A reference database with over 800,000 building footprints has been created for the accuracy assessment, in which each building is assigned a building type from a defined building typology. The typology distinguishes between eleven classes according to urban planning criteria. Various types of settlement (city, town or village) and several administrative databases (DTK25-V, DTK25, ATKIS®, ALK, 3D building models) are considered, which enables a differentiated evaluation of accuracy.

In a model selection process, 16 different supervised classification methods are tested on selected data sets, and their generalization capabilities are evaluated based on a ten-fold cross-validation. The following models are used: linear models, non-parametric models, support vector machines, artificial neural networks, decision trees and ensemble methods. Non-linear models, like the ensemble-based random forest algorithm, show the highest degree of generalization capability and efficiency.
Random forest has been chosen as the best classifier, since it also has a number of practical advantages over other methods: it does not strictly require either data scaling or feature selection. Moreover, categorical features can be processed directly, and the algorithm provides measures to quantify the importance of the features.

After the selection of a classification procedure, a detailed accuracy assessment based on all data sets in the reference database is performed. Based on confusion matrices and quality measures derived from them, the classifier is evaluated and assessed separately according to input data type and study area. For vector-based building footprints, especially buildings from ALK/ATKIS®, ATKIS® or official building footprints and 3D building models, an overall accuracy between 90 % and 95 % could be achieved. The accuracy when using building footprints extracted from digital topographic raster maps was lower, at only 76 % to 88 %. In a simulation, it was shown that the size of the training data set has a major impact on the classification accuracy. One particular challenge lies in regional differences in the cultural-historic architectural structure between cities. Attempts to train and test a random forest classifier across cities with different architectural characteristics have shown the limitations of transferability. A nationwide application of the method will therefore necessarily require regional delimitation and the collection of separate training data in each of the regions.

The automatic classification of building footprints provides an important contribution to the acquisition of new information for the small-scale description of settlement structures. In addition to its relevance for research and application areas of urban geography and urban planning, the results are also relevant for cartographic disciplines such as map generalization, automated mapping and geo-visualization.
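
The accuracy assessment described in the abstract rests on confusion matrices and quality measures derived from them. As a rough illustration only, the sketch below computes a confusion matrix, the overall accuracy, and per-class producer's and user's accuracy in plain Python. The class names and labels are invented for the example and are not taken from the thesis, and the choice of producer's and user's accuracy as the derived measures is an assumption; the thesis may use other measures.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal, i.e. were classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_accuracies(matrix):
    """Return one (producer's accuracy, user's accuracy) pair per class.

    Producer's accuracy is the diagonal count divided by the row total (recall);
    user's accuracy is the diagonal count divided by the column total (precision).
    """
    n = len(matrix)
    result = []
    for i in range(n):
        row_total = sum(matrix[i])
        col_total = sum(matrix[r][i] for r in range(n))
        producers = matrix[i][i] / row_total if row_total else 0.0
        users = matrix[i][i] / col_total if col_total else 0.0
        result.append((producers, users))
    return result


# Hypothetical building types and labels, invented for illustration only.
classes = ["detached", "terraced", "industrial"]
true_labels = ["detached", "detached", "terraced",
               "terraced", "industrial", "industrial"]
predicted = ["detached", "terraced", "terraced",
             "terraced", "industrial", "detached"]

matrix = confusion_matrix(true_labels, predicted, classes)
print(matrix)
print(overall_accuracy(matrix))
print(per_class_accuracies(matrix))
```

In this toy example four of the six buildings lie on the diagonal, so the overall accuracy is 4/6. Computing the same measures separately per input data type and study area would correspond to the differentiated evaluation mentioned above.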

Details

Original language: English
Qualification level: Dr.-Ing.
Awarding Institution
Supervisors/Advisors
  • Buchroithner, Manfred, Mentor
Defense Date (Date of certificate): 10 Jun 2013
Publisher
  • Rhombos-Verlag, Berlin
Print ISBNs: 978-3-944101-63-7
Publication status: Published - 2013

Keywords

  • official building footprints, automatic classification