Automatic Feature Engineering Through Monte Carlo Tree Search

Yiran Huang; Yexu Zhou; Michael Hefenbrock; Till Riedel; Likun Fang; Michael Beigl

doi:10.1007/978-3-031-26409-2_35

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Yiran Huang, Karlsruhe Institute of Technology (Author)
Yexu Zhou, Karlsruhe Institute of Technology (Author)
Michael Hefenbrock, Karlsruhe Institute of Technology (Author)
Till Riedel, Karlsruhe Institute of Technology (Author)
Likun Fang, Karlsruhe Institute of Technology (Author)
Michael Beigl, Karlsruhe Institute of Technology (Author)

Abstract

The performance of machine learning models depends heavily on the feature space and feature engineering. Although neural networks have made significant progress in learning latent feature spaces from data, compositional feature engineering through nested feature transformations can reduce model complexity and can be particularly desirable for interpretability. To find suitable transformations automatically, state-of-the-art methods model the feature transformation space by graph structures and use heuristics such as ϵ-greedy to search for them. Such search strategies tend to become less efficient over time because they do not consider the sequential information of the candidate sequences and cannot dynamically adjust the heuristic strategy. To address these shortcomings, we propose a reinforcement learning-based automatic feature engineering method, which we call Monte Carlo tree search Automatic Feature Engineering (mCAFE). We employ a surrogate model that can capture the sequential information contained in the transformation sequence and thus can dynamically adjust the exploration strategy. It balances exploration and exploitation by Thompson sampling and uses a Long Short-Term Memory (LSTM) based surrogate model to estimate sequences of promising transformations. In our experiments, mCAFE outperformed state-of-the-art automatic feature engineering methods on most common benchmark datasets.
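
The abstract says that exploration and exploitation are balanced by Thompson sampling. As a rough illustration of that one idea, and not of mCAFE itself, the sketch below runs Beta-Bernoulli Thompson sampling over a flat set of candidate transformations. It contains no tree search and no LSTM surrogate. The transformation names, the success probabilities in `TRUE_RATES`, and the binary reward (read here as "the transformation improved a validation score") are all assumptions made for the example.

```python
import random


def thompson_select(stats, rng):
    """Sample one value per arm from its Beta posterior and return the best arm."""
    # Beta(successes + 1, failures + 1) is the posterior under a uniform prior.
    draws = {arm: rng.betavariate(s + 1, f + 1) for arm, (s, f) in stats.items()}
    return max(draws, key=draws.get)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate Thompson sampling and return how often each arm was chosen."""
    rng = random.Random(seed)
    stats = {arm: (0, 0) for arm in true_rates}  # (successes, failures) per arm
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = thompson_select(stats, rng)
        # Hypothetical reward: 1 if applying the transformation "helped".
        reward = rng.random() < true_rates[arm]
        s, f = stats[arm]
        stats[arm] = (s + 1, f) if reward else (s, f + 1)
        pulls[arm] += 1
    return pulls


# Hypothetical transformations with made-up probabilities of being useful.
TRUE_RATES = {"log": 0.2, "sqrt": 0.5, "square": 0.8}
pulls = run_bandit(TRUE_RATES, rounds=2000)
print(pulls)
```

Unlike ϵ-greedy, which explores at a fixed rate, the amount of exploration here shrinks on its own as the posteriors sharpen, so most of the budget ends up on the arm that looks best.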

Details

Original language	English
Title of host publication	Machine Learning and Knowledge Discovery in Databases
Editors	Massih-Reza Amini, Stéphane Canu, Asja Fischer, Tias Guns, Petra Kralj Novak, Grigorios Tsoumakas
Publisher	Springer, Cham
Pages	581–598
Number of pages	18
ISBN (electronic)	978-3-031-26409-2
ISBN (print)	978-3-031-26408-5
Publication status	Published - 2023
Peer-reviewed	Yes
Externally published	Yes

Publication series

Series	Lecture Notes in Computer Science, Volume 13715
ISSN	0302-9743

External IDs

Scopus	85151051582

Keywords

Data mining, Feature engineering, Monte Carlo tree search, Reinforcement learning
