ImitAL: Learning Active Learning Strategies from Synthetic Data

Julius Gonsior; Maik Thiele; Wolfgang Lehner

doi:10.48550/arXiv.2108.07670

Publication: Preprint/Documentation/Report › Preprint

Contributors

Julius Gonsior, Chair of Databases (Author)
Maik Thiele, Chair of Databases (Author)
Wolfgang Lehner, Chair of Databases (Author)

Abstract

One of the biggest challenges in applied supervised machine learning is the need for huge amounts of labeled data. Active Learning (AL) is a well-known standard method for obtaining labeled data efficiently: a query strategy selects the samples expected to contain the most information, and these are labeled first. Although many query strategies have been proposed, none has been found that is clearly superior across all domains. Additionally, many strategies are computationally expensive, which further hinders the widespread use of AL in large-scale annotation projects. We therefore propose ImitAL, a novel query strategy that encodes AL as a learning-to-rank problem. We train the underlying neural network with Imitation Learning; the expert demonstrations required for training are generated from purely synthetic data. To show the general and superior applicability of ImitAL, we perform an extensive evaluation comparing our strategy with 10 different state-of-the-art query strategies on 15 different datasets from a wide range of domains. We also show that our approach has a lower runtime than most other strategies, especially on very large datasets.
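
For readers unfamiliar with the setting, the following is a minimal sketch of a pool-based AL loop in which the query strategy ranks unlabeled samples and the top-ranked one is labeled next. It is not the ImitAL method: according to the abstract, ImitAL learns the ranking with a neural network, whereas this sketch assumes a hand-written margin heuristic on a nearest-centroid classifier. All names (`fit`, `uncertainty`, `active_learning`), the toy data, and the labeling budget are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(samples):
    """Nearest-centroid classifier (assumed stand-in model): {label: centroid}."""
    groups = {}
    for x, y in samples:
        groups.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in groups.items()
    }

def predict(model, x):
    """Return the label of the closest centroid."""
    return min(model, key=lambda y: sq_dist(model[y], x))

def uncertainty(model, x):
    """Hand-written ranking score: a smaller gap between the two closest
    centroids means a more uncertain sample, hence a higher score.
    Requires a model with at least two classes."""
    d = sorted(sq_dist(c, x) for c in model.values())
    return -(d[1] - d[0])

def active_learning(pool, oracle, seed, budget):
    """Label `budget` samples, always querying the top-ranked unlabeled one.
    `seed` must contain at least one index per class."""
    labeled = {i: oracle(i) for i in seed}
    unlabeled = [i for i in range(len(pool)) if i not in labeled]
    for _ in range(budget):
        if not unlabeled:
            break
        model = fit([(pool[i], y) for i, y in labeled.items()])
        query = max(unlabeled, key=lambda i: uncertainty(model, pool[i]))
        labeled[query] = oracle(query)  # ask the (simulated) annotator
        unlabeled.remove(query)
    return labeled

# Toy pool: two synthetic Gaussian blobs; ground truth plays the annotator.
rng = random.Random(0)
pool = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
pool += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(100)]
truth = [0] * 100 + [1] * 100

labeled = active_learning(pool, lambda i: truth[i], seed=[0, 100], budget=10)
model = fit([(pool[i], y) for i, y in labeled.items()])
accuracy = sum(predict(model, x) == t for x, t in zip(pool, truth)) / len(pool)
print(len(labeled), "labels acquired, pool accuracy:", accuracy)
```

Swapping `uncertainty` for a different scoring function changes the query strategy without touching the loop, which is the sense in which the selection step can be viewed as a ranking problem.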

Details

Original language	English
Number of pages	11
Publication status	Published - 17 Aug 2021

External IDs

ArXiv	http://arxiv.org/abs/2108.07670v1
ORCID	/0000-0001-8107-2775/work/142660530
ORCID	/0000-0002-5985-4348/work/162348856

Keywords

cs.LG, cs.AI

TU Dresden Research Portal