WeakAL: Combining Active Learning and Weak Supervision
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Supervised learning requires large amounts of labeled data, which makes efficient labeling one of the most critical components for the success of Machine Learning (ML). One well-known method for obtaining labeled data efficiently is Active Learning (AL), in which the learner interactively asks human experts to label the most informative data points. Even with AL, however, the human effort required for labeling tasks remains too high and should be reduced further. We therefore propose WeakAL, which incorporates Weak Supervision (WS) techniques directly into the AL cycle. This reduces the number of annotations required from human experts while maintaining the same level of ML performance. We investigate different WS strategies and parameter combinations on a wide range of real-world datasets. Our evaluation shows that, for example, in Web table classification, 55% of the labels that would otherwise be obtained manually can be generated by WS techniques with a negligible loss in test accuracy of only 0.31%. To further demonstrate the general applicability of our approach, we applied it to six datasets from the AL challenge by Guyon et al., where over 90% of the labels could be computed by the WS techniques while still achieving competitive results in the challenge.
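The abstract does not say which WS strategies WeakAL uses, so the following is only an illustrative sketch of the general idea of placing a weak-labeling step inside an AL cycle. It is not the authors' implementation. It assumes uncertainty sampling for the human query and a simple confidence threshold as a stand-in WS strategy. The names `weak_al`, `weak_margin`, and the one-dimensional nearest-centroid model are hypothetical and chosen only to keep the example free of dependencies.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Toy nearest-centroid model: one mean per class (1-D features)."""
    return {c: mean([x for x, y in zip(points, labels) if y == c])
            for c in set(labels)}


def predict_with_margin(centroids, x):
    """Return (predicted class, margin). A small margin means high uncertainty."""
    dists = sorted((abs(x - m), c) for c, m in centroids.items())
    return dists[0][1], dists[1][0] - dists[0][0]


def weak_al(pool, oracle, seed_idx, rounds=5, weak_margin=3.0):
    """Hypothetical AL loop with a weak-labeling step (assumed WS strategy).

    The seed set must cover at least two classes so a margin can be computed.
    """
    human = {i: oracle(pool[i]) for i in seed_idx}  # labels from the expert
    weak = {}                                       # labels assigned without the expert
    for _ in range(rounds):
        train = {**human, **weak}
        centroids = fit_centroids([pool[i] for i in train], list(train.values()))
        unlabeled = [i for i in range(len(pool)) if i not in train]
        if not unlabeled:
            break
        scored = {i: predict_with_margin(centroids, pool[i]) for i in unlabeled}
        # AL step: ask the expert about the single most uncertain point.
        query = min(unlabeled, key=lambda i: scored[i][1])
        human[query] = oracle(pool[query])
        # WS step (assumption): accept the model's own label when it is confident.
        for i in unlabeled:
            if i != query and scored[i][1] >= weak_margin:
                weak[i] = scored[i][0]
    return human, weak


if __name__ == "__main__":
    pool = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
    oracle = lambda x: int(x >= 7)  # stands in for the human expert
    human, weak = weak_al(pool, oracle, seed_idx=[0, 9])
    print(f"human labels: {len(human)}, weak labels: {len(weak)}")
```

In this toy setting the share of weak labels is governed by `weak_margin`: a lower threshold hands more labels to the weak step at the risk of more label noise, which is the kind of trade-off the abstract's parameter study refers to.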
Details
Original language | English |
---|---|
Title | Discovery Science |
Editors | Annalisa Appice, Grigorios Tsoumakas, Yannis Manolopoulos, Stan Matwin |
Publisher | Springer, Berlin [etc.] |
Pages | 34-49 |
Number of pages | 16 |
ISBN (Print) | 9783030615260 |
Publication status | Published - 2020 |
Peer-reviewed | Yes |
Publication series
Series | Lecture Notes in Computer Science, Volume 12323 |
---|---|
ISSN | 0302-9743 |
Conference
Title | 23rd International Conference on Discovery Science, DS 2020 |
---|---|
Duration | 19-21 October 2020 |
City | Thessaloniki |
Country | Greece |
External IDs
Scopus | 85094100383 |
---|---|
ORCID | /0000-0002-5985-4348/work/162348852 |
Keywords
- Active Learning, Classification, Information extraction, Machine Learning, Semi-supervised, Weak Supervision