Similarity Analysis of Visual Sketch-based Search for Sounds

Lars Engeln; Nhat Long Le; Matthew McGinity; Rainer Groh

doi:10.1145/3478384.3478423

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Lars Engeln, Institut für Software- und Multimediatechnik (SMT), Professur für Mediengestaltung (Author)
Nhat Long Le (Author)
Matthew McGinity, Juniorprofessur für Gestaltung immersiver Medien (TT) (Author)
Rainer Groh, Professur für Mediengestaltung (Author)

Abstract

Searching through a large audio database for a specific sound can be a slow and tedious task with detrimental effects on creative workflow. Listening to each sample is time-consuming, while textual descriptions or tags may be insufficient, unavailable, or simply unable to meaningfully capture certain sonic qualities. This paper explores the use of visual sketches that express the mental model associated with a sound to accelerate the search process. To this end, a study was conducted to collect data on how people visually represent sound: 30 participants provided hand-sketched visual representations for a range of 30 different sounds. After the data were augmented to a sparse set of 855 samples, two different autoencoders were trained. The first finds similar sketches in latent space and returns the associated audio files. The second is a multimodal autoencoder that combines visual and sonic cues in a common feature space, but it is limited by having no audio input available during the search task. Both were then used to implement and discuss a visual query-by-sketch search interface for sounds.
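
The retrieval step described in the abstract (find similar sketches in latent space, then return the associated audio files) can be illustrated with a minimal sketch. It assumes the sketch embeddings have already been produced by some encoder, and it uses cosine similarity as the distance measure; the abstract does not state which metric the authors used. The function name `top_k_similar`, the toy vectors, and the file names are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def top_k_similar(query_latent, sketch_latents, audio_files, k=3):
    """Return the k audio files whose sketch embeddings are closest to the query.

    Assumption: similarity is measured by cosine similarity in latent space.
    """
    # Normalize all vectors so that a dot product equals cosine similarity.
    q = query_latent / np.linalg.norm(query_latent)
    db = sketch_latents / np.linalg.norm(sketch_latents, axis=1, keepdims=True)
    scores = db @ q
    # Sort by descending similarity and keep the k best matches.
    order = np.argsort(-scores)[:k]
    return [(audio_files[i], float(scores[i])) for i in order]

# Toy latent vectors standing in for encoded sketches (hypothetical data).
sketch_latents = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
audio_files = ["kick.wav", "hihat.wav", "snare.wav"]

# A query sketch embedding that lies close to the first stored sketch.
results = top_k_similar(np.array([0.9, 0.1]), sketch_latents, audio_files, k=2)
```

In this toy setup the query lies closest to the first stored embedding, so its audio file ranks first. A real system would replace the hand-written vectors with the output of a trained encoder.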

Details

Original language	English
Title	Audio Mostly 2021
Publisher	Association for Computing Machinery (ACM), New York
Pages	101-108
Number of pages	8
ISBN (electronic)	9781450385695
Publication status	Published - Sept. 2021
Peer-review status	Yes

External IDs

Scopus	85117960400
ORCID	/0000-0002-8923-6284/work/142247080
ORCID	/0000-0002-9268-4854/work/173987952

Related content

Die Verbildlichung von Klangstrukturen im Kontext der Entwicklung von Werkzeugen für die Medienproduktion