Similarity Analysis of Visual Sketch-based Search for Sounds

Publication: Contribution to book/conference proceedings/anthology/report; contribution to conference proceedings; contributed; peer-reviewed


Searching through a large audio database for a specific sound can be a slow and tedious task with detrimental effects on creative workflow. Listening to each sample is time-consuming, while textual descriptions or tags may be insufficient, unavailable, or simply unable to meaningfully capture certain sonic qualities. This paper explores the use of visual sketches that express the mental model associated with a sound to accelerate the search process. To this end, a study was conducted in which 30 participants provided hand-sketched visual representations for 30 different sounds, yielding data on how people visually represent sound. After augmenting the data to a sparse set of 855 samples, two different autoencoders were trained. The first finds similar sketches in latent space and returns the associated audio files. The second is a multimodal autoencoder that combines visual and sonic cues in a common feature space, but it has the drawback that no audio input is available during the search task. Both were then used to implement and discuss a visual query-by-sketch search interface for sounds.
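
The retrieval step of the first approach, finding stored sketches near a query sketch in latent space and returning their audio files, can be illustrated with a minimal sketch. It is not the authors' implementation. The encoder is omitted, the three-dimensional latent vectors are hand-made toy values, the file names are hypothetical, and cosine similarity is an assumed choice of metric that the abstract does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def find_similar_sounds(query_latent, index, k=3):
    """Return the audio paths of the k stored sketches closest to the query.

    `index` is a list of (latent_vector, audio_path) pairs; in a real system
    the latent vectors would come from a trained sketch encoder.
    """
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_latent, entry[0]),
        reverse=True,
    )
    return [audio_path for _, audio_path in ranked[:k]]


# Toy index: hand-made latent vectors paired with hypothetical audio files.
index = [
    ([0.9, 0.1, 0.0], "sounds/a.wav"),
    ([0.0, 1.0, 0.2], "sounds/b.wav"),
    ([0.8, 0.2, 0.1], "sounds/c.wav"),
]

# Latent vector of the query sketch (would be produced by the encoder).
query = [1.0, 0.0, 0.0]
results = find_similar_sounds(query, index, k=2)
print(results)
```

For a database much larger than the 855 samples mentioned above, the exhaustive sort would typically be replaced by an approximate nearest-neighbour index.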


Title: Audio Mostly 2021
Publisher: Association for Computing Machinery (ACM), New York
ISBN (electronic): 9781450385695
Publication status: Published - Sept. 2021

External IDs

Scopus 85117960400
ORCID /0000-0002-8923-6284/work/142247080