Similarity Analysis of Visual Sketch-based Search for Sounds

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review

Abstract

Searching through a large audio database for a specific sound can be a slow and tedious task with detrimental effects on creative workflow. Listening to each sample is time consuming, while textual descriptions or tags may be insufficient, unavailable or simply unable to meaningfully capture certain sonic qualities. This paper explores the use of visual sketches that express the mental model associated with a sound to accelerate the search process. To achieve this, a study was conducted to collect data on how people visually represent sound: 30 participants provided hand-sketched visual representations for a range of 30 different sounds. After augmenting the data to a still sparse set of 855 samples, two different autoencoders were trained. The first finds similar sketches in latent space and delivers the associated audio files. The second is a multimodal autoencoder that combines visual and sonic cues in a common feature space, although it lacks an audio input for the search task. Both models were then used to implement and discuss a visual query-by-sketch search interface for sounds.
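The record does not reproduce the paper's architecture details, but the first, sketch-only variant can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: 64×64 grayscale sketch images, a latent dimension of 32, and the hypothetical helper `query_by_sketch`. The autoencoder is trained to reconstruct sketches; retrieval then amounts to nearest-neighbour search over encoder embeddings, returning the audio files linked to the most similar stored sketches.

```python
import torch
import torch.nn as nn

class SketchAutoencoder(nn.Module):
    """Convolutional autoencoder for 64x64 grayscale sketch images (illustrative)."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

def query_by_sketch(model, query_sketch, sketch_db, audio_paths, k=5):
    """Embed the query sketch and return the audio files whose
    reference sketches lie nearest in latent space (hypothetical helper)."""
    model.eval()
    with torch.no_grad():
        zq = model.encoder(query_sketch.unsqueeze(0))  # (1, latent_dim)
        zdb = model.encoder(sketch_db)                 # (N, latent_dim)
    dists = torch.cdist(zq, zdb).squeeze(0)            # Euclidean distances
    nearest = torch.topk(dists, k, largest=False).indices
    return [audio_paths[i] for i in nearest.tolist()]
```

The multimodal variant described in the abstract would instead map sketches and audio features into a common feature space; since no audio input is available at query time, a sketch embedding alone would serve as the query there as well.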

Details

Original language: English
Title of host publication: Audio Mostly 2021
Publisher: Association for Computing Machinery (ACM), New York
Pages: 101-108
ISBN (electronic): 9781450385695
Publication status: Published - Sept 2021
Peer-reviewed: Yes

External IDs

Scopus: 85117960400
ORCID: /0000-0002-8923-6284/work/142247080

Keywords