Similarity Analysis of Visual Sketch-based Search for Sounds

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review

Abstract

Searching through a large audio database for a specific sound can be a slow and tedious task with detrimental effects on creative workflow. Listening to each sample is time consuming, while textual descriptions or tags may be insufficient, unavailable or simply unable to meaningfully capture certain sonic qualities. This paper explores the use of visual sketches that express the mental model associated with a sound to accelerate the search process. To achieve this, a study was conducted to collect data on how people visually represent sound: 30 participants provided hand-sketched visual representations for a range of 30 different sounds. After augmenting the data to a still sparse set of 855 samples, two different autoencoders were trained. The first finds similar sketches in latent space and delivers the associated audio files. The second is a multimodal autoencoder that combines visual and sonic cues in a common feature space, although it lacks an audio input for the search task. Both models were then used to implement and discuss a visual query-by-sketch search interface for sounds.
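The record does not reproduce the paper's architecture details, but the first, sketch-only variant can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: 64×64 grayscale sketch images, a latent dimension of 32, and the hypothetical helper `query_by_sketch`. The autoencoder is trained to reconstruct sketches; retrieval then amounts to nearest-neighbour search over encoder embeddings, returning the audio files linked to the most similar stored sketches.

```python
import torch
import torch.nn as nn

class SketchAutoencoder(nn.Module):
    """Convolutional autoencoder for 64x64 grayscale sketch images (illustrative)."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

def query_by_sketch(model, query_sketch, sketch_db, audio_paths, k=5):
    """Embed the query sketch and return the audio files whose
    reference sketches lie nearest in latent space (hypothetical helper)."""
    model.eval()
    with torch.no_grad():
        zq = model.encoder(query_sketch.unsqueeze(0))  # (1, latent_dim)
        zdb = model.encoder(sketch_db)                 # (N, latent_dim)
    dists = torch.cdist(zq, zdb).squeeze(0)            # Euclidean distances
    nearest = torch.topk(dists, k, largest=False).indices
    return [audio_paths[i] for i in nearest.tolist()]
```

The multimodal variant described in the abstract would instead map sketches and audio features into a common feature space; since no audio input is available at query time, a sketch embedding alone would serve as the query there as well.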

Details

Original language: English
Title of host publication: Audio Mostly 2021
Publisher: Association for Computing Machinery (ACM), New York
Pages: 101-108
ISBN (electronic): 9781450385695
Publication status: Published - Sept 2021
Peer-reviewed: Yes

External IDs

Scopus: 85117960400
ORCID: /0000-0002-8923-6284/work/142247080

Keywords