Similarity Analysis of Visual Sketch-based Search for Sounds
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Searching through a large audio database for a specific sound can be a slow and tedious task with detrimental effects on creative workflow. Listening to each sample is time-consuming, while textual descriptions or tags may be insufficient, unavailable, or simply unable to meaningfully capture certain sonic qualities. This paper explores the use of visual sketches that express the mental model associated with a sound to accelerate the search process. To this end, a study was conducted to collect data on how 30 people visually represent sound, with each providing hand-sketched visual representations for a range of 30 different sounds. After augmenting the data to a sparse set of 855 samples, two different autoencoders were trained. The first finds similar sketches in latent space and delivers the associated audio files. The second is a multimodal autoencoder that combines visual and sonic cues in a common feature space, but has the limitation that no audio input is available for the search task. Both were then used to implement and discuss a visual query-by-sketch search interface for sounds.
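The retrieval step described above, encoding a query sketch and returning the audio files associated with its nearest neighbours in latent space, can be illustrated with a minimal sketch. The trained autoencoder's encoder is replaced here by a random linear projection purely as a stand-in, and the dimensions and filenames are hypothetical; the paper's actual architecture and data are not reproduced.

```python
# Minimal sketch of latent-space query-by-sketch retrieval.
# The random projection `W` stands in for the trained autoencoder encoder;
# image size, latent size, and filenames are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 32
IMG_PIXELS = 64 * 64  # flattened sketch image (assumed size)

# Stand-in for the trained sketch encoder.
W = rng.normal(size=(IMG_PIXELS, LATENT_DIM))

def encode(sketch: np.ndarray) -> np.ndarray:
    """Map a flattened sketch image to a latent vector."""
    return sketch.flatten() @ W

# Database: each stored sketch is paired with its associated audio file
# (855 samples, matching the augmented set size from the abstract).
db_sketches = rng.random((855, IMG_PIXELS))
db_audio = [f"sound_{i:03d}.wav" for i in range(855)]  # hypothetical names
db_latents = np.stack([encode(s) for s in db_sketches])

def query_by_sketch(query: np.ndarray, k: int = 5) -> list[str]:
    """Return audio files whose sketches are nearest in latent space."""
    z = encode(query)
    dists = np.linalg.norm(db_latents - z, axis=1)  # Euclidean distance
    return [db_audio[i] for i in np.argsort(dists)[:k]]

print(query_by_sketch(rng.random(IMG_PIXELS)))
```

In practice the distance metric and the choice of latent dimensionality would depend on the trained model; Euclidean distance is used here only as a plausible default.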
Details
| Original language | English |
|---|---|
| Title of host publication | Audio Mostly 2021 |
| Publisher | Association for Computing Machinery (ACM), New York |
| Pages | 101-108 |
| Number of pages | 8 |
| ISBN (electronic) | 9781450385695 |
| Publication status | Published - Sept 2021 |
| Peer-reviewed | Yes |
External IDs
| Scopus | 85117960400 |
|---|---|
| ORCID | /0000-0002-8923-6284/work/142247080 |