Improved Acoustic Modeling for Automatic Piano Music Transcription Using Echo State Networks
Research output: Contribution to book/Conference proceedings/Anthology/Report › Chapter in book/Anthology/Report › Contributed › peer-review
Abstract
Automatic music transcription (AMT) is a challenging problem in Music Information Retrieval whose goal is to generate a score-like representation of a polyphonic audio signal. Typically, the starting point of AMT is an acoustic model that computes note likelihoods from feature vectors. In this work, we evaluate the capabilities of Echo State Networks (ESNs) in acoustic modeling of piano music. Our experiments show that the ESN-based models outperform state-of-the-art Convolutional Neural Networks (CNNs) by an absolute improvement of 0.5 in F1-score without using an extra language model. We also show that a two-layer ESN, which mimics a hybrid acoustic and language model, achieves better results than the best reference approach, which combines Invertible Neural Networks (INNs) with a biGRU language model, by an absolute improvement of 0.91 in F1-score.
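To illustrate the kind of acoustic model the abstract refers to, the following is a minimal sketch of a generic single-layer Echo State Network in NumPy. It is not the authors' implementation: the reservoir size, spectral radius, leak rate, ridge penalty, decision threshold, and the random toy data standing in for spectral features and piano-roll targets are all assumptions chosen for illustration. The sketch shows the defining property of an ESN, namely that the input and recurrent weights are drawn randomly and kept fixed, and only a linear readout is fitted in closed form.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (these are never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_inputs))
    w_res = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def run_reservoir(features, w_in, w_res, leak=0.3):
    """Map a (frames, n_inputs) sequence to (frames, n_reservoir) states."""
    state = np.zeros(w_res.shape[0])
    states = np.empty((features.shape[0], w_res.shape[0]))
    for t, u in enumerate(features):
        candidate = np.tanh(w_in @ u + w_res @ state)
        state = (1.0 - leak) * state + leak * candidate  # leaky integration
        states[t] = state
    return states


def fit_readout(states, targets, ridge=1e-2):
    """Closed-form ridge regression; the only trained part of the network."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)  # (n_reservoir, n_notes)


# Toy data (assumption): random frames instead of real spectral features,
# and a sparse random binary piano roll with 88 keys as the target.
rng = np.random.default_rng(1)
frames, n_bins, n_notes = 200, 16, 88
features = rng.random((frames, n_bins))
piano_roll = (rng.random((frames, n_notes)) < 0.05).astype(float)

w_in, w_res = make_reservoir(n_bins, n_reservoir=64)
states = run_reservoir(features, w_in, w_res)
w_out = fit_readout(states, piano_roll)
activations = states @ w_out   # per-frame, per-note scores
predicted = activations > 0.5  # hypothetical fixed decision threshold
```

The readout scores here are plain linear outputs rather than calibrated likelihoods, and on random toy data they carry no musical meaning; the point is only the division of labor between the fixed reservoir and the trained readout.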
Details
Original language | English |
---|---|
Title of host publication | Advances in Computational Intelligence |
Publisher | Springer Verlag |
Number of pages | 12 |
Publication status | Published - 21 Aug 2021 |
Peer-reviewed | Yes |
External IDs
Scopus | 85115199523 |
---|---|
ORCID | /0000-0003-0167-8123/work/167214850 |
Keywords
- Acoustic modeling, Automatic piano transcription, Echo state network