Improved Acoustic Modeling for Automatic Piano Music Transcription Using Echo State Networks

Peter Steiner; Azarakhsh Jalalvand; Peter Birkholz

doi:10.1007/978-3-030-85099-9_12

Research output: Contribution to book/Conference proceedings/Anthology/Report › Chapter in book/Anthology/Report › Contributed › peer-review

Contributors

Peter Steiner, Chair of Speech Technology and Cognitive Systems (Author)
Azarakhsh Jalalvand (Author)
Peter Birkholz, Chair of Speech Technology and Cognitive Systems (Author)

Abstract

Automatic music transcription (AMT) is a challenging problem in Music Information Retrieval whose goal is to generate a score-like representation of a polyphonic audio signal. Typically, the starting point of AMT is an acoustic model that computes note likelihoods from feature vectors. In this work, we evaluate the capabilities of Echo State Networks (ESNs) in acoustic modeling of piano music. Our experiments show that the ESN-based models outperform state-of-the-art Convolutional Neural Networks (CNNs) by an absolute improvement of 0.5 in F₁-score without using an extra language model. We also report that a two-layer ESN, which mimics a hybrid acoustic and language model, achieves better results than the best reference approach, which combines Invertible Neural Networks (INNs) with a biGRU language model, by an absolute improvement of 0.91 in F₁-score.
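
For readers unfamiliar with the metric, the following is a minimal sketch of how an F₁-score can be computed from the note likelihoods an acoustic model produces. It is not taken from the chapter: the abstract does not state whether the reported scores are frame-wise or note-wise, so the sketch assumes a frame-wise comparison of binary piano rolls, and the function names `binarize` and `frame_f1` as well as the 0.5 threshold are illustrative assumptions.

```python
def binarize(likelihoods, threshold=0.5):
    """Turn per-frame note likelihoods into a binary piano roll.

    The 0.5 threshold is an assumption for illustration only.
    """
    return [[1 if p >= threshold else 0 for p in frame] for frame in likelihoods]


def frame_f1(reference, estimate):
    """Frame-wise F1 between two binary piano rolls.

    Each roll is a list of frames; each frame is a list of 0/1 flags,
    one per pitch. Both rolls are assumed to have the same shape.
    """
    tp = fp = fn = 0
    for ref_frame, est_frame in zip(reference, estimate):
        for ref, est in zip(ref_frame, est_frame):
            if est and ref:
                tp += 1  # note active in both rolls
            elif est:
                fp += 1  # note predicted but not in the reference
            elif ref:
                fn += 1  # note in the reference but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 frames, 3 pitches (values are made up).
reference = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
likelihoods = [[0.9, 0.1, 0.2], [0.8, 0.3, 0.1], [0.2, 0.7, 0.6]]
estimate = binarize(likelihoods)
score = frame_f1(reference, estimate)
```

In this toy example there are three true positives, one false positive and one false negative, so precision and recall are both 0.75 and the F₁-score is 0.75.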

Details

Original language	English
Title of host publication	Advances in Computational Intelligence
Publisher	Springer Verlag
Number of pages	12
Publication status	Published - 21 Aug 2021
Peer-reviewed	Yes

External IDs

Scopus	85115199523
ORCID	/0000-0003-0167-8123/work/167214850

Keywords

Acoustic modeling, Automatic piano transcription, Echo state network

Research Portal of the TU Dresden
