Keyness in song lyrics: Challenges of highly clumpy data

Jan Langenhorst; Yannick Frommherz; Simon Meier-Vieracker

doi:10.21248/jlcl.36.2023.236

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

Jan Langenhorst, Chair of Applied Linguistics (Author)
Yannick Frommherz, Chair of Applied Linguistics (Author)
Simon Meier-Vieracker, Chair of Applied Linguistics (Author)

Abstract

Computer-assisted stylistic analyses regularly employ the calculation of keywords. We show that the inclusion of a separate dispersion measure in addition to a frequency measure into keyword analysis (or more generally: keyness analysis), as proposed by Gries (2021), is a necessary extension of said analyses. Using texts from the German Songkorpus, we demonstrate that traditional keyword calculations using only frequency measures lead to spurious results. Determining keywords by both measuring a word’s frequency and its dispersion in comparison to a reference corpus gives a more realistic view. This is especially relevant for our corpus, since song lyrics turn out to be extraordinarily clumpy data: Words that are very frequent in one artist’s subcorpus typically only occur in a few or even just a single one of their songs due to widespread word repetition within songs, e.g., in choruses. Song lyrics in our dataset are shown to not feature words that can be considered key at all. Our contribution is twofold: (1) We demonstrate the utility of Gries’ (2021) approach and (2) interpret the (lack of) results in terms of a genre-specific property which is that song lyrics are lexically autonomous works of art.
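
To illustrate why a frequency measure alone can mislead on clumpy data, the following is a minimal sketch and not the authors' implementation. It assumes a smoothed log ratio of relative frequencies as the frequency measure and Gries' "deviation of proportions" (DP) as the dispersion measure; the measures actually used in the article may differ. The toy songs, the function names, and the smoothing constant are all hypothetical.

```python
from math import log2


def log_ratio(word, target, reference, smoothing=0.5):
    """Frequency-based keyness: log2 ratio of relative frequencies.

    `target` and `reference` are lists of tokenised texts (lists of str).
    `smoothing` is an assumed constant that avoids division by zero.
    """
    t_hits = sum(text.count(word) for text in target) + smoothing
    r_hits = sum(text.count(word) for text in reference) + smoothing
    t_size = sum(len(text) for text in target)
    r_size = sum(len(text) for text in reference)
    return log2((t_hits / t_size) / (r_hits / r_size))


def deviation_of_proportions(word, texts):
    """Dispersion (DP): 0 means evenly spread, values near 1 mean clumpy.

    Compares each text's share of the word's occurrences with
    that text's share of the whole (sub)corpus.
    """
    total = sum(len(text) for text in texts)
    hits = [text.count(word) for text in texts]
    n = sum(hits)
    if n == 0:
        raise ValueError(f"{word!r} does not occur in the given texts")
    return 0.5 * sum(
        abs(h / n - len(text) / total) for h, text in zip(hits, texts)
    )


# Hypothetical artist subcorpus: "baby" is repeated in one chorus only.
target = [
    ["baby"] * 6 + ["the", "night"],
    ["the", "road", "is", "long"],
    ["the", "rain", "is", "cold"],
]
# Hypothetical reference corpus.
reference = [
    ["the", "day", "is", "long"],
    ["the", "baby", "sleeps", "now"],
]

for w in ("baby", "the"):
    print(w, round(log_ratio(w, target, reference), 3),
          round(deviation_of_proportions(w, target), 3))
```

In this toy example "baby" looks key by frequency, but its DP is much higher than that of "the" because all of its occurrences sit in a single song. A dispersion-aware keyness analysis would therefore be more cautious about treating it as characteristic of the artist.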

Details

Original language	English
Pages (from-to)	21-38
Number of pages	18
Journal	Journal for Language Technology and Computational Linguistics
Volume	36
Issue number	1
Publication status	Published - 1 May 2023
Peer-reviewed	Yes

External IDs

ORCID	/0000-0002-0141-9327/work/142247660
ORCID	/0000-0002-3167-1670/work/142249137
