Following the Line of Music

STAT 231 Blog Project

Authors

Gloria Wu

Justyce Williams

Ben Snyderman

Last updated

May 7, 2025

Abstract
Music makes up a significant portion of our media consumption, from iconic performances at the Super Bowl and the Grammys to the viral spread of songs on social media platforms. In this project, we use \(k\)-means clustering, sentiment analysis, and a network map to explore music.

Introduction

In recent years, music has taken up a significant share of our media consumption. With the rise of digital media services and social media, consumers face few, if any, barriers to discovering, accessing, and engaging with music (Lee 2025). This observation sparked our curiosity about music analysis. We typically think of factors such as genre, lyricism, harmonic frequency, and artist connectivity as integral to what makes music, music. What is less clear is which of these factors actually affect the popularity and spread of music. Through this project, we aim to investigate music as a system of patterns, relationships, and compositions, and to uncover insights into how music is built, connected, and expressed.

To analyze these different components of music, we use three methods:
1. \(k\)-means clustering to see whether there is a “formula,” in terms of track features, that determines track popularity.
2. Sentiment analysis to identify lyrical trends across songs, moods, and genres.
3. A network map to view relationships between artists and the industry.
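To make the first method concrete, here is a minimal sketch of \(k\)-means clustering (Lloyd's algorithm) written in plain Python. It is an illustration under stated assumptions, not the implementation used in this project: the function names `nearest` and `kmeans` are hypothetical, and the six (danceability, energy) pairs are made-up values rather than rows from the Spotify dataset.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - m) ** 2 for x, m in zip(point, centroids[c])),
    )


def kmeans(points, k, iters=100, seed=0):
    """Cluster equal-length numeric tuples into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    # Final labels are computed against the final centroids.
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical (danceability, energy) pairs, invented for illustration only.
tracks = [
    (0.80, 0.90), (0.85, 0.80), (0.90, 0.85),
    (0.20, 0.30), (0.25, 0.20), (0.30, 0.25),
]
labels, centroids = kmeans(tracks, k=2)
print(labels)
```

With real track features, the columns would typically be put on a common scale first so that a wide-ranging feature such as tempo does not dominate the distance calculation.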

Dataset

Data on various songs was taken from the TidyTuesday Spotify dataset by Thompson et al. (2020). The dataset includes information about each track (such as its name, artist, and genre) and its audio features (such as danceability, tempo, and duration). Release dates of the songs in the dataset range from 1957 to 2020.

Then, song lyrics were scraped from Genius (2025).

Finally, general information on each track was scraped from Wikipedia (2025).

Limitations

Because the songs in the dataset were released between 1957 and 2020, our results apply only to tracks from that period. During the COVID-19 pandemic, online streaming platforms faced increased demand, leading to a higher quantity of music being produced every year (Sarmiento et al. 2025). This sudden change in supply and demand may mean that our results do not generalize beyond 2020.

Originally, we attempted to collect data on current popular songs using the spotifyr package; however, because Spotify no longer allows users to retrieve data on tracks, we were unable to do so. With access to current data, we would be better able to analyze current trends and relationships in the music industry.

In the future, this project could be improved by using more recent data, which would require renewed access to track-level data. It would also be interesting to compare trends across streaming platforms, such as Apple Music and YouTube Music. Because other countries use different streaming platforms (for example, people in China mainly use QQ Music, as Spotify is blocked), trends across countries and cultures could also be investigated.

References

Genius (2025), “GENIUS.”
Lee, S. (2025), “Decoding music consumption patterns and their media impact.”
Sarmiento, I. G., Cills, H., Powers, A., Madden, S., and Gotrich, L. (2025), “How the pandemic changed music.”
Thompson, C., Parry, J., Phipps, D., and Wolff, T. (2020), “Spotify songs,” Available at https://github.com/rfordatascience/tidytuesday/blob/main/data/2020/2020-01-21/readme.md.
Wikipedia (2025), “Wiki — wikipedia, the free encyclopedia,” http://en.wikipedia.org/w/index.php?title=Wiki&oldid=1287546167.