The dataset used in this project was obtained from Kaggle (https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks) and contains the audio features of about 175,000 tracks in the Spotify database released between 1921 and 2021. The first visualization (a scatterplot) uses a random sample of 1,000 tracks from the dataset, while the other visualizations use the entire dataset.
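The write-up does not say which tool drew the 1,000-track sample. A minimal sketch using Python's standard `random` module is below. It assumes the tracks are already loaded as a list of dicts (loading the CSV is not shown), and the `seed` parameter is a hypothetical addition that makes the sample reproducible.

```python
import random

def sample_tracks(tracks, k=1000, seed=42):
    """Return k distinct tracks chosen uniformly at random, without replacement.

    `seed` is a hypothetical parameter that makes the sample reproducible;
    if the dataset holds fewer than k tracks, all of them are returned.
    """
    rng = random.Random(seed)
    return rng.sample(tracks, min(k, len(tracks)))

# Synthetic stand-in rows (NOT the real Kaggle data): one dict per track.
tracks = [{"id": i, "tempo": 60 + i % 120} for i in range(5000)]
subset = sample_tracks(tracks)
print(len(subset))  # 1000
```

Sampling without replacement guarantees that no track appears twice in the scatterplot.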
As an avid music listener, I saw this project as an opportunity to dive deeper into one of my interests from the perspective of a data scientist. Are there common relationships or correlations between the various attributes of tracks? Are there any significant trends in the types of music published over time? How do the qualities of my favorite songs compare to those of the "average" song? The visualizations explore each of these questions.
In the first visualization, I was surprised to see weaker correlations than I had expected across several
relationships. For example, I expected a strong positive correlation between danceability and tempo, on the
assumption that faster songs are easier to dance to. Instead, the correlation was very weak.
Across most attribute pairs, the correlation coefficients fall between -0.5 and +0.5, indicating weak to moderate relationships.
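The write-up does not name the correlation measure behind the scatterplot; assuming it is the Pearson coefficient (the most common default), a minimal stdlib sketch of how a value such as the danceability/tempo correlation could be computed is below. The sample numbers are made up for illustration and are not drawn from the dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the shared 1/n factors cancel in the ratio below.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration only (not drawn from the dataset).
tempo = [95.0, 120.0, 128.0, 140.0, 174.0]
danceability = [0.71, 0.64, 0.80, 0.55, 0.62]
r = pearson_r(tempo, danceability)
print(round(r, 2))  # always between -1.0 and +1.0
```

A value near 0 means tempo tells you little about danceability on its own, which is the kind of result the scatterplot showed.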
In the second visualization, there appears to be a marked shift in trends beginning in
the 1950s. Most notably, acousticness begins to plummet and energy rises rapidly around this time.
Year-to-year averages also appear much more erratic before 1950. Another notable trend is that
the percentage of explicit tracks rises dramatically in the 1990s. This is likely attributable to the rise
of hip-hop and rap, genres that frequently feature explicit lyrics.
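The trends above come from aggregating tracks by release date. A minimal stdlib sketch of two such aggregations (a per-year mean of one attribute and the percentage of explicit tracks per decade) is below. It assumes each track is a dict with `year`, `explicit` (0 or 1), and numeric attribute fields; these key names are assumptions about how the data is loaded, and the rows are toy values.

```python
from collections import defaultdict

def yearly_mean(tracks, attribute):
    """Average value of `attribute` for each release year, as {year: mean}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in tracks:
        totals[t["year"]] += t[attribute]
        counts[t["year"]] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}

def explicit_share_by_decade(tracks):
    """Percentage of tracks flagged explicit (1) in each decade, as {decade: %}."""
    explicit = defaultdict(int)
    counts = defaultdict(int)
    for t in tracks:
        decade = t["year"] // 10 * 10  # e.g. 1995 -> 1990
        explicit[decade] += t["explicit"]
        counts[decade] += 1
    return {d: 100 * explicit[d] / counts[d] for d in sorted(counts)}

# Toy rows for illustration only (not real dataset values).
tracks = [
    {"year": 1985, "acousticness": 0.4, "explicit": 0},
    {"year": 1985, "acousticness": 0.2, "explicit": 0},
    {"year": 1995, "acousticness": 0.1, "explicit": 1},
    {"year": 1998, "acousticness": 0.3, "explicit": 0},
]
print(yearly_mean(tracks, "acousticness"))
print(explicit_share_by_decade(tracks))  # {1980: 0.0, 1990: 50.0}
```

Years with only a handful of tracks produce noisier means, so checking the per-year counts alongside the averages is a cheap sanity check.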
Because the third visualization is user-directed, the insights it offers will differ for each person who uses it.
After searching for some of the songs I listen to frequently, I found that my taste is higher in
danceability and energy than the median track, and lower in acousticness.
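The write-up does not describe how the comparison against the median is computed. One possible approach, sketched here under that assumption, compares the median of a listener's chosen tracks with the median of the whole dataset for each attribute. The function name, the attribute list, and the toy rows are all hypothetical.

```python
from statistics import median

# Hypothetical default: the three attributes discussed above.
ATTRIBUTES = ("danceability", "energy", "acousticness")

def compare_to_median(my_tracks, all_tracks, attributes=ATTRIBUTES):
    """Return {attribute: (my median, overall median, "higher"/"lower"/"equal")}."""
    result = {}
    for attr in attributes:
        mine = median(t[attr] for t in my_tracks)
        overall = median(t[attr] for t in all_tracks)
        if mine > overall:
            label = "higher"
        elif mine < overall:
            label = "lower"
        else:
            label = "equal"
        result[attr] = (mine, overall, label)
    return result

# Toy rows for illustration only (not real songs or dataset values).
all_tracks = [
    {"danceability": 0.3, "energy": 0.4, "acousticness": 0.8},
    {"danceability": 0.5, "energy": 0.5, "acousticness": 0.5},
    {"danceability": 0.7, "energy": 0.6, "acousticness": 0.2},
]
my_tracks = [
    {"danceability": 0.8, "energy": 0.9, "acousticness": 0.1},
    {"danceability": 0.6, "energy": 0.7, "acousticness": 0.3},
]
for attr, (mine, overall, label) in compare_to_median(my_tracks, all_tracks).items():
    print(f"{attr}: {label} than the median")
```

The median is less sensitive than the mean to extreme values, such as spoken-word tracks with near-zero danceability.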
Attribute | Description |
---|---|
Acousticness | A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic. |
Danceability | How suitable a track is for dancing (from 0.0 to 1.0) based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. |
Energy | A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. |
Explicit | Whether or not the track has explicit lyrics. 0 if false, 1 if true. |
Liveness | Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 indicates a strong likelihood that the track is live. |
Loudness | The overall loudness of a track in decibels (dB), averaged across the entire track. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). |
Popularity | A value between 0 and 100, with 100 being the most popular. Popularity is calculated by an algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are. |
Speechiness | Detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. |
Tempo | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece. |
Valence | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
Source: https://developer.spotify.com/documentation/web-api/reference/
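The ranges in the table can double as a quick data-quality check before plotting. A minimal sketch is below. It encodes only the bounds the table states explicitly (0.0 to 1.0 for acousticness, danceability, energy, and valence; 0 to 100 for popularity; 0 or 1 for explicit) plus the 0.8 liveness threshold. The function names, and the assumption that each track is a dict keyed by these attribute names, are hypothetical.

```python
# Bounds taken from the attribute table; key names are assumed to match the data.
RANGES = {
    "acousticness": (0.0, 1.0),
    "danceability": (0.0, 1.0),
    "energy": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "popularity": (0, 100),
}

def validate_track(track):
    """Return a list of human-readable problems found in one track (empty if none)."""
    problems = []
    for attr, (lo, hi) in RANGES.items():
        value = track.get(attr)
        if value is None:
            problems.append(f"{attr}: missing")
        elif not lo <= value <= hi:
            problems.append(f"{attr}: {value} outside [{lo}, {hi}]")
    if track.get("explicit") not in (0, 1):
        problems.append(f"explicit: expected 0 or 1, got {track.get('explicit')!r}")
    return problems

def is_probably_live(track, threshold=0.8):
    """Per the table, liveness above 0.8 strongly suggests a live recording."""
    return track.get("liveness", 0.0) > threshold

# Toy row for illustration only.
good = {"acousticness": 0.2, "danceability": 0.7, "energy": 0.8,
        "valence": 0.6, "popularity": 55, "explicit": 0, "liveness": 0.9}
print(validate_track(good), is_probably_live(good))
```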