https://github.com/scarblase/spotify_analysis
Analyzing Spotify's top 1000 tracks using Python, DuckDB, and Spotify-themed visualizations to uncover trends and insights.
dbt duckdb matplotlib pandas python3 sns sql
- Host: GitHub
- URL: https://github.com/scarblase/spotify_analysis
- Owner: scarblase
- Created: 2025-04-22T18:20:15.000Z (about 1 year ago)
- Default Branch: origin
- Last Pushed: 2025-04-22T18:37:24.000Z (about 1 year ago)
- Last Synced: 2025-04-22T19:46:39.775Z (about 1 year ago)
- Topics: dbt, duckdb, matplotlib, pandas, python3, sns, sql
- Language: Jupyter Notebook
- Homepage: https://www.kaggle.com/datasets/kunalgp/top-1000-most-played-spotify-songs-of-all-time
- Size: 452 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Spotify Analysis 🎵
Welcome to the **Spotify Analysis** repository! This project dives deep into the top 1000 most-played Spotify tracks of all time, exploring trends, patterns, and insights through data analysis and visualization.
## 🌟 Project Highlights
- Analyze and visualize Spotify's **Top 1000 Tracks** dataset.
- Use **DuckDB** for SQL-style queries on the dataset. (Yes, we tried DuckDB just to see how it works! 😉)
- Beautiful, Spotify-themed visualizations.
- Explore trends in artists, albums, track durations, and popularity.
---
## 📊 Dataset Overview
The dataset contains information on 1000 tracks, with the following attributes:
| Column | Description |
|----------------|--------------------------------------------------|
| **track_name** | Name of the track. |
| **artist** | Name of the artist. |
| **album** | Album associated with the track. |
| **release_date** | Track's release date. |
| **popularity** | Popularity score (0-100). |
| **spotify_url** | Link to the track on Spotify. |
| **id** | Unique Spotify track ID. |
| **duration_min** | Track duration in minutes. |
| **release_year** | Year of release (derived). |
| **release_month** | Month of release (derived). |
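The two derived columns can be rebuilt from `release_date` with pandas. The snippet below is a minimal sketch on a made-up two-row sample, not the notebook's actual code; the sample values and the `errors="coerce"` choice are assumptions.

```python
import pandas as pd

# Made-up sample rows; the real values come from the Kaggle CSV.
tracks = pd.DataFrame(
    {
        "track_name": ["Song A", "Song B"],
        "release_date": ["2019-11-29", "2017-01-06"],
    }
)

# Parse the date strings; unparseable values would become NaT (assumption).
tracks["release_date"] = pd.to_datetime(tracks["release_date"], errors="coerce")

# Derive the year and month columns described in the table above.
tracks["release_year"] = tracks["release_date"].dt.year
tracks["release_month"] = tracks["release_date"].dt.month

print(tracks)
```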
---
## 🧰 Tools and Technologies
This project leverages the following tools and libraries:
- **DuckDB**: SQL-style queries on DataFrames (because why not try it? 😄).
- **Pandas**: Data wrangling and exploratory analysis.
- **Matplotlib** & **Seaborn**: Stunning visualizations.
- **BeautifulSoup**: Web scraping (initial attempts to fetch data).
- **Python**: The powerhouse behind everything.
---
## 📈 Key Insights and Visualizations
### 1. **Top Artists and Albums**
- Bar charts showing the **Top 10 Artists** and **Top 5 Albums** based on the number of tracks.
- Styled with Spotify's signature **black, green, and white** theme for a clean and modern look.
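A rough sketch of how a "top artists by track count" chart could be produced, using a tiny made-up DataFrame in place of the real dataset. Only the colors come from this README; the figure size, labels, and the horizontal orientation are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; in the project this would be the full tracks DataFrame.
tracks = pd.DataFrame({"artist": ["A", "A", "A", "B", "B", "C"]})

# Count tracks per artist and keep the ten most frequent.
top_artists = tracks["artist"].value_counts().head(10)

fig, ax = plt.subplots(figsize=(8, 4), facecolor="#121212")
ax.set_facecolor("#121212")
# Reverse the order so the artist with the most tracks is drawn at the top.
ax.barh(top_artists.index[::-1], top_artists.values[::-1], color="#1DB954")
ax.set_xlabel("Number of tracks", color="#FFFFFF")
ax.set_title("Top artists by track count", color="#FFFFFF")
ax.tick_params(colors="#FFFFFF")
fig.tight_layout()
```

The same pattern with `tracks["album"].value_counts().head(5)` would give the top albums.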
### 2. **Popularity Trends**
- **Line Plots**:
  - Popularity trends over the years.
  - Monthly popularity patterns.
- **Scatter Plot**:
  - Relationship between track duration and popularity.
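The plots listed above might be built along these lines. This is a sketch on made-up numbers; aggregating popularity by its yearly mean is an assumption, since the README does not say how the trend is computed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows standing in for the real dataset.
tracks = pd.DataFrame(
    {
        "release_year": [2015, 2015, 2016, 2017],
        "popularity": [80, 90, 70, 85],
        "duration_min": [3.5, 4.0, 2.9, 3.2],
    }
)

# Average popularity per release year (the aggregation is an assumption).
yearly = tracks.groupby("release_year")["popularity"].mean()

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: popularity trend over the years.
ax_line.plot(yearly.index, yearly.values, color="#1DB954", marker="o")
ax_line.set_xlabel("Release year")
ax_line.set_ylabel("Mean popularity")

# Scatter plot: track duration against popularity.
ax_scatter.scatter(tracks["duration_min"], tracks["popularity"], color="#1DB954")
ax_scatter.set_xlabel("Duration (minutes)")
ax_scatter.set_ylabel("Popularity")

fig.tight_layout()
```

Grouping by `release_month` instead of `release_year` would give the monthly pattern.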
### 3. **Distribution Analysis**
- **KDE Plots** to showcase the distribution of:
  - Track popularity.
  - Track durations.
### 4. **Popular Tracks by Artist**
- Used **DuckDB** to extract each artist's **most popular track**, limited to tracks with a popularity score ≥ 90.
---
## 🎨 Spotify-Themed Styling
- **Color Palette**:
  - Spotify Black: `#121212`
  - Spotify Green: `#1DB954`
  - Spotify Light Grey: `#B3B3B3`
  - Spotify White: `#FFFFFF`
- Fonts and gridlines styled to match Spotify's sleek UI.
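The palette above can be applied globally through Matplotlib's `rcParams`. The mapping of colors to chart elements below is an assumption for illustration; only the hex values come from this README.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

SPOTIFY_BLACK = "#121212"
SPOTIFY_GREEN = "#1DB954"
SPOTIFY_GREY = "#B3B3B3"
SPOTIFY_WHITE = "#FFFFFF"

# Apply the palette to every figure created after this point.
plt.rcParams.update(
    {
        "figure.facecolor": SPOTIFY_BLACK,
        "axes.facecolor": SPOTIFY_BLACK,
        "savefig.facecolor": SPOTIFY_BLACK,
        "axes.edgecolor": SPOTIFY_GREY,
        "axes.labelcolor": SPOTIFY_WHITE,
        "text.color": SPOTIFY_WHITE,
        "xtick.color": SPOTIFY_GREY,
        "ytick.color": SPOTIFY_GREY,
        "axes.grid": True,
        "grid.color": SPOTIFY_GREY,
        "grid.alpha": 0.3,
    }
)

# Quick check of the theme with placeholder data.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([1, 2, 3], [2, 4, 3], color=SPOTIFY_GREEN)
ax.set_title("Theme preview")
```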
---
## 🚀 Getting Started
Follow these steps to run the project locally:
### 1. Clone the Repository
```bash
git clone https://github.com/scarblase/spotify_analysis.git
cd spotify_analysis
```
### 2. Install Dependencies
Make sure you have Python installed, then run:
```bash
pip install pandas duckdb matplotlib seaborn requests beautifulsoup4 notebook
```
### 3. Load the Dataset
Ensure the file `spotify_top_1000_tracks.csv` is in the repository's root directory.
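To confirm the file is in place before opening the notebook, a quick check like the following may help. The `load_tracks` helper and the `EXPECTED_COLUMNS` subset are hypothetical and not part of the repository; the sketch assumes the CSV carries at least these columns from the dataset overview.

```python
from pathlib import Path

import pandas as pd

# Hypothetical subset of columns assumed to be present in the raw CSV.
EXPECTED_COLUMNS = {"track_name", "artist", "album", "release_date", "popularity"}


def load_tracks(source):
    """Read the tracks CSV and fail early if expected columns are missing."""
    tracks = pd.read_csv(source)
    missing = EXPECTED_COLUMNS - set(tracks.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return tracks


csv_path = Path("spotify_top_1000_tracks.csv")
if csv_path.exists():
    tracks = load_tracks(csv_path)
    print(tracks.shape)
    print(tracks.head())
else:
    print(f"{csv_path} not found; download it from the Kaggle page first.")
```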
### 4. Run the Jupyter Notebook
Launch Jupyter Notebook to explore the analysis:
```bash
jupyter notebook
```
---
## 🌐 References
- **Dataset Source**: [Kaggle - Top 1000 Most Played Spotify Songs of All Time](https://www.kaggle.com/datasets/kunalgp/top-1000-most-played-spotify-songs-of-all-time)
- **Spotify**: [Spotify](https://www.spotify.com)
---
✨ Dive into the dataset and enjoy uncovering the magic of Spotify's top tracks!