https://github.com/pavankethavath/music_recommendation_engine
An advanced Music Recommendation System leveraging a Spotify dataset to deliver personalized song suggestions. The project applies KMeans clustering, PCA, t-SNE, and cosine similarity for precise recommendations. Built with a user-friendly Streamlit interface, it showcases data preprocessing, unsupervised learning, and insightful visualizations.
clustering elbow-method feature-extraction kmeans-clustering pca recommendation-system spotify streamlit tsne
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/pavankethavath/music_recommendation_engine
- Owner: pavankethavath
- Created: 2024-10-31T18:50:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-27T16:39:20.000Z (over 1 year ago)
- Last Synced: 2025-03-29T22:25:10.976Z (about 1 year ago)
- Topics: clustering, elbow-method, feature-extraction, kmeans-clustering, pca, recommendation-system, spotify, streamlit, tsne
- Language: Jupyter Notebook
- Homepage:
- Size: 38.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🎵 **Music Recommendation System**
An advanced **Music Recommendation System** powered by data science and a comprehensive **Spotify dataset**. This project leverages real-world song data, unsupervised learning, and similarity metrics to deliver personalized music recommendations.

---
## Overview
This project demonstrates the application of data science to solve real-world problems in the music industry. By preprocessing, clustering, and analyzing song data, the system provides recommendations for users based on the similarity of features like danceability, energy, and tempo.

---
## **Project Highlights**
- **Real Data**: The dataset includes detailed song-level attributes sourced from Spotify, providing insights into audio characteristics.
- **Unsupervised Learning**: Implements **KMeans Clustering** for grouping similar songs.
- **Similarity-Based Recommendations**: Uses **cosine similarity** to recommend the songs closest to the one the user enters.
- **Interactive Interface**: A clean and user-friendly **Streamlit** web application for seamless user interaction.
- **Data Visualization**: Insightful visualizations showcasing clustering, feature correlations, and distributions.
---
## **How It Works**
1. **Data Collection**:
   - The dataset contains song features such as `danceability`, `energy`, and `tempo`.
   - Metadata such as song name, artist, popularity, and release year is included for richer insights.
2. **Data Processing**:
   - Numerical features are standardized with **StandardScaler** so that they share a common scale.
   - **PCA** reduces dimensionality to cut noise and computation.
3. **Unsupervised Learning**:
   - **KMeans Clustering** groups songs with similar attributes.
   - The number of clusters is chosen with the **Elbow Method**.
4. **Recommendation Generation**:
   - **Cosine Similarity** measures how close the songs within a cluster are to the input song.
   - Recommendations are ranked by similarity score.
5. **Interactive User Interface**:
   - Built with **Streamlit** for a dynamic and easy-to-use app.
   - Users enter a song name to get recommendations based on dataset attributes.
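
Steps 2 to 4 above can be sketched end to end on toy data. This is a minimal illustration, not the repository's actual code: the random stand-in dataset, the `recommend` helper, the choice of two principal components and four clusters, and the decision to compute cosine similarity on the standardized features are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the Spotify dataset.
rng = np.random.default_rng(0)
FEATURES = ["danceability", "energy", "tempo"]
songs = pd.DataFrame(rng.random((50, 3)), columns=FEATURES)
songs["tempo"] = 60 + songs["tempo"] * 120  # put tempo on a BPM-like scale
songs["name"] = [f"song_{i}" for i in range(len(songs))]

# Step 2: standardize the features, then reduce dimensionality with PCA.
scaled = StandardScaler().fit_transform(songs[FEATURES])
reduced = PCA(n_components=2, random_state=0).fit_transform(scaled)

# Step 3: cluster the songs (k=4 is an assumed value, not the project's).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(reduced)
names = songs["name"].to_numpy()


def recommend(song_name, top_n=5):
    """Step 4: rank songs from the query song's cluster by cosine similarity."""
    hits = np.flatnonzero(names == song_name)
    if hits.size == 0:
        raise ValueError(f"unknown song: {song_name!r}")
    idx = hits[0]
    # Candidates are the other songs in the same cluster.
    candidates = np.flatnonzero(cluster_ids == cluster_ids[idx])
    candidates = candidates[candidates != idx]
    if candidates.size == 0:
        return pd.DataFrame({"name": [], "similarity": []})
    sims = cosine_similarity(scaled[[idx]], scaled[candidates])[0]
    order = np.argsort(sims)[::-1][:top_n]  # highest similarity first
    return pd.DataFrame(
        {"name": names[candidates[order]], "similarity": sims[order]}
    )


print(recommend("song_0", top_n=3))
```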
---
## Project Components
- **Data Preprocessing and Clustering**:
  - File: `Music_Clustering_and_Recommendation.py`
  - Steps include data cleaning, normalization, clustering, and model saving.
- **Recommendation Engine**:
  - File: `recommendation_engine.py`
  - Implements the recommendation logic with a user-friendly Streamlit interface.
- **Saved Artifacts**:
  - `preprocessed_data.csv`: Preprocessed music data ready for recommendations.
  - `kmeans_model.pkl`: Trained KMeans model for clustering.
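
As a rough illustration of how the two saved artifacts might be produced and read back, the sketch below writes a CSV and a pickled model to a temporary directory. The use of the standard `pickle` module, the toy data, and the column names are assumptions; the repository may serialize its model differently (for example with `joblib`).

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the preprocessed song features.
rng = np.random.default_rng(0)
feature_cols = ["danceability", "energy"]
data = pd.DataFrame(rng.random((30, 2)), columns=feature_cols)

# Fit a small model and attach the cluster label to each row.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data[feature_cols])
data["cluster"] = model.labels_

# Save both artifacts (a temporary directory keeps the example side-effect free).
out_dir = Path(tempfile.mkdtemp())
data.to_csv(out_dir / "preprocessed_data.csv", index=False)
with open(out_dir / "kmeans_model.pkl", "wb") as fh:
    pickle.dump(model, fh)

# Load them back, as an app would at startup.
loaded_data = pd.read_csv(out_dir / "preprocessed_data.csv")
with open(out_dir / "kmeans_model.pkl", "rb") as fh:
    loaded_model = pickle.load(fh)

print(loaded_data.shape, loaded_model.n_clusters)
```

Only unpickle files you trust: `pickle.load` can execute arbitrary code from a tampered file.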
## How to Run the Project
### Prerequisites
Ensure the following are installed:
- Python 3.8 or higher
- Libraries: Streamlit, Pandas, Scikit-learn, Matplotlib, Seaborn, Plotly
### Steps
1. Clone the repository:
   ```bash
   git clone https://github.com/pavankethavath/music_recommendation_engine.git
   cd music_recommendation_engine
   ```
2. Run the Streamlit app:
   ```bash
   streamlit run recommendation_engine.py
   ```
3. Open the URL (usually `http://localhost:8501`) in your browser.
4. Enter a song name and get data-driven recommendations!
## Data Features
### **Features in the Spotify Dataset**
- **Audio Features**:
  - `Danceability`, `Energy`, `Acousticness`, `Instrumentalness`, `Loudness`, `Speechiness`, `Liveness`, `Valence`, `Tempo`
- **Popularity**: Reflects the song's global reach.
- **Metadata**: Duration (in milliseconds), release year, song name, and artist.
### **Preprocessing**:
- Standardized numerical features for uniform scaling.
- Added a `decade` feature for exploratory analysis.
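
The two preprocessing steps can be sketched as follows. The tiny DataFrame and the column name `year` are assumptions for illustration; the actual dataset may name its release-year column differently.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical rows; column names are assumed, not taken from the dataset.
songs = pd.DataFrame(
    {
        "name": ["track_a", "track_b", "track_c", "track_d"],
        "year": [1987, 1999, 2004, 2019],
        "tempo": [92.0, 128.0, 110.0, 140.0],
        "energy": [0.41, 0.88, 0.63, 0.75],
    }
)

# Standardize numerical features to zero mean and unit variance.
numeric_cols = ["tempo", "energy"]
songs[numeric_cols] = StandardScaler().fit_transform(songs[numeric_cols])

# Derive the decade by flooring the release year to a multiple of ten.
songs["decade"] = (songs["year"] // 10) * 10

print(songs[["name", "year", "decade"]])
```

Grouping by `decade` (for example `songs.groupby("decade")[numeric_cols].mean()`) then gives a quick view of how features shift over time.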

---
## 🖼️ Visualizations
### **Clustering Visualization (PCA)**:
Clusters are visualized using **Principal Component Analysis (PCA)**, showing clear separations among song groups. Here's an example of the clustering output:

### **Clustering Visualization (t-SNE)**:
To further understand cluster separations, we used **t-SNE** for high-dimensional data visualization:
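
A minimal sketch of how the two-dimensional coordinates behind such PCA and t-SNE plots could be computed is given here. The random nine-feature matrix and the `perplexity` value are assumptions for the example; plotting is left out so the snippet only depends on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for nine standardized audio features of 60 songs.
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.random((60, 9)))

# Linear projection onto the two directions of highest variance.
pca_2d = PCA(n_components=2).fit_transform(X)

# Non-linear embedding; perplexity must be smaller than the number of samples.
tsne_2d = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(X)

print(pca_2d.shape, tsne_2d.shape)
```

Each row of `pca_2d` or `tsne_2d` is one song, so the two columns can be passed to any scatter plot and colored by cluster label. Distances in a t-SNE embedding are not directly comparable to distances in the original feature space, so it is best read as a qualitative view.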

### **Streamlit Interface**:
A snapshot of the interactive Streamlit interface where users can input a song name and view recommendations:

### **Feature Correlations**:
A **correlation heatmap** highlights the relationships between features, enabling effective feature selection:
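
The matrix behind such a heatmap can be computed with pandas alone. In this sketch the data are synthetic, with `loudness` deliberately constructed to track `energy`, so the example shows what a strong correlation looks like; the real dataset's values will differ.

```python
import numpy as np
import pandas as pd

# Synthetic features: loudness is built from energy plus noise (an assumption
# made only to produce a visible correlation in the example).
rng = np.random.default_rng(0)
energy = rng.random(100)
songs = pd.DataFrame(
    {
        "energy": energy,
        "loudness": -20 + 15 * energy + rng.normal(0, 1, 100),
        "acousticness": rng.random(100),
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = songs.corr()
print(corr.round(2))
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would draw the heatmap itself. Pairs of features with a correlation near 1 or -1 carry largely redundant information, which is what makes the matrix useful for feature selection.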

---
## Data Science Techniques
### **Dimensionality Reduction**:
- PCA for reducing noise and dimensionality, and t-SNE for visualizing cluster structure.
### **Unsupervised Learning**:
- KMeans for clustering songs based on similar features.
### **Similarity Metrics**:
- Cosine similarity for ranking song recommendations.
### **Model Evaluation**:
- **Silhouette Scores** and **Elbow Method** for validating clustering performance.
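
Both checks can be run in a single loop over candidate cluster counts. The sketch below uses synthetic blobs from `make_blobs` rather than the song data, and the range of `k` values is an arbitrary assumption. For the elbow method, look for the `k` where inertia stops dropping sharply; for the silhouette score, higher is better.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for the standardized song features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette coefficient, in [-1, 1]
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# The k with the highest silhouette score is one reasonable candidate.
best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("best k by silhouette:", best_k)
```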
---
## Key Insights
- **Data-Driven Approach**: Demonstrates the application of unsupervised learning for personalized recommendation systems.
- **Effective Clustering**: Clustering simplifies the recommendation process by narrowing down candidates.
- **Scalability**: Ready for integration with large-scale datasets or additional features.
- **Interactive Visualizations**: Improves transparency and interpretability of the model's results.
---
## 🎯 Future Enhancements
### **Real-Time Data Integration**:
- Add more metadata or integrate with live Spotify API data in the future.
### **Advanced Models**:
- Explore deep learning techniques such as Autoencoders for feature extraction.
### **Enhanced Clustering**:
- Experiment with algorithms like DBSCAN or Hierarchical Clustering for better performance on sparse datasets.
---
## 💡 Contributions
Data enthusiasts and developers are welcome! Feel free to fork the repository and submit pull requests to enhance the project.

---