https://github.com/cintia0528/data_science-unsupervised_machine_learning

I aim to automate playlist creation for Moosic, a startup known for manual curation, using Machine Learning, while addressing skepticism about the ability of audio features to capture playlist "mood."

Host: GitHub
URL: https://github.com/cintia0528/data_science-unsupervised_machine_learning
Owner: Cintia0528
Created: 2023-09-26T15:28:05.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-09-26T15:58:21.000Z (over 1 year ago)
Last Synced: 2025-03-31T05:35:18.383Z (3 months ago)
Topics: data, data-preprocessing, data-scaling, data-science, data-visualization, datacleaning, elbow-method, kclustering, machine-learning, pandas, python, silhouette-score, unsupervised-machine-learning
Language: Jupyter Notebook
Homepage:
Size: 2.18 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Unsupervised Machine Learning
## Goal
To evaluate whether Machine Learning can be used to automatise playlist creation.

## Overview
Moosic is a small startup that creates playlists curated manually by music experts. Their listeners love the personal touch, which they achieve by capturing the "mood" or "vibe".

_Board_: Believes that they need at least a **degree of automatisation**, as music experts are not able to keep up with the demand. **Currently** the whole creation process is **done manually**.

_Music Experts_: Are **skeptical** that audio features on their own are enough to capture the "mood", which is so subjective that **only a human can judge** it.

## Context
Moosic wants the data science team to use a dataset that has been collected from the Spotify API and contains the audio features (tempo, energy, danceability…) for a few thousand songs. After using a basic **clustering algorithm** such as K-Means to divide the dataset into a few clusters, the data team shall answer the following two questions:

1. *Are Spotify’s audio features able to identify “similar songs”, as defined by humanly detectable criteria?*
2. *Is K-Means a good method to create playlists?*

### Task:
* Import the list of 5000 songs collected from the Spotify API
* Use a basic clustering algorithm, e.g. K-Means, to divide the dataset into clusters
* Validate the clusters, export them to Spotify as playlists and listen to some of the songs
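
The clustering step of the task can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the data is synthetic (the real Spotify export would be loaded from a file this README does not name), `MinMaxScaler` is an assumed choice of scaler, and `k=5` is an arbitrary number of clusters. Only the column names `tempo`, `energy` and `danceability` come from the dataset description above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Spotify export (assumption: the real data
# would be read with pd.read_csv from a file not named in this README).
rng = np.random.default_rng(seed=42)
songs = pd.DataFrame({
    "tempo": rng.uniform(60, 200, size=500),   # beats per minute
    "energy": rng.uniform(0, 1, size=500),
    "danceability": rng.uniform(0, 1, size=500),
})

# Scale every feature to [0, 1] so that tempo, which has a much larger
# numeric range, does not dominate the distances K-Means relies on.
scaled = MinMaxScaler().fit_transform(songs)

# k=5 is an illustrative choice only.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
songs["cluster"] = kmeans.fit_predict(scaled)

# Each cluster is a candidate playlist; the counts show how evenly
# (or unevenly) the songs were distributed.
print(songs["cluster"].value_counts().sort_index())
```

Exporting the clusters to Spotify is left out here because it depends on API credentials and client code that this README does not describe.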

#### Challenges:
* Difficult to evaluate the results without listening to each playlist
* No tangible way to measure accuracy
* Unevenly large clusters
* Subjective - what is a good playlist?

#### Solutions:
* Visualize the clusters, so we can see the overlaps and the outliers
* Limit the number of features to 3 (or multiples of 3) so they can be visualized in a 3D scatterplot
* Find a balance between K-score and the business objectives
* Instead of replacing music experts, ML does the "heavy lifting" and they fine-tune the results
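
A 3D scatterplot of three features, colored by cluster, could look something like the sketch below. It assumes matplotlib as the plotting library and uses random data in place of the real songs; the axis labels are borrowed from the feature names mentioned in the Context section, and the choice of four clusters is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, already-scaled features standing in for three audio features.
rng = np.random.default_rng(seed=0)
features = rng.uniform(0, 1, size=(300, 3))

# Illustrative clustering with an arbitrary k.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")

# One point per song, colored by its cluster label, so overlapping
# clusters and isolated outliers become visible.
ax.scatter(features[:, 0], features[:, 1], features[:, 2],
           c=labels, cmap="viridis", s=12)
ax.set_xlabel("energy")
ax.set_ylabel("danceability")
ax.set_zlabel("tempo (scaled)")
ax.set_title("K-Means clusters in three audio features")

# plt.show()  # uncomment in an interactive session
```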

## Approach
1. Evaluate the dataset and do basic cleaning, e.g. handle missing or corrupted values and correct the data types
2. Exploration of audio features
3. Decide which features to drop, and which features to use
4. K-Means clustering
5. Evaluation of clusters
6. Sub-clustering
7. Evaluation of final clusters
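
Step 6, sub-clustering, is one way to deal with a cluster that is much larger than the others. The sketch below shows one possible reading of that step, assuming it means re-running K-Means on the largest cluster only; the data is synthetic and the values of `k` are arbitrary, so it should not be taken as the notebook's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic scaled features: one large blob and two small ones,
# to mimic clusters of very uneven size.
rng = np.random.default_rng(seed=1)
big = rng.normal(loc=0.5, scale=0.10, size=(400, 3))
small_a = rng.normal(loc=0.1, scale=0.03, size=(50, 3))
small_b = rng.normal(loc=0.9, scale=0.03, size=(50, 3))
features = np.vstack([big, small_a, small_b])

# First pass: cluster the whole dataset.
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(features)
sizes = np.bincount(labels)

# Pick the largest cluster and cluster its members again.
largest = int(np.argmax(sizes))
mask = labels == largest
sub_labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(features[mask])

# Sub-cluster 0 keeps the parent's id, sub-cluster 1 gets a new id.
final_labels = labels.copy()
final_labels[mask] = np.where(sub_labels == 0, largest, labels.max() + 1)

print("sizes before sub-clustering:", sizes)
print("sizes after sub-clustering: ", np.bincount(final_labels))
```

The same idea can be repeated until no playlist is too large to review by ear, which is where the music experts' fine-tuning would come in.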

## Deliverables
A 5-minute **PowerPoint presentation** for the Board of Directors, found [here](https://drive.google.com/file/d/1vUTZUToQtD97X_53d7Ht7nSnJrvjJ5G_/view?usp=sharing), that summarizes the findings and suggests a course of action.

The **Python code** is found [here](4_0_5000_songs_FINAL_NOTEBOOK.ipynb).

## Skills & Tools
1. Data Cleaning & Quality Assurance
2. Data Preprocessing: Scaling
3. K-Means Clustering
4. Elbow Method and Silhouette Score
5. Data Visualization (3D Scatterplot)
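
The elbow method and the silhouette score are both ways of comparing different numbers of clusters. A minimal sketch with scikit-learn is shown below; the blob data, the range of `k` and the use of `MinMaxScaler` are assumptions made for illustration and are not taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic data standing in for the scaled audio features.
X, _ = make_blobs(n_samples=600, centers=4, n_features=3,
                  cluster_std=1.0, random_state=7)
X = MinMaxScaler().fit_transform(X)

inertias = {}
silhouettes = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
    # Inertia: sum of squared distances to the nearest centroid.
    # Plotted against k, the "elbow" is where it stops dropping sharply.
    inertias[k] = model.inertia_
    # Silhouette score: between -1 and 1, higher means tighter and
    # better separated clusters.
    silhouettes[k] = silhouette_score(X, model.labels_)

for k in inertias:
    print(f"k={k}  inertia={inertias[k]:8.2f}  silhouette={silhouettes[k]:.3f}")

best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest silhouette score:", best_k)
```

Neither score knows anything about playlists, so the suggested `k` is a starting point to weigh against the business objectives rather than a final answer.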
