https://github.com/ttozatto/sparkify

Churn Prediction for music streaming app with PySpark

analysis churn data learning machine predictive pyspark science spark

Host: GitHub
URL: https://github.com/ttozatto/sparkify
Owner: ttozatto
Created: 2022-08-22T03:21:09.000Z
Default Branch: main
Last Pushed: 2022-08-23T02:41:04.000Z
Last Synced: 2024-01-29T03:42:19.234Z
Topics: analysis, churn, data, learning, machine, predictive, pyspark, science, spark
Language: Jupyter Notebook
Homepage: https://medium.com/@ttozatto.ds/churn-prediction-for-music-streaming-app-sparkify-d6e26d1ac80f
Size: 142 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Sparkify - Churn Prediction for music streaming app with PySpark

This repository is part of the final project submitted to Udacity for the Data Science Nanodegree.
The objective is to predict user churn in a simulated music streaming app, using historical data on user interactions.

A blog post with a detailed analysis is available at https://medium.com/@ttozatto.ds/churn-prediction-for-music-streaming-app-sparkify-d6e26d1ac80f

## Dependencies
- pyspark
- matplotlib

## Files
- utils.py -> functions to load and preprocess the data, and to create, train, and evaluate ML models
- main.py -> script to run the full process, from loading the dataset to showing results
- medium-sparkify-event-data.json -> dataset with user interactions in the app. Available at: https://video.udacity-data.com/topher/2018/December/5c1d6681_medium-sparkify-event-data/medium-sparkify-event-data.json
- Sparkify.ipynb -> Initial exploratory analysis. Final modeling and tuning were done in the 2 scripts listed above.
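
As a rough illustration of the "load and treat data" step, the sketch below turns an event log into one row per user with a binary churn label. It is not the code from utils.py: it uses plain Python instead of PySpark so it runs without a Spark installation, and the field names (`userId`, `page`), the `"Cancellation Confirmation"` churn marker, the sample records, and the helper names `load_events` and `build_user_table` are all assumptions made for the example.

```python
import json
from collections import defaultdict

# Hypothetical sample of event records, one JSON object per line.
# Field names and the "Cancellation Confirmation" page value are
# assumptions for illustration, not taken from this README.
SAMPLE_EVENTS = """\
{"userId": "1", "page": "NextSong", "ts": 1}
{"userId": "1", "page": "Thumbs Up", "ts": 2}
{"userId": "2", "page": "NextSong", "ts": 3}
{"userId": "2", "page": "Cancellation Confirmation", "ts": 4}
{"userId": "", "page": "Home", "ts": 5}
"""


def load_events(text):
    """Parse newline-delimited JSON and drop events without a user id."""
    events = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [e for e in events if e.get("userId")]


def build_user_table(events, churn_page="Cancellation Confirmation"):
    """Aggregate events into one row per user with a binary churn label."""
    users = defaultdict(lambda: {"n_events": 0, "n_songs": 0, "churn": 0})
    for e in events:
        row = users[e["userId"]]
        row["n_events"] += 1
        if e["page"] == "NextSong":
            row["n_songs"] += 1
        if e["page"] == churn_page:
            # Assumed churn definition: the user reached the cancellation page.
            row["churn"] = 1
    return dict(users)


user_table = build_user_table(load_events(SAMPLE_EVENTS))
for user_id, row in sorted(user_table.items()):
    print(user_id, row)
```

A per-user table of this kind is the usual input for a classifier; in a PySpark version the same aggregation would typically be expressed as a `groupBy` on the user id column.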

## Summary of Results
### Test Scores
![results_medium](https://user-images.githubusercontent.com/42552721/186053626-a014429d-c66c-485e-a418-b13b04d0345f.PNG)
### Parameters for best models
![bestModel](https://user-images.githubusercontent.com/42552721/186053668-d368dba2-c46e-419d-895e-f1e9ca88d1b5.PNG)
### Feature importance
![feature_importance](https://user-images.githubusercontent.com/42552721/186053678-ec77f392-a8b0-4134-9fbb-fa36dd1b19ae.png)

## Acknowledgements
Special thanks to:
- Udacity, which proposed this project as part of the Data Science Nanodegree.
- The Spark team and community, who provide a powerful open-source tool to everyone.
