https://github.com/ttozatto/sparkify
Churn Prediction for music streaming app with PySpark
analysis churn data learning machine predictive pyspark science spark
- Host: GitHub
- URL: https://github.com/ttozatto/sparkify
- Owner: ttozatto
- Created: 2022-08-22T03:21:09.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-08-23T02:41:04.000Z (over 3 years ago)
- Last Synced: 2024-01-29T03:42:19.234Z (about 2 years ago)
- Topics: analysis, churn, data, learning, machine, predictive, pyspark, science, spark
- Language: Jupyter Notebook
- Homepage: https://medium.com/@ttozatto.ds/churn-prediction-for-music-streaming-app-sparkify-d6e26d1ac80f
- Size: 142 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Sparkify - Churn Prediction for music streaming app with PySpark
This repository is part of the final project submitted to Udacity for the Data Science Nanodegree.
The objective is to predict user churn in a simulated music streaming app, using historical data from user interactions.
A blog post with a detailed analysis is available at https://medium.com/@ttozatto.ds/churn-prediction-for-music-streaming-app-sparkify-d6e26d1ac80f
## Dependencies
- pyspark
- matplotlib
## Files
- utils.py -> functions to load and preprocess the data, and to create, train and evaluate ML models
- main.py -> script to run the full process, from loading the dataset to showing results
- medium-sparkify-event-data.json -> dataset with user interactions in the app. Available at: https://video.udacity-data.com/topher/2018/December/5c1d6681_medium-sparkify-event-data/medium-sparkify-event-data.json
- Sparkify.ipynb -> Initial exploratory analysis. Final modeling and tuning were done in the 2 scripts listed above.
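
To make the prediction target concrete, here is a minimal plain-Python sketch of how a churn label could be derived from the event log. It is not the code in `utils.py`, which uses PySpark. It assumes the dataset is a JSON-lines file in which every event has `userId` and `page` fields, and that a visit to the `Cancellation Confirmation` page marks a user as churned. Both the churn definition and the helper name `label_churn` are assumptions made for illustration.

```python
import json
from collections import defaultdict

# Assumption: a visit to this page marks the user as churned.
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(lines):
    """Return {userId: 1 if the user ever reached CHURN_PAGE, else 0}.

    Hypothetical helper; `lines` is any iterable of JSON strings,
    one event per line.
    """
    labels = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        user = event.get("userId")
        if not user:
            # Skip events without a user id (e.g. logged-out sessions).
            continue
        # max() keeps the label at 1 once a churn event has been seen.
        labels[user] = max(labels[user], int(event.get("page") == CHURN_PAGE))
    return dict(labels)


# Tiny hand-made sample standing in for medium-sparkify-event-data.json.
sample = [
    '{"userId": "10", "page": "NextSong"}',
    '{"userId": "10", "page": "Cancellation Confirmation"}',
    '{"userId": "20", "page": "NextSong"}',
    '{"userId": "", "page": "Home"}',
]
labels = label_churn(sample)
print(labels)
```

On the real file the same function could be fed an open file handle, for example `label_churn(open("medium-sparkify-event-data.json"))`. In PySpark the equivalent idea would be a per-user aggregation over a flag column.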
## Summary of Results
### Test Scores

### Parameters for best models

### Feature importance

## Acknowledgements
I would like to give special thanks to:
- Udacity, which proposed this project in the Data Science Nanodegree.
- The Spark team and community, who provide a powerful open-source tool to everyone.