{"id":37615746,"url":"https://github.com/ttozatto/sparkify","last_synced_at":"2026-01-16T10:30:39.754Z","repository":{"id":70623711,"uuid":"527396330","full_name":"ttozatto/sparkify","owner":"ttozatto","description":"Churn Prediction for music streaming app with PySpark","archived":false,"fork":false,"pushed_at":"2022-08-23T02:41:04.000Z","size":145,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-01-29T03:42:19.234Z","etag":null,"topics":["analysis","churn","data","learning","machine","predictive","pyspark","science","spark"],"latest_commit_sha":null,"homepage":"https://medium.com/@ttozatto.ds/churn-prediction-for-music-streaming-app-sparkify-d6e26d1ac80f","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ttozatto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-08-22T03:21:09.000Z","updated_at":"2022-08-23T02:40:03.000Z","dependencies_parsed_at":"2023-04-27T04:03:09.608Z","dependency_job_id":null,"html_url":"https://github.com/ttozatto/sparkify","commit_stats":{"total_commits":7,"total_committers":2,"mean_commits":3.5,"dds":0.4285714285714286,"last_synced_commit":"a57d0c340ad5a8d77ca825819e2676bc08d37be2"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ttozatto/sparkify","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttozatto%2Fsparkify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttozatto%2Fsparkify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttozatto%2Fspar
kify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttozatto%2Fsparkify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ttozatto","download_url":"https://codeload.github.com/ttozatto/sparkify/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttozatto%2Fsparkify/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","churn","data","learning","machine","predictive","pyspark","science","spark"],"created_at":"2026-01-16T10:30:38.895Z","updated_at":"2026-01-16T10:30:39.645Z","avatar_url":"https://github.com/ttozatto.png","language":"Jupyter Notebook","readme":"# Sparkify - Churn Prediction for music streaming app with PySpark\n\nThis repository is part of the final project submitted to Udacity for the Data Science Nanodegree.\nThe objective is to predict churn in a simulated music streaming app using historical data from user interactions.\n\nA blog post with a detailed analysis is available at https://medium.com/@ttozatto.ds/churn-prediction-for-music-streaming-app-sparkify-d6e26d1ac80f\n\n## Dependencies\n  - 
pyspark\n  - matplotlib\n  \n ## Files\n  - utils.py -\u003e functions to load and clean data, and to create, train and evaluate ML models\n  - main.py -\u003e script to run the full process, from loading the dataset to showing results\n  - medium-sparkify-event-data.json -\u003e dataset with user interactions in the app. Available at: https://video.udacity-data.com/topher/2018/December/5c1d6681_medium-sparkify-event-data/medium-sparkify-event-data.json\n  - Sparkify.ipynb -\u003e initial exploratory analysis. Final modeling and tuning were done in the two scripts listed above.\n  \n ## Summary of Results\n ### Test Scores\n ![results_medium](https://user-images.githubusercontent.com/42552721/186053626-a014429d-c66c-485e-a418-b13b04d0345f.PNG)\n ### Parameters for best models\n ![bestModel](https://user-images.githubusercontent.com/42552721/186053668-d368dba2-c46e-419d-895e-f1e9ca88d1b5.PNG)\n ### Feature importance\n![feature_importance](https://user-images.githubusercontent.com/42552721/186053678-ec77f392-a8b0-4134-9fbb-fa36dd1b19ae.png)\n\n \n ## Acknowledgements\nI would like to give special thanks to:\n  - Udacity, which proposed this work in the Data Science Nanodegree.\n  - The Spark team and community, who provide a powerful open-source tool to everyone.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fttozatto%2Fsparkify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fttozatto%2Fsparkify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fttozatto%2Fsparkify/lists"}