{"id":50859536,"url":"https://github.com/nayutalienx/osu-skill-predictor","last_synced_at":"2026-06-14T20:34:39.492Z","repository":{"id":363245047,"uuid":"1254043423","full_name":"nayutalienx/osu-skill-predictor","owner":"nayutalienx","description":"ML-powered osu! pass probability \u0026 accuracy predictor with real-time overlay. Standalone Windows bundle available.","archived":false,"fork":false,"pushed_at":"2026-06-08T03:26:13.000Z","size":16940,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-14T20:34:35.439Z","etag":null,"topics":["fastapi","machine-learning","osu","overlay","predictor","scikit-learn"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nayutalienx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-30T04:37:51.000Z","updated_at":"2026-06-08T03:26:17.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nayutalienx/osu-skill-predictor","commit_stats":null,"previous_names":["nayutalienx/osu-skill-predictor"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/nayutalienx/osu-skill-predictor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nayutalienx%2Fosu-skill-predictor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nayutalienx%2Fosu-skill-predictor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nayutalienx%2Fosu-skill-predictor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nayutalienx%2Fosu-skill-predictor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nayutalienx","download_url":"https://codeload.github.com/nayutalienx/osu-skill-predictor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nayutalienx%2Fosu-skill-predictor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34337551,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","machine-learning","osu","overlay","predictor","scikit-learn"],"created_at":"2026-06-14T20:34:38.789Z","updated_at":"2026-06-14T20:34:39.484Z","avatar_url":"https://github.com/nayutalienx.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# osu-skill-predictor\n\n`osu-skill-predictor` is a small classical ML project that predicts:\n\n- the probability that a player will pass an osu! standard beatmap;\n- the expected accuracy percentage for that attempt.\n\n## Quick Start for Players\n\n\u003e **Want predictions while you play?** Grab the standalone bundle — no Python needed.\n\n**[Player Guide \u0026mdash; download, setup, overlay →](docs/standalone_player_guide.md)**\n\n1. Download `osu-skill-predictor-web.zip` from the [latest release](https://github.com/nayutalienx/osu-skill-predictor/releases/latest)\n2. Extract and run `osu-skill-predictor-web.exe`\n3. Enter your osu! API v2 credentials, then save\n4. Enable the overlay in settings (off by default)\n\nAn always-on-top overlay shows pass probability and predicted accuracy for every beatmap. Overlay works in **windowed or borderless** mode.\n\n![Overlay preview](docs/assets/predictor-overlay.png)\n\n---\n\nThe project is intentionally shaped like a lightweight production ML service:\n\n- local dataset collection and training workflow;\n- serialized scikit-learn model artifacts;\n- FastAPI inference API;\n- automated tests;\n- notebook support for interactive comparison and training.\n\n## Current Status\n\nImplemented:\n\n- dataset collection and profiling;\n- baseline feature engineering;\n- grouped holdout and grouped cross-validation model comparison;\n- canonical saved winner models in `models/`;\n- FastAPI `GET /health` and `POST /predict`;\n- automated tests for features, model loading, comparison, and API endpoints.\n\nCurrent canonical models:\n\n- classifier: `RandomForestClassifier`\n- regressor: `HistGradientBoostingRegressor`\n\nThese were selected from the comparison workflow and saved as:\n\n- `models/pass_model.joblib`\n- `models/accuracy_model.joblib`\n\n## Dataset\n\nThe current training dataset is a real API-backed `osu!` standard attempt dataset collected from the osu! API v2.\n\nKey properties:\n\n- source file: `data/raw/osu_country_try_data_full_20260601T074107Z/osu_country_try_data_v1.csv`\n- row granularity: one row = one observed player attempt on one beatmap\n- ruleset scope: `osu` standard only\n- collection strategy: country-seeded sampling with recent and top score pulls per sampled player\n- cleaned modeling rows: `184,229`\n- raw loaded rows: `184,615`\n- unique users: `9,999`\n- unique beatmaps: `35,261`\n\nThe dataset was built to support two targets:\n\n- `target_passed` for pass/fail classification\n- `target_accuracy` for regression of expected accuracy percentage\n\nFor more detail on provenance and schema decisions, see:\n\n- [docs/data_provenance.md](docs/data_provenance.md)\n- [docs/dataset_schema.md](docs/dataset_schema.md)\n- [docs/raw_data_validation.md](docs/raw_data_validation.md)\n\n## Model Comparison\n\nModel selection is not based on a single training run. The project includes both:\n\n- grouped holdout evaluation by `user_id`\n- grouped cross-validation for a more stable comparison view\n\nThe current saved winners follow the stronger grouped cross-validation view:\n\n- classifier winner: `RandomForestClassifier`\n- regressor winner: `HistGradientBoostingRegressor`\n\nWhy these won:\n\n- `RandomForestClassifier` was the most reliable classifier under grouped cross-validation, with the best mean `PR AUC` and stronger separation than the other candidates.\n- `HistGradientBoostingRegressor` won both holdout and grouped cross-validation on regression quality while also training much faster than the random forest regressor.\n\n### Classifier Comparison\n\n![Classifier comparison](docs/assets/readme_classifier_comparison.png)\n\n### Regressor Comparison\n\n![Regressor comparison](docs/assets/readme_regressor_comparison.png)\n\n### Cross-Validation Tradeoff View\n\n![Cross-validation tradeoff](docs/assets/readme_cv_tradeoff.png)\n\nThe practical takeaway is:\n\n- classifier choice is close, so grouped CV matters more than a single split\n- regressor choice is stable, and `HistGradientBoostingRegressor` is clearly the strongest default\n- fit time is already good enough for local retraining and interactive notebook work\n\n## Feature Importances\n\nFeature importances extracted from the baseline model pipeline (`notebooks/02_baseline_model.ipynb`).\n\n### Classifier (Pass Probability)\n\n![Classifier feature importances](docs/classifier_importances.png)\n\nThe classifier relies most on `beatmap_passcount`, `beatmap_playcount`, and `user_pp`.  \nMod features (`has_hidden`, `has_hardrock`, `has_doubletime`) have relatively low influence on pass prediction.\n\n## Quick Start\n\nInstall the main runtime dependencies:\n\n```powershell\npython -m pip install fastapi \"uvicorn[standard]\" pandas scikit-learn joblib pyarrow matplotlib jupyterlab\n```\n\nStart the API:\n\n```powershell\nuvicorn app.main:app --reload\n```\n\nThen check:\n\n- `http://127.0.0.1:8000/health`\n- `http://127.0.0.1:8000/docs`\n\nRun tests:\n\n```powershell\npython -m unittest discover -s tests -v\n```\n\n## Main Workflows\n\n### Train and save winner models\n\nRun the comparison workflow and save canonical model artifacts:\n\n```powershell\npython -m ml.compare --evaluation-mode cross_validation --cv-folds 5 --save-winners --models-root models\n```\n\n### Run the API locally\n\n```powershell\nuvicorn app.main:app --reload\n```\n\n### Use the comparison notebook\n\nOpen:\n\n- `notebooks/03_model_comparison.ipynb`\n\nThis notebook supports:\n\n- holdout comparison;\n- grouped cross-validation comparison;\n- visual comparison plots;\n- saving the chosen winner models to `models/`.\n\n## Repository Structure\n\n```text\napp/        FastAPI service, schemas, and inference code\ndata/       sample, raw, and processed datasets\ndocs/       project docs, model docs, and run instructions\nml/         feature engineering, training, evaluation, and comparison logic\nmodels/     canonical serialized model artifacts\nnotebooks/  interactive collection, training, and comparison notebooks\nscripts/    dataset collection scripts\ntests/      automated unit and API tests\n```\n\n## Core Docs\n\n- [docs/setup.md](docs/setup.md)\n- [docs/training.md](docs/training.md)\n- [docs/api_usage.md](docs/api_usage.md)\n- [docs/model_card.md](docs/model_card.md)\n- [docs/assumptions_and_limitations.md](docs/assumptions_and_limitations.md)\n- [docs/local_run_instructions.md](docs/local_run_instructions.md)\n- [docs/api_error_handling.md](docs/api_error_handling.md)\n\n## Scope\n\nThis is an MVP for local experimentation and portfolio review.\n\nIt is not intended to be:\n\n- a production osu! recommendation platform;\n- a live ranking-grade inference service;\n- a deep-learning or replay-driven system.\n\n## Acknowledgments\n\nThis project is powered by **[tosu](https://github.com/tosuapp/tosu)** — an open-source osu! memory reader that provides real-time beatmap and gameplay data. Thank you to the tosu team for making local osu! tooling possible.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnayutalienx%2Fosu-skill-predictor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnayutalienx%2Fosu-skill-predictor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnayutalienx%2Fosu-skill-predictor/lists"}