https://github.com/preston-bernstein/nba-predictor
NBA game outcome predictor built with Python, pandas, and scikit-learn. Early baseline model using rolling team stats.
- Host: GitHub
- URL: https://github.com/preston-bernstein/nba-predictor
- Owner: preston-bernstein
- Created: 2025-09-04T21:55:33.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-09-28T15:12:12.000Z (9 months ago)
- Last Synced: 2025-09-28T15:17:18.608Z (9 months ago)
- Topics: basketball, fastapi, machine-learning, sklearn, sports-analytics
- Language: Python
- Homepage:
- Size: 134 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# NBA predictor
[CI](https://github.com/preston-bernstein/nba-predictor/actions/workflows/ci.yaml) | [Coverage](https://codecov.io/gh/preston-bernstein/nba-predictor) | [pyproject.toml](pyproject.toml) | [License](LICENSE)
A Python 3.12 pipeline that scrapes Basketball-Reference, builds features, trains a classifier, and serves win probabilities through FastAPI. All predictions are driven by the cached CSVs in `data_cache/` and the artifacts in `artifacts/`, so refresh both with the pipeline before serving.
## Setup
Create a virtual environment and install dev dependencies:
```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e '.[dev]'
```
## Pipeline
Run everything end to end:
```bash
make pipeline
```
Or run each step:
- Fetch: `make fetch` (or `python -m src.data.fetch --seasons "2024 2025"`) → `data_cache/games.csv`.
- Features: `make features` → rolling form, rest days, and Elo deltas in `data_cache/features.csv`.
- Train: `make train MODELS="logreg rf"` → best model at `artifacts/model.joblib` with metrics in `artifacts/metrics.json`.
Use `OFFLINE=1` to seed from fixtures. Add `PRESERVE=1` to keep existing caches. Control seasons and model lists with `SEASONS` and `MODELS`.
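
The README names the features but does not show how they are computed. As a rough illustration only, the sketch below derives two of them, rest days and an Elo delta, from a couple of made-up games using just the standard library. The starting rating of 1500, the K-factor of 20, the lack of any home-court adjustment, the definition of rest as calendar days since the previous game, and every function and field name are assumptions, not the repository's code.

```python
from collections import defaultdict
from datetime import date

START_ELO = 1500.0  # assumed starting rating
K = 20.0            # assumed K-factor


def expected_home_win(home_elo: float, away_elo: float) -> float:
    """Standard Elo expected score for the home team."""
    return 1.0 / (1.0 + 10 ** ((away_elo - home_elo) / 400.0))


def build_features(games):
    """Return one feature row per game, using only information from earlier games.

    `games` is a list of (date, home, away, home_won) tuples sorted by date.
    """
    elo = defaultdict(lambda: START_ELO)
    last_played = {}
    rows = []
    for day, home, away, home_won in games:
        # Read the features BEFORE updating ratings so a game never sees its own result.
        rest_home = (day - last_played[home]).days if home in last_played else None
        rest_away = (day - last_played[away]).days if away in last_played else None
        rows.append({
            "date": day.isoformat(),
            "home": home,
            "away": away,
            "elo_delta": elo[home] - elo[away],
            "rest_home": rest_home,
            "rest_away": rest_away,
        })
        # Update both ratings with the observed result (zero-sum exchange).
        change = K * ((1.0 if home_won else 0.0) - expected_home_win(elo[home], elo[away]))
        elo[home] += change
        elo[away] -= change
        last_played[home] = day
        last_played[away] = day
    return rows


# Made-up results, for illustration only.
games = [
    (date(2025, 1, 1), "NYK", "BOS", True),
    (date(2025, 1, 3), "BOS", "NYK", True),
]
rows = build_features(games)
```

Computing each row before the rating update keeps the features free of leakage from the game being predicted, which matters for any model trained on them.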
## API
Start the service:
```bash
make serve
# uvicorn src.service.app:app --reload --port 5000
```
Endpoints:
- `GET /v1/health` → `{"ok": true}`
- `GET /v1/teams` → canonical team codes from cached games
- `GET /v1/predict?home=NYK&away=BOS&date=2025-01-01` → win probability and feature deltas
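
A minimal client sketch for the predict endpoint listed above, using only the standard library. The host `localhost` is an assumption; the port follows the `uvicorn` comment in the serve example. The helper names are hypothetical, and the response schema is not documented here, so the body is returned as a plain dict.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumed host; port taken from the serve example


def predict_url(home: str, away: str, game_date: str, base_url: str = BASE_URL) -> str:
    """Build the /v1/predict URL with URL-encoded query parameters."""
    query = urlencode({"home": home, "away": away, "date": game_date})
    return f"{base_url}/v1/predict?{query}"


def fetch_prediction(home: str, away: str, game_date: str) -> dict:
    """Call a running service and decode the JSON response body."""
    with urlopen(predict_url(home, away, game_date), timeout=10) as resp:
        return json.load(resp)


# Building the URL needs no running server; fetch_prediction does.
url = predict_url("NYK", "BOS", "2025-01-01")
```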
Team inputs accept codes, full names, and common aliases. Unknown teams return HTTP 422 with a clear message. The service reads artifacts only; regenerate them before deploying.
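
To illustrate the kind of normalization described above, here is a small sketch that maps codes, full names, and aliases to a canonical code and rejects anything else. The alias table, the `UnknownTeamError` class, and `normalize_team` are hypothetical; the repository's actual normalizer is not shown in this README.

```python
# Hypothetical alias table covering two teams; a real one would cover the league.
ALIASES = {
    "NYK": "NYK", "NEW YORK KNICKS": "NYK", "KNICKS": "NYK",
    "BOS": "BOS", "BOSTON CELTICS": "BOS", "CELTICS": "BOS",
}


class UnknownTeamError(ValueError):
    """Raised for unrecognized input; a service could map this to HTTP 422."""


def normalize_team(raw: str) -> str:
    """Return the canonical team code for a code, full name, or alias."""
    # Uppercase and collapse whitespace so " boston  celtics " still matches.
    key = " ".join(raw.strip().upper().split())
    try:
        return ALIASES[key]
    except KeyError:
        raise UnknownTeamError(f"unknown team: {raw!r}") from None


code = normalize_team(" knicks ")
```

In a FastAPI route, an error like this could be translated into `HTTPException(status_code=422, detail=...)`; whether the service does it that way is not stated in the README.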
## Tests and QA
Common checks:
```bash
make lint
make type
make test
make test-cov
make check # fmt + lint + type + coverage gate
```
Coverage runs produce HTML and XML reports. Use `make cov-open`, `make test-cov-gaps`, or `make test-cov-diff` for deeper inspection.
## Layout
```
src/
  config.py        # path config for data and artifacts
  data/            # scraping and feature engineering
  model/           # training and model selection
  service/         # FastAPI app, routes, schemas, normalizer
  utils/           # logging helpers
tests/             # unit and integration tests (fixtures included)
artifacts/         # generated model + metrics
data_cache/        # generated games and features CSVs
Makefile           # pipeline, QA, and serve targets
pyproject.toml     # deps and tooling config
```
Data comes from Basketball-Reference. Treat `data_cache/` and `artifacts/` as generated outputs; rebuild them after code or schema changes.