https://github.com/kyleseneker/cbb-hub

Open-source D1 college baseball landscape: live RPI, regional host projection, 64-team bracket projection, what-if result simulator. Python engine + TypeScript port + Next.js + GitHub Actions.
bracketology college-baseball d1-baseball ncaa ncaa-baseball nextjs open-source python rpi sports-analytics typescript vercel

# Field of 64

[![CI](https://github.com/kyleseneker/cbb-hub/actions/workflows/test.yml/badge.svg)](https://github.com/kyleseneker/cbb-hub/actions/workflows/test.yml)
[![Scrape](https://github.com/kyleseneker/cbb-hub/actions/workflows/scrape-and-commit.yml/badge.svg)](https://github.com/kyleseneker/cbb-hub/actions/workflows/scrape-and-commit.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

A year-round site for NCAA Division I baseball: scoreboard, standings, RPI,
projected hosts and bracket, and a scenario simulator.

Live at [cbb-hub.vercel.app](https://cbb-hub.vercel.app).

## What's here

Rankings use the NCAA's published RPI from
[ncaa.com/rankings/baseball/d1/rpi](https://www.ncaa.com/rankings/baseball/d1/rpi).
Base RPI is also computed locally (adjusted WP1 with 1.3 / 0.7 venue
weighting, WP2 with self-game exclusion, OOWP-based WP3, D1-only) and
shown next to the published rank as a transparency column. The simulator
on `/scenarios` and `/path` recomputes base RPI in the browser when you
flip an upcoming game.

A Supabase Edge Function polls NCAA's contests GraphQL API on a
10-minute cadence (pg_cron fires every minute; the function self-skips
between runs) and writes a snapshot to Postgres. The Next.js front-end
reads from the database.
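
The self-skip gate amounts to a small time check. Below is a minimal sketch, assuming the function can look up when the previous run started (for instance from `scrape_runs`). `should_run` and its parameters are hypothetical names, and the real Edge Function is written in TypeScript rather than Python:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed cadence, matching the 10-minute figure above.
SCRAPE_INTERVAL = timedelta(minutes=10)


def should_run(last_run_at: Optional[datetime], now: datetime,
               interval: timedelta = SCRAPE_INTERVAL) -> bool:
    """Return True when enough time has passed since the last run.

    pg_cron may invoke the function every minute; this check turns the
    extra invocations into no-ops.
    """
    if last_run_at is None:  # no previous run recorded
        return True
    return now - last_run_at >= interval


now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
print(should_run(now - timedelta(minutes=4), now))   # False
print(should_run(now - timedelta(minutes=10), now))  # True
```

Firing the cron job more often than the cadence and gating inside the function keeps the schedule itself trivial while letting the function decide when work is actually due.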

## Pages

| Path | What it shows |
| --- | --- |
| `/` | Yesterday's finals, today's slate, on-deck matchups; Top 25; phase-aware links. |
| `/scoreboard` | D1-vs-D1 scoreboard, paginated by date. |
| `/rankings` | Every D1 team, filterable by conference, with host (1-16) and at-large (17-64) pools highlighted. |
| `/standings` | Conference standings by league. |
| `/bubble` | Host line at #16 and at-large line at #64; last-four-in / first-four-out. |
| `/bracket` | Projected field of 64 via S-curve into 16 regionals. |
| `/conferences` | Per-conference averages, medians, projected hosts and bids. |
| `/scenarios` | Pick winners for upcoming games and watch RPI re-rank. |
| `/path` | Pick a team and a goal (host, field, top 25, custom rank); the solver finds the minimum number of games it takes. |
| `/team/[slug]` | RPI components, record splits, quality wins, bad losses, full schedule. |
| `/methodology` | Formula write-up and accuracy panel vs the published index. |
| `/api/rankings` | JSON: canonical-ranked teams with computed RPI components. |
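
The S-curve mentioned for `/bracket` can be illustrated with a plain snake draft. This is a sketch under stated assumptions: it ignores any geographic or conference-conflict adjustments the site's bracket code may apply, and `s_curve` is a hypothetical name, not a function from the repo:

```python
from typing import List, Sequence


def s_curve(field: Sequence[str], regionals: int = 16) -> List[List[str]]:
    """Snake a ranked field into regionals.

    field[0] is the top overall seed. Every other pass runs in reverse,
    which keeps the seed totals balanced across regionals.
    """
    if len(field) % regionals != 0:
        raise ValueError("field size must be a multiple of the regional count")
    pods: List[List[str]] = [[] for _ in range(regionals)]
    for i, team in enumerate(field):
        tier, pos = divmod(i, regionals)
        # Even tiers fill regionals left to right, odd tiers right to left.
        slot = pos if tier % 2 == 0 else regionals - 1 - pos
        pods[slot].append(team)
    return pods


field = [f"seed-{n}" for n in range(1, 65)]
pods = s_curve(field)
print(pods[0])   # ['seed-1', 'seed-32', 'seed-33', 'seed-64']
print(pods[15])  # ['seed-16', 'seed-17', 'seed-48', 'seed-49']
```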

## Local development

Setup happens once; see [supabase/SETUP.md](supabase/SETUP.md) (create
the Supabase project, apply migrations, deploy the Edge Function, set
env vars). Then:

```sh
cd web
npm install
npm run dev      # start the local dev server
npm test         # run the test suite
npm run build    # production build
```

`web/.env.local` needs `NEXT_PUBLIC_SUPABASE_URL` and
`NEXT_PUBLIC_SUPABASE_ANON_KEY`.

The Python pipeline under `src/cbb_hub/` is the reference engine for
the RPI math. The parity test
[web/lib/__tests__/parity.test.ts](web/lib/__tests__/parity.test.ts)
runs the TypeScript engine against a frozen Python snapshot to catch
drift. To re-run the Python pipeline directly:

```sh
PYTHONPATH=src python3 -m cbb_hub scrape-ncaa --year 2026 \
  --seed data/season-2026.json --output data/season-2026.json
PYTHONPATH=src python3 -m cbb_hub rank data/season-2026.json --top 25
PYTHONPATH=src python3 -m cbb_hub validate --season-file data/season-2026.json
python3 -m unittest discover tests
```

## Architecture

```
  pg_cron (every 1 min, gated to 10 min cadence)
               │
               ▼
┌─────────────────────────────┐         sdataprod.ncaa.com
│  Supabase Edge Function     │ ──────► GraphQL contests
│  supabase/functions/scrape  │
│                             │ ──────► ncaa.com/rankings
│  fetch, parse, compute RPI  │         (published RPI)
│  apply_scrape_snapshot()    │
└──────────────┬──────────────┘
               │ atomic write
               ▼
┌─────────────────────────────┐
│  Supabase Postgres          │
│  teams, games, upcoming,    │
│  rankings, auto_bids,       │
│  scrape_runs                │
└──────────────┬──────────────┘
               │ anon SELECT (RLS)
               ▼
┌─────────────────────────────┐
│  Next.js on Vercel          │
└─────────────────────────────┘
```

Each scrape writes a full snapshot via `apply_scrape_snapshot`; readers
only see the latest run with `status = 'success'`, so partial writes
aren't visible.
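
The read-side rule can be sketched over in-memory rows. This is an illustration only: it assumes run ids increase over time, and every status value other than `success` is made up for the example. In the real system the filtering happens in Postgres:

```python
from typing import Iterable, Optional, TypedDict


class ScrapeRun(TypedDict):
    id: int      # assumed to increase with each run
    status: str  # only "success" comes from the text above


def latest_successful_run(runs: Iterable[ScrapeRun]) -> Optional[ScrapeRun]:
    """Pick the newest run that finished successfully, if any."""
    ok = [r for r in runs if r["status"] == "success"]
    return max(ok, key=lambda r: r["id"]) if ok else None


runs: list[ScrapeRun] = [
    {"id": 1, "status": "success"},
    {"id": 2, "status": "success"},
    {"id": 3, "status": "running"},  # in-flight run stays invisible
]
print(latest_successful_run(runs))  # {'id': 2, 'status': 'success'}
```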

The RPI math lives in three places that have to stay in step:
[src/cbb_hub/rpi.py](src/cbb_hub/rpi.py) (reference),
[supabase/functions/_shared/rpi.ts](supabase/functions/_shared/rpi.ts)
(Edge Function), and [web/lib/rpi.ts](web/lib/rpi.ts) (browser +
Next.js server). Python is the spec; the parity test pins the
TypeScript port to a frozen snapshot of Python's output. See
[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full data-flow.
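
The idea behind the parity check can be shown in a few lines. This is a sketch, not the contents of `parity.test.ts`: the function name, the dict-of-floats shape, and the tolerance are all assumptions chosen for illustration:

```python
import math
from typing import Dict, List


def find_drift(frozen: Dict[str, float], computed: Dict[str, float],
               tol: float = 1e-9) -> List[str]:
    """Return teams whose computed RPI differs from the frozen snapshot."""
    drifted = []
    for team, expected in frozen.items():
        actual = computed.get(team)
        # A missing team counts as drift, as does any value outside tolerance.
        if actual is None or not math.isclose(
            actual, expected, rel_tol=0.0, abs_tol=tol
        ):
            drifted.append(team)
    return sorted(drifted)


frozen = {"team-a": 0.6123, "team-b": 0.5501}
computed = {"team-a": 0.6123, "team-b": 0.5502}
print(find_drift(frozen, computed))  # ['team-b']
```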

## RPI formula

NCAA D1 baseball, in effect since 2013:

RPI = 0.25 * WP1 + 0.50 * WP2 + 0.25 * WP3

- WP1: adjusted team winning percentage. Each W/L is weighted by venue:

| Outcome | Weight |
| --- | --- |
| Road win | 1.3 |
| Home loss | 1.3 |
| Home win | 0.7 |
| Road loss | 0.7 |
| Neutral W or L | 1.0 |

- WP2: average opponents' adjusted WP, with games against the target
team removed from each opponent's record.
- WP3: average opponents' opponents' WP (unadjusted).

Only D1-vs-D1 games count toward base RPI.
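
As a worked illustration of the formula, here is a compact sketch. It is not the code in `rpi.py`: the `Game` fields and helper names are hypothetical, the input is assumed to be filtered to D1-vs-D1 games already, opponents are averaged once per game played, and WP3 is taken as the mean of each opponent's opponents' plain winning percentage:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Game:
    # Hypothetical shape for illustration; the real dataclasses are in models.py.
    home: str
    away: str
    home_score: int
    away_score: int
    neutral: bool = False


def won(team: str, g: Game) -> bool:
    """True if `team` won game `g` (ties are not modeled)."""
    return (g.home_score > g.away_score) == (g.home == team)


def weight(team: str, g: Game) -> float:
    """Venue weight from the table above."""
    if g.neutral:
        return 1.0
    # Road wins and home losses weigh 1.3; home wins and road losses 0.7.
    return 1.3 if won(team, g) != (g.home == team) else 0.7


def games_of(team: str, games: Sequence[Game],
             exclude: Optional[str] = None) -> List[Game]:
    """Games involving `team`, minus any that involve `exclude`."""
    return [g for g in games
            if team in (g.home, g.away) and exclude not in (g.home, g.away)]


def wp(team: str, games: Sequence[Game], adjusted: bool,
       exclude: Optional[str] = None) -> float:
    """Winning percentage, optionally venue-weighted."""
    wins = losses = 0.0
    for g in games_of(team, games, exclude):
        w = weight(team, g) if adjusted else 1.0
        if won(team, g):
            wins += w
        else:
            losses += w
    total = wins + losses
    return wins / total if total else 0.0


def opponents(team: str, games: Sequence[Game]) -> List[str]:
    """One entry per game, so repeat opponents count repeatedly."""
    return [g.away if g.home == team else g.home for g in games_of(team, games)]


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def rpi(team: str, games: Sequence[Game]) -> float:
    opps = opponents(team, games)
    wp1 = wp(team, games, adjusted=True)
    wp2 = mean([wp(o, games, adjusted=True, exclude=team) for o in opps])
    wp3 = mean([mean([wp(oo, games, adjusted=False)
                      for oo in opponents(o, games)]) for o in opps])
    return 0.25 * wp1 + 0.50 * wp2 + 0.25 * wp3


games = [
    Game("A", "B", 5, 3),  # A wins at home
    Game("C", "A", 2, 4),  # A wins on the road
    Game("B", "C", 1, 0),  # B wins at home
]
print(round(rpi("A", games), 5))  # 0.65625
```

In this toy season A has WP1 = 1.0, WP2 = 0.5 (B is 1-0 and C is 0-1 once games against A are removed), and WP3 = 0.625, which gives 0.25 + 0.25 + 0.15625 = 0.65625.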

## Data source

NCAA.com for both games and the published RPI:

- Contests: the `sdataprod.ncaa.com` GraphQL endpoint returns every D1
contest for a given date with both teams, scores, `conferenceSeo`,
and a stable `contestId`. One HTTP call per calendar date.
- Published RPI: `ncaa.com/rankings/baseball/d1/rpi`.

`stats.ncaa.org` sits behind Akamai bot-detection that blocks
server-side fetches; the GraphQL and rankings paths above don't need
auth.

The contests endpoint uses a persisted-query SHA256 hash embedded in
the scoreboard page's `drupalSettings`. The hash rotates whenever NCAA
redeploys; the scraper re-reads it from the page each run.
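
Re-reading the hash each run could look roughly like the sketch below. It assumes the page embeds `drupalSettings` the way stock Drupal does, as JSON inside a `<script data-drupal-selector="drupal-settings-json">` tag. Because the key that holds the hash is not documented here, the sketch simply returns the first 64-character hex string it finds. Both choices are assumptions, and `extract_query_hash` is a hypothetical name:

```python
import json
import re
from typing import Any, Optional

# Assumed embedding: stock Drupal puts settings JSON in this script tag.
SETTINGS_RE = re.compile(
    r'<script[^>]*data-drupal-selector="drupal-settings-json"[^>]*>(.*?)</script>',
    re.DOTALL,
)
SHA256_RE = re.compile(r"[0-9a-f]{64}")


def _first_hash(node: Any) -> Optional[str]:
    """Depth-first search for a string that looks like a SHA-256 hex digest."""
    if isinstance(node, str):
        return node if SHA256_RE.fullmatch(node) else None
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _first_hash(child)
        if found:
            return found
    return None


def extract_query_hash(html: str) -> Optional[str]:
    """Return the persisted-query hash from a scoreboard page, if present."""
    match = SETTINGS_RE.search(html)
    if match is None:
        return None
    return _first_hash(json.loads(match.group(1)))
```

Returning `None` instead of raising lets the caller decide whether a missing hash should fail the run or fall back to a previously seen value.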

## Project layout

```
src/cbb_hub/
  models.py             Game and Team dataclasses
  rpi.py                RPI calculator (reference)
  io.py                 season JSON load/save
  fetcher.py            rate-limited, disk-cached HTTP client
  ncaa_rpi.py           NCAA RPI ranking page parser
  ncaa_contests.py      NCAA GraphQL contests scraper
  tournaments.py        conference tournament bracket types
  score_patches.py      manual score corrections
  cli.py                argparse entry point
tests/
  fixtures/             pinned HTML / JSON snapshots
supabase/
  functions/scrape/     Edge Function (live scraper)
  functions/_shared/    shared TS modules used by the function
  migrations/           schema + RPC migrations
web/
  app/                  Next.js App Router pages
  components/           UI primitives and page components
  lib/                  season-state, rpi, bracket, field, path
docs/                   ARCHITECTURE.md and other contributor docs
data/                   local dev fallback JSON
.cache/                 gitignored; HTML cached here for the Python CLI
```

## License

MIT.