https://github.com/fwextensions/sf-pools


Host: GitHub
URL: https://github.com/fwextensions/sf-pools
Owner: fwextensions
License: mit
Created: 2025-09-20T18:33:06.000Z (10 months ago)
Default Branch: main
Last Pushed: 2026-01-17T03:09:01.000Z (6 months ago)
Last Synced: 2026-01-17T14:36:45.190Z (6 months ago)
Topics: ai, built-with-ai
Language: TypeScript
Homepage: https://sf-pools.vercel.app
Size: 337 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# SF Pools Schedule Viewer

Centralized, searchable schedules for San Francisco public swimming pools. This app scrapes official pool schedule PDFs from SF Rec & Park, uses an LLM to extract structured schedules, and provides a clean UI to browse by program, day, time, and pool.

## Tech

- Next.js (App Router), React 19, TypeScript
- Tailwind CSS v4 (via `@tailwindcss/postcss` and `@import "tailwindcss"`)
- Vercel AI SDK (`ai`) with Google Generative AI provider (`@ai-sdk/google`)
- Zod for strict schema validation
- GitHub Actions for automated weekly schedule updates

## Prerequisites

- Node.js 22+
- npm
- A Google Generative AI API key

## Setup

1. Install dependencies:

```
npm install
```

2. Create `.env.local` in the project root and add:

```
GOOGLE_GENERATIVE_AI_API_KEY=your_key_here
```

3. Run the dev server:

```
npm run dev
```

4. Generate schedules:

```
npm run build-schedules
```

This will scrape PDF URLs, download PDFs, extract schedules, and write `public/data/all_schedules.json`. View at `/schedules`.

## Architecture

### Data Files

- **`data/pools.json`** — Source of truth for static pool metadata (id, name, shortName, address, pageUrl). Does not change frequently.
- **`public/data/discovered_pool_schedules.json`** — Scraped PDF URLs (poolId → pdfUrl mapping). Regenerated on each scrape.
- **`data/pdf-manifest.json`** — Tracks downloaded PDFs by hash to detect changes.
- **`data/extracted/.json`** — Cached LLM extractions per PDF.
- **`public/data/all_schedules.json`** — Aggregated schedule data for the UI.
- **`data/changelog/`** — Change history between schedule updates.
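
As a rough illustration of how the first two files relate, here is a minimal sketch. The field names come from the list above, but the string types, the `Record` shape of the discovered file, and the `poolsMissingPdf` helper are assumptions made for the example, not code from this repo.

```typescript
// One entry in data/pools.json. Field names are from the README;
// typing every field as a string is an assumption.
interface PoolMeta {
  id: string;
  name: string;
  shortName: string;
  address: string;
  pageUrl: string;
}

// Assumed shape of discovered_pool_schedules.json: poolId -> pdfUrl.
type DiscoveredSchedules = Record<string, string>;

// Hypothetical helper: ids of known pools that have no discovered PDF URL.
function poolsMissingPdf(pools: PoolMeta[], discovered: DiscoveredSchedules): string[] {
  return pools.filter((pool) => discovered[pool.id] === undefined).map((pool) => pool.id);
}

// Made-up sample data, not real pools.
const samplePools: PoolMeta[] = [
  { id: "pool-a", name: "Example Pool A", shortName: "A", address: "1 Example St", pageUrl: "https://example.com/a" },
  { id: "pool-b", name: "Example Pool B", shortName: "B", address: "2 Example St", pageUrl: "https://example.com/b" },
];
const sampleDiscovered: DiscoveredSchedules = { "pool-a": "https://example.com/a.pdf" };

const missing = poolsMissingPdf(samplePools, sampleDiscovered);
console.log(missing);
```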

### Pipeline Flow

```
scrape        →  download-pdfs     →  process-all-pdfs
  ↓                   ↓                     ↓
validates        checks hash          preserves data
pool count       downloads if         fails on large
& URLs           content changed      changes
```

1. **Scrape**: Discovers pool pages and PDF URLs from SF Rec & Park. Validates against `pools.json` — fails if pool count or page URLs change unexpectedly.
2. **Download**: Fetches PDFs, checks content hash against manifest. Only downloads if content actually changed (handles URL changes gracefully).
3. **Process**: Extracts schedules via LLM, preserves unchanged pool schedules, detects large changes.
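
The content-hash check in the download step could look something like the sketch below. The manifest shape (`PdfManifest`, keyed by pool id) and the helper names are hypothetical; the actual layout of `data/pdf-manifest.json` may differ. SHA-256 is also an assumption, since the README only says "hash".

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest shape: poolId -> hash and URL of the last processed PDF.
type PdfManifest = Record<string, { sha256: string; pdfUrl: string }>;

// Hex-encoded SHA-256 digest of the raw PDF bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// True when the bytes differ from what the manifest recorded, or the pool is new.
// The URL is deliberately ignored, so a moved but identical PDF is not "changed".
function hasContentChanged(manifest: PdfManifest, poolId: string, pdfBytes: Uint8Array): boolean {
  const previous = manifest[poolId];
  return previous === undefined || previous.sha256 !== sha256Hex(pdfBytes);
}

// Stand-in bytes; a real run would use the downloaded PDF.
const oldBytes = new TextEncoder().encode("fake pdf bytes");
const newBytes = new TextEncoder().encode("different pdf bytes");
const sampleManifest: PdfManifest = {
  "pool-a": { sha256: sha256Hex(oldBytes), pdfUrl: "https://example.com/old-location.pdf" },
};

const sameContent = hasContentChanged(sampleManifest, "pool-a", oldBytes);
const editedContent = hasContentChanged(sampleManifest, "pool-a", newBytes);
const unseenPool = hasContentChanged(sampleManifest, "pool-b", oldBytes);
console.log({ sameContent, editedContent, unseenPool });
```

Comparing content rather than URLs is what lets a re-hosted PDF pass through without triggering a new extraction.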

### Change Detection

The pipeline computes a changelog comparing old vs new schedules:
- **none/minor**: Normal updates, build succeeds
- **major/wholesale**: Large changes detected, build fails in CI (requires manual review)

Set `FAIL_ON_LARGE_CHANGES=false` locally to bypass this check during development.
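
One way to picture the four levels is a ratio-based classifier, sketched below. The thresholds (20% and 80%) and both function names are illustrative assumptions; the project's real rules for what counts as minor, major, or wholesale are not described here.

```typescript
type ChangeLevel = "none" | "minor" | "major" | "wholesale";

// Classify by the share of schedule entries that changed.
// The 0.2 and 0.8 cutoffs are made up for this example.
function classifyChange(changedEntries: number, totalEntries: number): ChangeLevel {
  if (changedEntries === 0) return "none";
  const ratio = totalEntries === 0 ? 1 : changedEntries / totalEntries;
  if (ratio < 0.2) return "minor";
  if (ratio < 0.8) return "major";
  return "wholesale";
}

// The build fails only for large changes, and only when the check is enabled.
function shouldFailBuild(level: ChangeLevel, failOnLargeChanges: boolean): boolean {
  return failOnLargeChanges && (level === "major" || level === "wholesale");
}

const level = classifyChange(50, 100);
console.log(level, shouldFailBuild(level, true), shouldFailBuild(level, false));
```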

## Scripts

- `npm run dev` — start Next.js dev server
- `npm run build` — build for production
- `npm run start` — start production build
- `npm run lint` — run ESLint
- `npm run test` — run tests
- `npm run scrape` — scrape pool pages, validate against pools.json, discover PDF URLs
- `npm run download-pdfs` — download changed PDFs into `data/pdfs/`
- `npm run process-all-pdfs` — extract schedules from PDFs, preserve unchanged, write changelog
- `npm run build-schedules` — full pipeline: scrape → download → process
- `npm run scrape-alerts` — scrape pool alerts from SF Rec & Park
- `npm run analyze-programs` — analyze raw vs canonical program names

## How Extraction Works

1. For each PDF, send content to the LLM with a strict Zod schema
2. The model extracts:
   - Pool metadata (name, season, date range)
   - Program entries with day, time, lanes, notes
3. Pipeline normalizes program names to canonical labels (e.g., "LAP SWIM" → "Lap Swim")
4. Enriches with static metadata from `pools.json` (address, URLs)
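
Step 3 can be sketched as a lookup against a table of canonical labels. Only the "LAP SWIM" → "Lap Swim" mapping comes from this README; the other table entry, the `normalizeProgramName` name, and the fallback behavior are assumptions for illustration.

```typescript
// Hypothetical canonical label table, keyed by lowercased name.
// Only "Lap Swim" is taken from the README.
const CANONICAL_PROGRAMS: Record<string, string> = {
  "lap swim": "Lap Swim",
  "family swim": "Family Swim",
};

// Collapse whitespace, then look up the lowercased name.
// Unknown names are returned cleaned but otherwise unchanged (an assumption),
// so they stay visible for later review instead of being silently rewritten.
function normalizeProgramName(raw: string): string {
  const cleaned = raw.trim().replace(/\s+/g, " ");
  const canonical = CANONICAL_PROGRAMS[cleaned.toLowerCase()];
  return canonical !== undefined ? canonical : cleaned;
}

console.log(normalizeProgramName("LAP SWIM"));
console.log(normalizeProgramName("  lap   swim "));
console.log(normalizeProgramName("WATER  POLO"));
```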

## Automated Updates

GitHub Actions runs weekly to:
1. Scrape and download new PDFs
2. Extract schedules from changed PDFs
3. Commit changes to `public/data/` and `data/changelog/`
4. Send push notifications via Pushover

If large changes are detected, the build fails and a notification is sent for manual review.
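
For reference, a Pushover notification is a form-encoded POST to the Pushover messages endpoint with `token`, `user`, and `message` fields. The sketch below only builds the request; `buildPushoverRequest` and the message text are hypothetical, and how the workflow actually sends notifications is not shown in this README.

```typescript
interface PushoverRequest {
  url: string;
  body: string;
}

// Build the form-encoded body for a Pushover message.
// userKey and apiToken correspond to PUSHOVER_USER_KEY and PUSHOVER_API_TOKEN.
function buildPushoverRequest(userKey: string, apiToken: string, message: string): PushoverRequest {
  const params = new URLSearchParams({ token: apiToken, user: userKey, message });
  return { url: "https://api.pushover.net/1/messages.json", body: params.toString() };
}

// Placeholder credentials and message for the example.
const pushoverMessage = "Large schedule change detected; manual review needed";
const pushoverRequest = buildPushoverRequest("user-key", "api-token", pushoverMessage);
console.log(pushoverRequest.url);

// Sending it would look roughly like:
// await fetch(pushoverRequest.url, {
//   method: "POST",
//   headers: { "Content-Type": "application/x-www-form-urlencoded" },
//   body: pushoverRequest.body,
// });
```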

## Environment Variables

```bash
# Required: Google AI API key for schedule extraction
GOOGLE_GENERATIVE_AI_API_KEY=your_key_here

# Optional: Disable build failure on large changes (for local dev)
FAIL_ON_LARGE_CHANGES=false

# Optional: Force re-extraction even if cache exists
REFRESH_EXTRACT=1

# For CI notifications (GitHub Actions secrets)
PUSHOVER_USER_KEY=...
PUSHOVER_API_TOKEN=...
```
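
A script might read the two optional flags roughly as follows. This is a sketch under stated assumptions: the large-change check is treated as on unless the variable is exactly `false`, and re-extraction is forced only when the variable is exactly `1`. The `readPipelineOptions` name and the `PipelineOptions` type are hypothetical.

```typescript
interface PipelineOptions {
  failOnLargeChanges: boolean;
  refreshExtract: boolean;
}

// Takes the environment as a plain record so it is easy to test;
// a script would pass process.env.
function readPipelineOptions(env: Record<string, string | undefined>): PipelineOptions {
  return {
    // Assumption: the check stays enabled unless explicitly set to "false".
    failOnLargeChanges: env["FAIL_ON_LARGE_CHANGES"] !== "false",
    // Assumption: only the value "1" forces re-extraction.
    refreshExtract: env["REFRESH_EXTRACT"] === "1",
  };
}

const defaults = readPipelineOptions({});
const localDev = readPipelineOptions({ FAIL_ON_LARGE_CHANGES: "false", REFRESH_EXTRACT: "1" });
console.log(defaults, localDev);
```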

## License

MIT
