{"id":31986647,"url":"https://github.com/troutlytics/troutlytics-backend","last_synced_at":"2026-03-07T03:32:24.126Z","repository":{"id":64969298,"uuid":"495483857","full_name":"troutlytics/troutlytics-backend","owner":"troutlytics","description":"Backend support to provide updated information about trout stocking in Washington state. This repository contains all the essential backend components of the project, including database management, Web API, and a web scraper.","archived":false,"fork":false,"pushed_at":"2026-02-27T18:51:00.000Z","size":8045,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-27T22:58:34.140Z","etag":null,"topics":["contributions-welcome","contributors","data-visualization","docker","fastapi","fish","fishing","folium","lakes","maps","python","sqlalchemy","statistics","trout","washington","wdfw","webscraper","webscraping","website"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/troutlytics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-05-23T16:11:36.000Z","updated_at":"2026-02-27T18:51:05.000Z","dependencies_parsed_at":"2023-07-16T13:12:03.865Z","dependency_job_id":"e55a91f7-e23e-4010-802c-976f38ff6a78","html_url":"https://github.com/troutlytics/troutlytics-backend","commit_stats":{"total_commits":403,"total_committers":3,"mean_commits":"134.33333333333334","dds":0.02481389578163773,"last_synced_commit":"656b18b807f1ce1578fed2ee9e486583f10fb31b"},"previous_names":["thomas-basham/washington-trout-stats","thomas-basham/trout-finder","thomas-basham/trout-tracker-wa-backend","troutlytics/troutlytics-backend"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/troutlytics/troutlytics-backend","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/troutlytics%2Ftroutlytics-backend","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/troutlytics%2Ftroutlytics-backend/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/troutlytics%2Ftroutlytics-backend/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/troutlytics%2Ftroutlytics-backend/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/troutlytics","download_url":"https://codeload.github.com/troutlytics/troutlytics-backend/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/troutlytics%2Ftroutlytics-backend/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30206576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T03:24:23.086Z","status":"ssl_error","status_checked_at":"2026-03-07T03:23:11.444Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["contributions-welcome","contributors","data-visualization","docker","fastapi","fish","fishing","folium","lakes","maps","python","sqlalchemy","statistics","trout","washington","wdfw","webscraper","webscraping","website"],"created_at":"2025-10-15T07:26:21.002Z","updated_at":"2026-03-07T03:32:24.118Z","avatar_url":"https://github.com/troutlytics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Troutlytics Backend\n\nBackend data platform for Troutlytics.\n\nThis repository does two things:\n\n1. Scrapes Washington Department of Fish and Wildlife (WDFW) trout stocking reports.\n2. Serves cleaned stocking data through a FastAPI service for dashboards, maps, and analytics.\n\n## Project Links\n\n- Website: https://troutlytics.com\n- Backend repository: https://github.com/troutlytics/troutlytics-backend\n- API base URL: https://xtczssso08.execute-api.us-west-2.amazonaws.com\n\n## Problem This Project Solves\n\nWDFW stocking information is published as a human-readable web table, but product features need a reliable machine-readable dataset. The raw source is hard to query over time, aggregate by hatchery/date, and map consistently.\n\nThis backend solves that by:\n\n1. Extracting the source table on a schedule.\n2. Normalizing and deduplicating stocking rows.\n3. Resolving water bodies to stable location records.\n4. Persisting curated data in a queryable database.\n5. Exposing API endpoints with caching/ETag support for fast client reads.\n\n## How It Works\n\n1. `web_scraper/scraper.py` fetches WDFW trout plant rows (`items_per_page=250`), parses fields, and normalizes names/date/values.\n2. The scraper matches each row to existing `water_location` records (exact + relaxed matching) to avoid duplicates.\n3. If enabled, it can create missing water locations and optionally geocode them with Google Geocoding.\n4. `data/database.py` writes `stocking_report` entries with dedupe/upsert logic and records run metadata in `utility`.\n5. `api/index.py` serves raw and aggregate endpoints from the same shared models in `data/`.\n\n## Repository Layout\n\n```text\n.\n├── api/                    FastAPI app (local Uvicorn + Lambda via Mangum)\n├── web_scraper/            WDFW scraper and parser\n├── data/                   SQLAlchemy models, database access layer, local SQLite file\n├── aws_config/             CloudFormation templates (OIDC/IAM, scheduled Fargate, full stack variants)\n├── .github/workflows/      CI and image deployment workflows\n├── docker-compose.yml      Local services for API and scraper\n└── Makefile                ECR/Lambda helper commands\n```\n\n## Runtime Architecture\n\n- API: FastAPI + SQLAlchemy (`api/index.py`)\n- Scraper: Requests + BeautifulSoup (`web_scraper/scraper.py`)\n- Database:\n  - Production path: PostgreSQL/Aurora via `POSTGRES_*` env vars\n  - Local fallback: `data/sqlite.db` when Postgres vars are missing\n- Deploy targets:\n  - API container image for AWS Lambda (`api/dockerfiles/prod/Dockerfile`)\n  - Scraper container image for ECS Fargate (`web_scraper/Dockerfile`)\n\n## Environment Variables\n\nCore database variables:\n\n- `POSTGRES_HOST`\n- `POSTGRES_PORT` (default `5432`)\n- `POSTGRES_DB`\n- `POSTGRES_USER`\n- `POSTGRES_PASSWORD`\n\nScraper behavior flags:\n\n- `SCRAPER_ALLOW_CREATE_WATER_LOCATION` (`true/false`, default `false`)\n- `SCRAPER_GEOCODE` (`true/false`, default `false`)\n- `GV3_API_KEY` (required only when geocoding is enabled)\n\n## Local Development (2 options)\n\n### Option 1: Docker Compose\n\nFrom repo root:\n\n```bash\ndocker compose build\ndocker compose up\n```\n\nUseful targets:\n\n- API dev service only: `docker compose up api-dev`\n- API prod image locally: `docker compose up api-prod`\n- Scraper only: `docker compose up web-scraper`\n\n### Option 2: Python Directly\n\nFrom repo root:\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\npip install -r web_scraper/requirements.txt\npip install -r api/requirements.txt\n```\n\nRun scraper:\n\n```bash\npython -m web_scraper.scraper\n```\n\nRun API:\n\n```bash\nuvicorn api.index:app --reload --port 8080\n```\n\nAPI docs: `http://localhost:8080/docs`\n\n## API Endpoints\n\nMain routes in `api/index.py`:\n\n- `GET /`\n- `GET /stocked_lakes_data`\n- `GET /stocked_lakes_data_all_time`\n- `GET /total_stocked_by_date_data`\n- `GET /hatchery_totals`\n- `GET /derby_lakes_data`\n- `GET /date_data_updated`\n- `GET /hatchery_names`\n\nDate-filtered endpoints accept optional `start_date` and `end_date` (ISO format). Defaults are last 7 days.\n\n## CI/CD and Deployment\n\n- `.github/workflows/deploy-scraper.yml`\n  - On push to `main`, builds and pushes both scraper and API images to ECR.\n  - Uses GitHub OIDC (`AWS_ROLE_ARN`) to assume an AWS role.\n- `.github/workflows/python-app.yml`\n  - Lints with flake8 and runs pytest.\n- `.github/workflows/deploy-to-ecr.yml`\n  - Additional scraper image push workflow (also on `main`).\n\nInfra templates:\n\n- `aws_config/configure-aws-credentials-latest.yml`: IAM role/OIDC provider setup for GitHub Actions.\n- `aws_config/scheduled-scraper-fargate.yaml`: Scheduled EventBridge -\u003e ECS Fargate scraper task using Secrets Manager.\n- `aws_config/full-api-creation.yaml` and `aws_config/fargate-rds-secrets.yaml`: broader/legacy stack templates kept in-repo.\n\n## Notes and Constraints\n\n- If Postgres env vars are missing, the app falls back to SQLite (`data/sqlite.db`).\n- Scraper write behavior is intentionally conservative by default: if a water location does not already exist and create mode is off, the row is skipped.\n- API responses use cache headers and ETags; the all-time route keeps an in-memory cache (~12 hours) to reduce query cost.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftroutlytics%2Ftroutlytics-backend","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftroutlytics%2Ftroutlytics-backend","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftroutlytics%2Ftroutlytics-backend/lists"}