{"id":50733406,"url":"https://github.com/timfanda35/aws-pricing-list-loader","last_synced_at":"2026-06-10T11:01:37.953Z","repository":{"id":354655149,"uuid":"1222723630","full_name":"timfanda35/aws-pricing-list-loader","owner":"timfanda35","description":"Load AWS Pricing List to Postgres","archived":false,"fork":false,"pushed_at":"2026-05-10T06:57:15.000Z","size":286,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-10T08:40:45.224Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timfanda35.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-27T16:38:36.000Z","updated_at":"2026-05-10T06:57:17.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/timfanda35/aws-pricing-list-loader","commit_stats":null,"previous_names":["timfanda35/aws-pricing-list-loader"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/timfanda35/aws-pricing-list-loader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timfanda35%2Faws-pricing-list-loader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timfanda35%2Faws-pricing-list-loader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timfanda35%2Faws-pricing-list-loader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timfanda35%2Faws-pricing-list-loader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timfanda35","download_url":"https://codeload.github.com/timfanda35/aws-pricing-list-loader/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timfanda35%2Faws-pricing-list-loader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34149132,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-10T11:01:37.223Z","updated_at":"2026-06-10T11:01:37.939Z","avatar_url":"https://github.com/timfanda35.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AWS Pricing List Loader\n\nCrawls the [AWS Pricing API](https://pricing.us-east-1.amazonaws.com) to discover all service and savings plan pricing URLs, and bulk-loads every region's pricing CSV into PostgreSQL using an ingestion/swap pattern.\n\nExposed as a FastAPI HTTP service, with a CLI for local use and a Cloud Run Job entry point for batch execution.\n\n## Setup\n\n### Docker (recommended)\n\n```bash\ncp .env.example .env  # fill in Postgres credentials\ndocker compose up --build -d\n```\n\nStarts both PostgreSQL and the API. The API automatically runs any pending DB migrations on startup. Interactive docs at `http://localhost:8000/docs`.\n\n### Docker with SSL (GCP Cloud SQL or local mTLS test)\n\nFor GCP Cloud SQL, download `server-ca.pem`, `client-cert.pem`, and `client-key.pem` from the Console → Cloud SQL → your instance → Connections → SSL, and place them in `certs/`.\n\nFor local testing, generate self-signed equivalents instead:\n\n```bash\nbash scripts/gen-dev-certs.sh           # creates certs/ with matching file names\ndocker compose -f docker-compose.yml -f docker-compose.ssl.yml up --build -d\n```\n\nThe SSL compose override enables TLS on the PostgreSQL container and injects these env vars into `api`:\n\n| Env var | Value (in container) |\n|---|---|\n| `POSTGRES_SSL_MODE` | `verify-ca` |\n| `POSTGRES_SSL_ROOTCERT` | `/app/certs/server-ca.pem` |\n| `POSTGRES_SSL_CERT` | `/app/certs/client-cert.pem` |\n| `POSTGRES_SSL_KEY` | `/app/certs/client-key.pem` |\n\nFor local dev without Docker, set these vars in `.env` pointing to your local cert paths.\n\n### Local development\n\n```bash\npip install -r requirements-dev.txt\ncp .env.example .env  # fill in Postgres credentials\ndocker compose up -d db  # start Postgres only\nuvicorn app.main:app --reload  # migrations run automatically on startup\n```\n\n## Environment variables\n\nAll vars are read from `.env` (or the shell environment). Copy `.env.example` to get started.\n\n| Variable | Default | Required | Description |\n|---|---|---|---|\n| `POSTGRES_HOST` | — | Yes | PostgreSQL host |\n| `POSTGRES_PORT` | `5432` | No | PostgreSQL port |\n| `POSTGRES_DB` | — | Yes | Database name |\n| `POSTGRES_USER` | — | Yes | Database user |\n| `POSTGRES_PASSWORD` | — | Yes | Database password |\n| `POSTGRES_SSL_MODE` | — | No | SSL mode (e.g. `verify-ca`); omit for plain TCP |\n| `POSTGRES_SSL_ROOTCERT` | — | No | Path to server CA cert |\n| `POSTGRES_SSL_CERT` | — | No | Path to client cert |\n| `POSTGRES_SSL_KEY` | — | No | Path to client private key |\n\n## API\n\n| Method | Path | Description |\n|--------|------|-------------|\n| `GET` | `/health` | Liveness check — returns `{\"status\":\"ok\"}` |\n| `GET` | `/pricing/urls` | List all discovered pricing URLs; generates any missing schema files |\n| `POST` | `/pricing/load` | Load pricing data into PostgreSQL (blocks until complete) |\n| `GET` | `/versions` | List all loaded service versions |\n\n**GET /pricing/urls**\n\n```mermaid\nflowchart LR\n    U1[\"Fetch AWS service index\"] --\u003e U2[\"Fetch region indexes per service\"]\n    U2 --\u003e U3[\"Generate missing schema files\\n(schema/*.sql)\"]\n    U3 --\u003e U4[\"Return all pricing URLs\"]\n```\n\n**POST /pricing/load**\n\n```mermaid\nflowchart LR\n    L1[\"Check versions table\\n(skip already-loaded)\"] --\u003e L2[\"Union columns from all region CSVs\\nresolve name collisions\"]\n    L2 --\u003e L3[\"CREATE TABLE {service}_ingestion\\n(deduplicated columns)\"]\n    L3 --\u003e L4[\"COPY each region CSV → staging table\\n→ INSERT with COALESCE merge\\nON CONFLICT DO NOTHING → ingestion table\"]\n    L4 --\u003e L5[\"Swap ingestion → production table\"]\n    L5 --\u003e L6[\"Upsert version record\"]\n```\n\n`POST /pricing/load` accepts an optional JSON body to target a single service:\n\n```json\n{ \"name\": \"comprehend\" }\n```\n\nResponse:\n\n```json\n{ \"loaded\": 14, \"services\": 1, \"elapsed_seconds\": 42.3 }\n```\n\n## CLI\n\nThe CLI shares the same service layer as the API.\n\n### List pricing URLs\n\n```bash\npython fetch_pricing_index.py\npython fetch_pricing_index.py \u003e output.txt  # save to file\n```\n\nOutput (stdout) is a CSV with columns: `type, name, region, csv_url, publication_date`.\n\n### Load pricing data into PostgreSQL\n\n```bash\npython fetch_pricing_index.py --load\npython fetch_pricing_index.py --load --name comprehend\npython fetch_pricing_index.py --load --name AWSDatabaseSavingsPlans\n```\n\nAlready-loaded versions are skipped automatically (tracked in `aws_pricing_list_versions`).\n\n## Cloud Run Job\n\n`run_job.py` is designed to run as a [Google Cloud Run Job](https://cloud.google.com/run/docs/create-jobs) (one-time batch execution). It runs three steps in sequence, exiting with code 1 on any failure so the job runtime can detect and retry failures.\n\n### Steps\n\n1. **DB migrations** — applies any pending SQL migrations (same as API startup)\n2. **Version check** — queries current loaded versions, discovers how many service/region entries have new data available\n3. **Load** — streams and loads all new pricing CSVs into PostgreSQL\n\n### Usage\n\n```bash\n# Load all services with new versions\npython run_job.py\n\n# Load a single service (useful for testing)\npython run_job.py --name AWSComputeSavingsPlan\n\n# Force reload all services (ignore already-loaded versions)\npython run_job.py --force\n\n# Force reload a single service\npython run_job.py --name AWSComputeSavingsPlan --force\n```\n\n### Running as a Cloud Run Job\n\nThe same container image serves both the API and the job. The container entrypoint passes arguments through, so override the CMD via `--args`:\n\n```bash\n# Create the job\ngcloud run jobs create aws-pricing-loader \\\n  --image REGION-docker.pkg.dev/PROJECT/REPO/IMAGE \\\n  --args \"python,run_job.py\" \\\n  --set-env-vars \"POSTGRES_HOST=...,POSTGRES_DB=...,POSTGRES_USER=...,POSTGRES_PASSWORD=...\"\n\n# Execute the job\ngcloud run jobs execute aws-pricing-loader\n\n# Force reload all services (skip version check)\ngcloud run jobs update aws-pricing-loader \\\n  --args \"python,run_job.py,--force\"\ngcloud run jobs execute aws-pricing-loader\n\n# Target a single service\ngcloud run jobs update aws-pricing-loader \\\n  --args \"python,run_job.py,--name,comprehend\"\ngcloud run jobs execute aws-pricing-loader\n```\n\nCommas in `--args` delimit separate argv entries.\n\n## How loading works\n\nFor each service with new data:\n\n1. Fetches column headers from all region CSVs concurrently and unions them. Columns that normalise to the same snake_case name (e.g. `StorageType` and `Storage Type` → `storage_type`) are detected as collisions: the staging representation uses `_2`/`_3` suffixes to preserve CSV positions, while the ingestion table schema keeps only the base name. Generates and executes a `CREATE TABLE` DDL directly to the DB.\n2. Streams each region's CSV, strips the first 6 lines (metadata + header), and bulk-loads via `COPY … FROM STDIN` into a temporary `UNLOGGED` staging table (created with all staging columns including collision suffixes, all as `TEXT`). Rows are then merged into the ingestion table with `INSERT … SELECT … ON CONFLICT (rate_code, pricing_region) DO NOTHING`, using `COALESCE` to collapse suffix variants into a single column. Columns with non-TEXT types (e.g. `effective_date DATE`, `price_per_unit DECIMAL`) are cast via `NULLIF(…, '')::TYPE` to handle both empty strings and NULLs. Global items that appear identically in multiple region CSVs are silently deduplicated.\n3. Atomically swaps the ingestion table into production: renames the existing `{service}` table to `drop_{service}`, renames `{service}_ingestion` to `{service}`, then drops `drop_{service}`.\n4. Records the loaded version in `aws_pricing_list_versions` so subsequent runs skip it.\n\nTable names use the original AWS service name (e.g. `AmazonEC2`, `AWSDatabaseSavingsPlans`), not snake_case.\n\n## Schema files\n\nSchema files in `schema/` are generated during URL listing (both CLI listing mode and `GET /pricing/urls`). Each file (`{service}_ingestion.sql`) contains a `CREATE TABLE` + index DDL. In `--load` mode the DDL is generated on-the-fly and executed directly.\n\nTo force schema regeneration, delete the corresponding `.sql` file and re-run in listing mode.\n\n## Migrations\n\nDB migrations live in `migrations/` as numbered SQL files (`0001_*.sql`, `0002_*.sql`, …). They are applied automatically in filename order every time the API starts. Applied migrations are tracked in the `schema_migrations` table so each file runs exactly once.\n\nTo add a migration:\n\n```bash\n# Create the file\necho \"ALTER TABLE aws_pricing_list_versions ADD COLUMN IF NOT EXISTS notes TEXT;\" \\\n  \u003e migrations/0002_add_notes_column.sql\n\n# Deploy — runs on next startup\nuvicorn app.main:app\n# Log: Applied migration: 0002_add_notes_column.sql\n```\n\n## Testing\n\n### Unit and API tests\n\nNo database or network connection required:\n\n```bash\npip install -r requirements-dev.txt\npytest tests/\n```\n\nCovers:\n- `tests/test_aws_client.py` — `to_snake_case` with real AWS service names\n- `tests/test_schema_builder.py` — column type overrides, index name truncation, DDL generation, collision detection and merge_map\n- `tests/test_loader.py` — column union, collision deduplication, COALESCE INSERT, staging/ingestion schema split\n- `tests/test_api.py` — all endpoints via FastAPI `TestClient` with mocked service layer\n- `tests/test_migrations.py` — migration runner: ordering, skip-applied, connection cleanup\n- `tests/test_main.py` — lifespan calls `run_migrations()` on startup\n\n### Integration tests\n\nRequires a running PostgreSQL instance (`.env` configured):\n\n```bash\npytest test_create_table.py\n```\n\n### Manual smoke tests\n\nSmoke test a savings plan:\n\n```bash\npsql -c \"DELETE FROM aws_pricing_list_versions WHERE name = 'aws_database_savings_plans';\"\npython fetch_pricing_index.py --load --name AWSDatabaseSavingsPlans\n# Expected: [TABLE] created → [COPY] 36 regions → [SWAP] → [VERSION]\n```\n\nSmoke test a service:\n\n```bash\npsql -c \"DELETE FROM aws_pricing_list_versions WHERE name = 'comprehend';\"\npython fetch_pricing_index.py --load --name comprehend\n# Expected: [TABLE] created → [COPY] 14 regions → [SWAP] → [VERSION]\n```\n\n## References\n\nSee [references.md](references.md) for AWS Pricing API documentation links and JSON/CSV structure details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimfanda35%2Faws-pricing-list-loader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimfanda35%2Faws-pricing-list-loader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimfanda35%2Faws-pricing-list-loader/lists"}