{"id":48223918,"url":"https://github.com/ssp-data/cloud-cost-analyzer","last_synced_at":"2026-04-04T19:16:37.917Z","repository":{"id":324050770,"uuid":"1094904312","full_name":"ssp-data/cloud-cost-analyzer","owner":"ssp-data","description":"FinOps repository: An open-source framework for multi-cloud cost visibility. Extendable with dlt.","archived":false,"fork":false,"pushed_at":"2026-02-19T15:51:19.000Z","size":5996,"stargazers_count":21,"open_issues_count":0,"forks_count":6,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-19T19:28:45.170Z","etag":null,"topics":["aws-cost-explorer","data-engineering","data-engineering-project","data-pipeline","dlt","gcp-cost-report","stripe"],"latest_commit_sha":null,"homepage":"http://ssp.sh/blog/cost-analyzer-aws-gcp","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ssp-data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-12T10:27:41.000Z","updated_at":"2026-02-19T15:51:33.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ssp-data/cloud-cost-analyzer","commit_stats":null,"previous_names":["ssp-data/cloud-cost-analyzer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ssp-data/cloud-cost-analyzer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssp-data%2Fcloud-cost-analyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssp-data%2F
cloud-cost-analyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssp-data%2Fcloud-cost-analyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssp-data%2Fcloud-cost-analyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ssp-data","download_url":"https://codeload.github.com/ssp-data/cloud-cost-analyzer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssp-data%2Fcloud-cost-analyzer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31409858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-cost-explorer","data-engineering","data-engineering-project","data-pipeline","dlt","gcp-cost-report","stripe"],"created_at":"2026-04-04T19:16:37.848Z","updated_at":"2026-04-04T19:16:37.901Z","avatar_url":"https://github.com/ssp-data.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cloud Cost Analyzer Project\n\nMulti-cloud cost analytics platform combining AWS Cost and Usage Reports (CUR), GCP billing data, and Stripe revenue metrics. 
Built with dlt for data ingestion, DuckDB/ClickHouse for storage, and Rill for visualization.\n\n\u003e **NEW: Cloud-Ready Version**\n\u003e\n\u003e This version now supports **both local and cloud deployment**:\n\u003e - **Local Mode**: Parquet files + DuckDB + local Rill (perfect for development)\n\u003e - **Cloud Mode**: ClickHouse Cloud + Rill Cloud + GitHub Actions automation (production-ready)\n\u003e\n\u003e Switch between modes with a single command. The same codebase works everywhere! Looking for the original local-only version? Check out [branch `v1`](https://github.com/ssp-data/cloud-cost-analyzer/tree/v1)\n\n\u003e\n\u003e **Note**: The live cloud demo is currently offline to save costs. You can recreate the full cloud setup with your own [ClickHouse Cloud](https://clickhouse.cloud) and [Rill Cloud](https://www.rilldata.com/try-free) instances following the [blog article](https://www.ssp.sh/blog/finops-dlt-clickhouse-rill/).\n\n![](img/tech-stack.png)\n\n\n## Features\n\n- **Multi-Cloud Cost Tracking** - AWS, GCP, and future cloud providers\n- **Revenue Integration** - Stripe payment data for margin analysis\n- **Dual Deployment Modes** - Run locally with Parquet/DuckDB or in the cloud with ClickHouse/Rill Cloud\n- **Incremental Loading** - Efficient append-only data pipeline with dlt\n- **Advanced Analytics** - RI/SP utilization, unit economics, effective cost tracking (adapted from [aws-cur-wizard](https://github.com/Twing-Data/aws-cur-wizard))\n- **GitHub Actions Automation** - [Daily data updates](.github/workflows/etl-pipeline.yml) with automated ETL pipelines\n- **Data Anonymization** - Built-in anonymization for public dashboards (see [ANONYMIZATION.md](ANONYMIZATION.md))\n- **Dynamic Dashboards** - Powered by Rill visualizations\n\n## Quick Start with Demo Data\n\nTry without any credentials:\n```bash\ngit clone https://github.com/ssp-data/cloud-cost-analyzer.git\ncd cloud-cost-analyzer\nmake demo\n```\n\nOpens at http://localhost:9009 with 
sample data.\n\n- **Note 1**: The process tries to install Rill, but it is best to install it beforehand with: `curl https://rill.sh | sh`\n- **Note 2**: Run `make clear` before `make run-all` to switch to real data.\n\n## How it works\n\n### Two Deployment Modes\n\nThis project supports both local development and cloud production:\n\n**Local Mode** (default):\n```sh\nmake run-all        # ETL → Parquet files → Local Rill dashboard\nmake serve          # View dashboards at localhost:9009\n```\nPerfect for: Development, testing, small datasets\n\n**Cloud Mode** (production-ready):\n```sh\nmake run-all-cloud  # ETL → ClickHouse Cloud → Anonymize → Rill Cloud/Local\n```\nPerfect for: Production, team collaboration, large datasets, public dashboards\n\n\u003e **Note**: The ClickHouse cluster is currently offline to save costs. The cloud setup is fully documented and functional; you'll just need your own ClickHouse Cloud instance.\n\n**Additional commands:**\n```sh\nmake aws-dashboards  # Generate dynamic AWS dashboards (local mode only)\nmake serve-clickhouse  # View dashboards connected to ClickHouse\n```\n\n### Deployment Guides\n\n- **[CLICKHOUSE.md](CLICKHOUSE.md)** - Complete ClickHouse Cloud setup, deployment, mode switching, and GitHub Actions automation\n- **[ANONYMIZATION.md](ANONYMIZATION.md)** - Data anonymization for public dashboards\n- **[CLAUDE.md](CLAUDE.md)** - Architecture details and development guide\n\n\n## Setup\nWe need to get the cost reports and the credentials properly set up.\n\nFirst, clone the project and install the dependencies, then set up the cost reports for each cloud provider, and finally edit `.dlt/secrets.toml` and `.dlt/config.toml` to match your data. \n\nThe steps below show how to create a cost report and extract the keys. Also check the related blog posts: [local first](http://www.ssp.sh/blog/cost-analyzer-aws-gcp/) and [cloud-ready](https://www.ssp.sh/blog/finops-dlt-clickhouse-rill/) for more details.\n\n\n### 1. 
Install Dependencies\n\n```bash\ngit clone https://github.com/ssp-data/cloud-cost-analyzer.git\ncd cloud-cost-analyzer\nuv sync  # Installs all packages from pyproject.toml\n```\n\n\n### 2. Configure Data Sources\n\nYou need to set up cost/revenue exports from each cloud provider:\n\n#### AWS Cost and Usage Report (CUR)\n\n**One-time setup in AWS Console**:\n\n1. Go to [AWS Billing Console](https://us-east-1.console.aws.amazon.com/billing/home?region=us-east-1#/bills) → **Cost \u0026 Usage Reports**\n2. Click **\"Create report\"**\n3. Configure:\n   - Report name: `CUR-export-test` (or your choice)\n   - Time granularity: **Hourly** or **Daily**\n   - Enable: **Include resource IDs**\n   - Report data integration: Select **Amazon Athena** (enables Parquet format)\n   - S3 bucket: Choose or create a bucket (e.g., `s3://your-bucket/cur`)\n   - Enable: **Overwrite existing report**\n\nAWS will automatically generate and upload CUR files to your S3 bucket daily.\n\n**Set AWS credentials**:\n\n```bash\n# Option 1: Environment variables (recommended)\nexport AWS_ACCESS_KEY_ID=\"your-key\"\nexport AWS_SECRET_ACCESS_KEY=\"your-secret\"\n\n# Option 2: Edit .dlt/secrets.toml\n[sources.filesystem.credentials]\naws_access_key_id = \"your-key\"\naws_secret_access_key = \"your-secret\"\n```\n\nThe `.env` file automatically sources these for dlt.\n\n#### Google Cloud Platform (BigQuery Export)\n\n**One-time setup in GCP Console**:\n\n1. Go to [GCP Billing Console](https://console.cloud.google.com/billing)\n2. Navigate to: **Billing → Billing export**\n3. Click **\"Edit settings\"** for **Detailed usage cost**\n4. Choose:\n   - BigQuery dataset: Create or select dataset (e.g., `billing_export`)\n5. 
Click **\"Save\"**\n\nGCP will automatically export billing data to BigQuery daily (usually completes by end of next day).\n\n**More**: [GCP Billing Export Guide](https://cloud.google.com/billing/docs/how-to/export-data-bigquery)\n\n**Create Service Account \u0026 Get Credentials**:\n\n1. Go to [IAM \u0026 Admin → Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)\n2. Click **\"+ CREATE SERVICE ACCOUNT\"**\n3. Grant roles:\n   - **BigQuery Data Viewer**\n   - **BigQuery Job User**\n4. Create JSON key: **Keys → ADD KEY → Create new key → JSON**\n5. Download the JSON file\n\n**Configure credentials in `.dlt/secrets.toml`**:\n\n```toml\n[source.bigquery.credentials]\nproject_id = \"your-project-id\"\nprivate_key = \"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n\"\nclient_email = \"your-service-account@project.iam.gserviceaccount.com\"\ntoken_uri = \"https://oauth2.googleapis.com/token\"\n```\n\n**Note**: Extract these values from your downloaded JSON key file.\n\n#### Stripe Revenue Data\n\n**Get API Key**:\n\n1. Go to [Stripe Dashboard](https://dashboard.stripe.com/)\n2. Navigate to: **Developers → API keys**\n3. Copy your **Secret key** (starts with `sk_live_` or `sk_test_`)\n\n**Configure in `.dlt/secrets.toml`**:\n\n```toml\n[sources.stripe_analytics]\nstripe_secret_key = \"sk_live_your_key_here\"\n```\n\n### 3. Update Pipeline Configuration\n\nAll pipeline configuration is centralized in `.dlt/config.toml`. 
Edit this file to point to your data sources:\n\n**Edit `.dlt/config.toml`**:\n\n```toml\n# Pipeline configuration\n[pipeline]\npipeline_name = \"cloud_cost_analytics\"  # Change if needed\n\n# AWS CUR configuration\n[sources.aws_cur]\nbucket_url = \"s3://your-bucket-name\"  # Your S3 bucket\nfile_glob = \"cur/your-report-name/data/**/*.parquet\"  # Path to your CUR files\ntable_name = \"your_table_name\"  # Name for the output table\ndataset_name = \"aws_costs\"  # Dataset name (default: aws_costs)\ninitial_start_date = \"2025-09-01\"  # Only load data from this date onwards (filters by file modification date)\n\n# GCP BigQuery billing export configuration\n[sources.gcp_billing]\n# project_id is automatically read from secrets.toml\n# (uses source.bigquery.credentials.project_id)\n# Uncomment below only if you want to override:\n# project_id = \"your-gcp-project-id\"\ndataset = \"billing_export\"  # BigQuery dataset name\ndataset_name = \"gcp_costs\"  # Output dataset name (default: gcp_costs)\ninitial_start_date = \"2025-09-01T00:00:00Z\"  # Only load data from this date onwards (filters by export_time)\n# Update these table names to match your GCP billing export tables\n# Find them in BigQuery Console under your billing_export dataset\ntable_names = [\n    \"gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXXX\",\n    \"gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX\"\n]\n\n# Stripe configuration\n[sources.stripe]\ndataset_name = \"stripe_costs\"  # Dataset name (default: stripe_costs)\ninitial_start_date = \"2025-09-01\"  # Only load data from this date onwards (filters by created timestamp)\n```\n\n**Understanding `initial_start_date` Configuration:**\n\nThe `initial_start_date` parameter controls how far back to load historical data when running the pipeline for the first time. This is especially important when copying this project to avoid loading 10+ years of historical data:\n\n- **AWS**: Filters files by modification date. 
Format: `\"YYYY-MM-DD\"` (e.g., `\"2025-09-01\"`)\n- **GCP**: Filters records by `export_time` field. Format: `\"YYYY-MM-DDTHH:MM:SSZ\"` (e.g., `\"2025-09-01T00:00:00Z\"`)\n- **Stripe**: Filters transactions by created timestamp. Format: `\"YYYY-MM-DD\"` (e.g., `\"2025-09-01\"`)\n\n**Important Notes:**\n- Once data is loaded, subsequent runs only load new data (incremental loading)\n- To reset and reload from a different start date, run `make dlt-clear` to clear the dlt state\n- If omitted, AWS/Stripe will load all available data, and GCP will default to loading from 2000-01-01\n- Recommended: Set to a recent date (e.g., 3-6 months ago) to keep initial data load manageable\n\n**How to find your GCP billing table names:**\n1. Go to [BigQuery Console](https://console.cloud.google.com/bigquery)\n2. Find your billing export dataset (usually `billing_export`)\n3. Look for tables starting with `gcp_billing_export_v1_` or `gcp_billing_export_resource_v1_`\n4. Copy the full table names into the config above\n\n**Note about AWS table_name and Rill dashboards:**\nIf you change the AWS `table_name` from the default `cur_export_test_00001`, you'll also need to update the parquet path in `viz_rill/models/aws_costs.sql` (file has comments showing where).\n\n#### Cloud Deployment with ClickHouse \u0026 Rill Cloud\n\n**New in this version**: Deploy to production with ClickHouse Cloud and automate with GitHub Actions!\n\n**Quick Cloud Setup:**\n\n1. **Create ClickHouse Cloud service** ([sign up free](https://clickhouse.cloud))\n\n2. **Add credentials to `.dlt/secrets.toml`:**\n   ```toml\n   [destination.clickhouse.credentials]\n   host = \"xxxxx.europe-west4.gcp.clickhouse.cloud\"\n   port = 8443\n   username = \"default\"\n   password = \"your-password\"\n   secure = 1\n   ```\n\n3. 
**Configure Rill for ClickHouse** in `viz_rill/.env`:\n   ```bash\n   RILL_CONNECTOR=\"clickhouse\"\n   CONNECTOR_CLICKHOUSE_DSN=\"clickhouse://default:password@host:8443/default?secure=true\"\n   ```\n\n4. **Run cloud pipeline:**\n   ```bash\n   make init-clickhouse      # Initialize database (one-time)\n   make run-all-cloud        # ETL + anonymize + serve\n   ```\n\n**What you get:**\n- Data stored in ClickHouse Cloud (scalable, fast)\n- [Automated daily updates via GitHub Actions](.github/workflows/etl-pipeline.yml) (runs at 2 AM UTC)\n- Optional data anonymization for public dashboards\n- Works with both Rill Cloud and local Rill\n\n**Complete guide**: See [CLICKHOUSE.md](CLICKHOUSE.md) for detailed setup, GitHub Actions configuration, mode switching, and troubleshooting. \n\n### 4. Run the Pipeline\n\n```bash\nmake run-etl  # Loads AWS + GCP + Stripe data\nmake serve    # Opens Rill dashboards\n```\n\n\n## How the Data Pipeline Works\n\nThis pipeline supports both **local mode** (Parquet + DuckDB) and **cloud mode** (ClickHouse Cloud). 
The architecture remains the same, only the storage layer changes.\n\n## The Data Flow\n```mermaid\ngraph TB\n\nsubgraph \"1: EXTRACT (dlt)\"\n    A1[AWS S3\u003cbr/\u003eCUR Parquet]\n    A2[GCP BigQuery\u003cbr/\u003eBilling Export]\n    A3[Stripe API\u003cbr/\u003eRevenue]\n\n    P1[aws_pipeline.py\u003cbr/\u003e📥 Incremental]\n    P2[google_bq_pipeline.py\u003cbr/\u003e📥 Incremental]\n    P3[stripe_pipeline.py\u003cbr/\u003e📥 Incremental]\n\n    A1 --\u003e P1\n    A2 --\u003e P2\n    A3 --\u003e P3\nend\n\nsubgraph \"2: NORMALIZE (Python + DuckDB)\"\n    N1[\"normalize.py\u003cbr/\u003e🔧 Flatten MAP columns\u003cbr/\u003e(CUR 2.0 is flat already)\"]\n    N2[normalize_gcp.py\u003cbr/\u003e🔧 Flatten nested data]\n\n    P1 --\u003e N1\n    P2 --\u003e N2\n    P3 --\u003e R1\nend\n\nsubgraph \"3: STORAGE (Dual Mode)\"\n    R1[LOCAL: Parquet files\u003cbr/\u003eor\u003cbr/\u003eCLOUD: ClickHouse tables]\n    R2[Switch with DLT_DESTINATION env var]\n\n    N1 -.-\u003e R1\n    N2 --\u003e R1\n    P1 --\u003e R1\nend\n\nsubgraph \"4: MODEL (SQL - Star Schema)\"\n    M1[aws_costs.sql\u003cbr/\u003e🔷 Dimensions + Facts]\n    M2[gcp_costs.sql\u003cbr/\u003e🔷 Dimensions + Facts]\n    M3[stripe_revenue.sql\u003cbr/\u003e🔷 Dimensions + Facts]\n    M4[unified_cost_model.sql\u003cbr/\u003e🌟 UNION ALL + Currency Conversion]\n\n    R1 --\u003e M1\n    R1 --\u003e M2\n    R1 --\u003e M3\n\n    M1 --\u003e M4\n    M2 --\u003e M4\n    M3 --\u003e M4\nend\n\nsubgraph \"5: METRICS \u0026 DASHBOARDS (Rill)\"\n    MV1[aws_cost_metrics.yaml\u003cbr/\u003e📊 KPIs \u0026 Measures]\n    MV2[gcp_cost_metrics.yaml\u003cbr/\u003e📊 KPIs \u0026 Measures]\n    MV3[cloud_cost_metrics.yaml\u003cbr/\u003e📊 Unified Metrics]\n\n    D1[🎨 AWS Dashboard]\n    D2[🎨 GCP Dashboard]\n    D3[🎨 Cloud Cost Explorer\u003cbr/\u003eMulti-Cloud + Revenue]\n\n    M4 --\u003e MV1\n    M4 --\u003e MV2\n    M4 --\u003e MV3\n\n    MV1 --\u003e D1\n    MV2 --\u003e D2\n    MV3 --\u003e D3\nend\n\nstyle P1 
fill:#4A90E2,stroke:#2E5C8A,color:#fff\nstyle P2 fill:#4A90E2,stroke:#2E5C8A,color:#fff\nstyle P3 fill:#4A90E2,stroke:#2E5C8A,color:#fff\nstyle N1 fill:#9B59B6,stroke:#7D3C98,color:#fff\nstyle N2 fill:#9B59B6,stroke:#7D3C98,color:#fff\nstyle R1 fill:#FF6B6B,stroke:#C92A2A,color:#fff\nstyle R2 fill:#FF6B6B,stroke:#C92A2A,color:#fff\nstyle M4 fill:#E74C3C,stroke:#C0392B,color:#fff\nstyle MV3 fill:#27AE60,stroke:#1E8449,color:#fff\nstyle D3 fill:#F39C12,stroke:#D68910,color:#fff\n\n```\n\n### Key Notes on Data Flow\n\n**AWS CUR 2.0 Format**: Modern AWS Cost and Usage Reports export in Parquet format with already-flattened columns. The `normalize.py` script exists for backward compatibility with older CUR formats that contained nested MAP columns (like resource tags), but for CUR 2.0, it acts as a pass-through operation—no transformation occurs.\n\n**GCP Billing Export**: Google Cloud exports use nested structures (e.g., `service__description`, `location__country`) that require flattening via `normalize_gcp.py` to make them accessible for analytics.\n\n**Stripe**: Revenue data comes pre-normalized from the Stripe API and requires no additional processing.\n\n### Incremental Loading\n\nUses `write_disposition=\"append\"` - cost data is append-only (no updates/merges needed). 
AWS uses `merge` for hard deduplication.\n\n### Data Flow by Mode\n\n**Local Mode:**\n```\nCloud Providers    dlt Pipelines          Storage              Visualization\nAWS S3 (CUR)   →   aws_pipeline.py   →   Parquet files    →   Rill (DuckDB)\nGCP BigQuery   →   google_bq_*.py    →   viz_rill/data/   →   localhost:9009\nStripe API     →   stripe_pipeline.py →                   →\n```\n\n**Cloud Mode:**\n```\nCloud Providers    dlt Pipelines          Storage               Visualization\nAWS S3 (CUR)   →   aws_pipeline.py   →   ClickHouse Cloud  →   Rill Cloud/Local\nGCP BigQuery   →   google_bq_*.py    →   (via dlt)         →   (connects to CH)\nStripe API     →   stripe_pipeline.py →                    →\n                        ↓\n                  GitHub Actions (automated daily at 2 AM UTC)\n```\nSee [workflow configuration](.github/workflows/etl-pipeline.yml) for details.\n\n### Output\n\n**Local Mode:**\n- **Parquet files**: `viz_rill/data/` (primary storage, used by Rill via DuckDB)\n- **DuckDB**: `cloud_cost_analytics.duckdb` (legacy, optional)\n\n**Cloud Mode:**\n- **ClickHouse tables**: Scalable cloud database\n- **Automated updates**: Via [GitHub Actions](.github/workflows/etl-pipeline.yml) (daily at 2 AM UTC)\n\n## Troubleshooting\n\n### Configuration Issues\n- All configuration is in `.dlt/config.toml` - check this file first\n- Verify your table names, project IDs, and bucket paths match your cloud provider setup\n- The test runner will use your config values automatically\n\n### AWS: \"No files found\"\n- Check S3 bucket path in `.dlt/config.toml` under `[sources.aws_cur]`\n- Verify AWS credentials: `aws s3 ls s3://your-bucket/`\n- Wait 24 hours after enabling CUR export (first files take time)\n\n### GCP: \"Table not found\"\n- Verify BigQuery table names in `.dlt/config.toml` under `[sources.gcp_billing]`\n- Check service account permissions (BigQuery Data Viewer + Job User)\n- Confirm billing export is enabled and dataset exists\n\n### Stripe: 
\"Invalid API key\"\n- Verify secret key in `.dlt/secrets.toml` starts with `sk_live_` or `sk_test_`\n- Check key has read permissions in Stripe Dashboard\n\n### Rill: \"No data in dashboards\"\n- Run `make run-etl` first to load data\n- Check parquet files exist: `ls viz_rill/data/*/`\n- Verify data loaded: `duckdb cloud_cost_analytics.duckdb -c \"SELECT COUNT(*) FROM aws_costs.cur_export_test_00001;\"`\n\n## Visualization with Rill\n\nThe `viz_rill/` directory contains Rill dashboards for multi-cloud cost analysis.\n\n```bash\nmake serve  # Opens Rill at http://localhost:9009\n```\n\n**Features**:\n- AWS cost analytics with RI/SP utilization tracking\n- Multi-cloud overview (AWS + GCP + Stripe)\n- Interactive explorers and product dimension analysis\n- Optional: Dynamic dashboard generation using [aws-cur-wizard](https://github.com/Twing-Data/aws-cur-wizard)\n\nSee `viz_rill/README.md` for dashboard details and integration information.\n\n## Data Normalization (Optional)\n\n**TL;DR: Normalization is optional and only needed for dynamic dashboard generation.**\n\n### What Normalization Does\n\nThe normalization scripts (`normalize.py`, `normalize_gcp.py`) flatten nested data structures (AWS resource_tags, GCP labels) and feed them to the dashboard generator from [aws-cur-wizard](https://github.com/Twing-Data/aws-cur-wizard).\n\n### Do You Need It?\n\nNot necessarily. The core dashboards work without normalization:\n- Static dashboards (`viz_rill/dashboards/*.yaml`) query raw data via models\n- Models use `SELECT *` to read all columns from raw parquet/ClickHouse\n- Everything works for both local and cloud deployment\n\nThe generated dashboards are useful, though, and a set is already committed in this repo. If your data differs, I'd suggest running normalization yourself. 
Normalization provides:\n- Auto-generated dimension-specific canvases (e.g., per-tag breakdowns)\n- Dynamic explorers based on discovered labels/tags\n- CUR Wizard's intelligent chart type selection\n\n### How to Generate Dynamic Dashboards\n\n**Important:** This only works in **local mode** (parquet files). ClickHouse mode doesn't create parquet files, so dynamic dashboard generation is unavailable.\n\n```bash\n# Local mode only - Generate AWS canvases (optional)\nmake aws-dashboards  # Normalizes + generates canvases/explores/\n\n# Local mode only - Generate GCP canvases (optional)\nmake gcp-dashboards  # Normalizes + generates canvases/explores/\n\n# View all dashboards (static + generated)\nmake serve\n```\n\n**Generated files** (in `.gitignore` but can be committed):\n- `viz_rill/canvases/*.yaml` - Dimension-specific breakdowns\n- `viz_rill/explores/*.yaml` - Auto-generated explorers\n- `viz_rill/data/normalized_aws.parquet` - Flattened AWS data\n- `viz_rill/data/normalized_gcp.parquet` - Flattened GCP data\n\n### Feature Comparison: Local vs Cloud Mode\n\n| Feature | Local Mode (Parquet) | Cloud Mode (ClickHouse) |\n|---------|---------------------|------------------------|\n| Static dashboards | ✅ Always work | ✅ Always work |\n| Dynamic dashboard generation | ✅ `make aws-dashboards` | ❌ Not available* |\n| Data storage | Parquet files | ClickHouse Cloud |\n| Normalization | Optional | Not needed |\n| Best for | Development, dynamic canvases | Production, GitHub Actions |\n\n*ClickHouse doesn't create parquet files, so the dashboard generator can't analyze data. Static dashboards provide full functionality.\n\nSee [CLICKHOUSE.md](CLICKHOUSE.md#advanced-normalization-optional) for more details.\n\n## Complete Workflow\n\n### Local Development Workflow\n\n```bash\n# Full local workflow: ETL + dashboards\nmake run-all\n\n# Or step-by-step:\nmake run-etl         # 1. Load AWS/GCP/Stripe data → Parquet files\nmake aws-dashboards  # 2. 
(Optional) Generate dynamic dashboards\nmake serve           # 3. View in browser at localhost:9009\n```\n\n### Cloud Production Workflow\n\n```bash\n# Full cloud workflow: ETL + anonymize + serve\nmake run-all-cloud\n\n# Or step-by-step:\nmake init-clickhouse      # 1. Initialize ClickHouse (one-time)\nmake run-etl-clickhouse   # 2. Load data → ClickHouse Cloud\nmake anonymize-clickhouse # 3. (Optional) Anonymize for public dashboards\nmake serve-clickhouse     # 4. View dashboards (local Rill → ClickHouse)\n```\n\n**Switching modes:**\n```bash\nmake set-connector-duckdb      # Switch to local Parquet/DuckDB\nmake set-connector-clickhouse  # Switch to ClickHouse Cloud\n```\n\n## Documentation\n\n### Cloud Deployment\n- **[CLICKHOUSE.md](CLICKHOUSE.md)** - ClickHouse Cloud setup, GitHub Actions automation, mode switching, and troubleshooting\n- **[ANONYMIZATION.md](ANONYMIZATION.md)** - Data anonymization for public dashboards\n- **GitHub Actions Workflows**:\n  - [Daily ETL Pipeline](.github/workflows/etl-pipeline.yml) - Automated data ingestion (runs at 2 AM UTC)\n  - [Clear ClickHouse Data](.github/workflows/clear-clickhouse.yml) - Manual workflow to drop all tables\n\n### Development \u0026 Architecture\n- **[CLAUDE.md](CLAUDE.md)** - Architecture details, key design patterns, and development guide\n- **[viz_rill/README.md](viz_rill/README.md)** - Dashboard structure and visualization layer\n- **[ATTRIBUTION.md](ATTRIBUTION.md)** - Third-party components (aws-cur-wizard)\n\n### Related Resources\n- **Blog Post**: [Multi-Cloud Cost Analytics: From Cost-Export to Parquet to Rill](https://www.ssp.sh/blog/cost-analyzer-aws-gcp/) - Detailed write-up of this project\n- **Blog Post Part 2**: [Multi-Cloud Cost Analytics with dlt, ClickHouse \u0026 Rill](https://www.ssp.sh/blog/finops-dlt-clickhouse-rill/) - Detailed write-up of this project\n- **Original Local Version**: [Branch `v1`](https://github.com/ssp-data/cloud-cost-analyzer/tree/v1) - Pre-ClickHouse 
version\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssp-data%2Fcloud-cost-analyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssp-data%2Fcloud-cost-analyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssp-data%2Fcloud-cost-analyzer/lists"}