{"id":50782656,"url":"https://github.com/tmph2003/superset-sdk-python","last_synced_at":"2026-06-12T05:00:56.380Z","repository":{"id":271816082,"uuid":"913744317","full_name":"tmph2003/superset-sdk-python","owner":"tmph2003","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-25T01:30:00.000Z","size":1863,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-25T03:34:34.623Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tmph2003.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.rst","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-08T09:17:28.000Z","updated_at":"2026-04-25T01:30:05.000Z","dependencies_parsed_at":"2025-01-10T04:22:42.002Z","dependency_job_id":"dfc0b5b3-d42d-4e94-8c79-40891619f440","html_url":"https://github.com/tmph2003/superset-sdk-python","commit_stats":null,"previous_names":["tmph2003/superset-sdk-python"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/tmph2003/superset-sdk-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmph2003%2Fsuperset-sdk-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmph2003%2Fsuperset-sdk-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmph2003%2Fsuperset-sdk-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmph2003%2Fsuperset-sdk-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tmph2003","download_url":"https://codeload.github.com/tmph2003/superset-sdk-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmph2003%2Fsuperset-sdk-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34229624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-12T05:00:55.538Z","updated_at":"2026-06-12T05:00:56.372Z","avatar_url":"https://github.com/tmph2003.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Semantic Layer Tool\n\nA lightweight, heavily optimized Python CLI tool designed to synchronize models and metrics from a [dbt](https://www.getdbt.com/) project directly into an Apache Superset instance.\n\nThis project has been aggressively simplified to focus **exclusively** on dbt synchronization, ensuring maximum speed, minimal dependency footprint, and high maintainability.\n\n---\n\n## 🚀 Features\n\n- **Automated Database Setup:** Automatically provisions the `$target` database connection in Superset using your local `profiles.yml`.\n- **Dataset Synchronization:** Converts every dbt source and model matched by your `--select` criteria into an interactive Dataset within Superset.\n- **Metric Integration:** Extracts and attaches metrics (both standard metrics and Semantic Layer/MetricFlow configurations) directly to their corresponding Superset datasets.\n- **Metadata Propagation:** Synchronizes column descriptions, labels, and advanced metadata from your dbt models to Superset.\n- **Superset-specific Metadata:** Pass Superset-exclusive settings (e.g. `cache_timeout`) via the `meta` tag in your dbt YML files.\n\n---\n\n## 📦 Installation\n\nThis tool requires Python 3.8+.\n\n```bash\n# Clone the repository\ngit clone https://github.com/tmph2003/superset-sdk-python.git\ncd superset-sdk-python\n\n# Install the package locally\npip install -e .\n```\n\n---\n\n## 💻 Usage\n\nThe CLI acts as a bridge between your compiled dbt `manifest.json` and your Superset instance API.\n\n### Basic Sync Command\n\n```bash\nsuperset-cli https://superset-dev.sunhouse.com.vn/ target/manifest.json \\\n  --username admin \\\n  --password admin \\\n  --project=sunhouse_etl_pipeline \\\n  --profile=sunhouse_etl_pipeline \\\n  --target=dev \\\n  --profiles=profiles.yml \\\n  --select models/gold/ \\\n  --merge-metadata \\\n  --max-workers=3\n```\n\n### Advanced Metadata Customization\n\nThe tool allows you to pass Superset-exclusive settings via the `meta` tag in your dbt configurations. This works for both Database connections (`profiles.yml`) and Datasets (`models/*.yml`).\n\n#### 1. Customizing the Database Connection (`profiles.yml`)\nWhen using `--import-db`, the CLI reads your dbt target to create the Superset database. You can customize the name of the database or enable SQL Lab features by adding a `meta` block to your target in `profiles.yml`:\n\n```yaml\nsunhouse_etl_pipeline:\n  target: dev\n  outputs:\n    dev:\n      type: bigquery\n      method: oauth\n      project: my_gcp_project\n      dataset: my_dbt_dataset\n      # Pass Superset-specific overrides here\n      meta:\n        superset:\n          database_name: \"Sunhouse Data Warehouse\" # Overrides the default \"{project}_{target}\" name\n          cache_timeout: 86400                     # Database-level cache timeout\n          expose_in_sqllab: true                   # Enable SQL Lab access\n```\n\n#### 2. Customizing Datasets (`models/*.yml`)\nSimilarly, you can specify values for Superset-only fields directly in your dbt model definitions under the `model.meta.superset.{{field_name}}` key:\n\n```yaml\nmodels:\n  - name: my_dbt_model\n    meta:\n      superset:\n        cache_timeout: 250 # Sets the dataset cache timeout to 250 seconds in Superset.\n        filter_select_enabled: true\n```\n\n### Command Options\n\nRun `superset-cli --help` for a full list of configuration options:\n\n- `--jwt-token`: Authenticate via JWT token instead of username/password.\n- `--import-db`: Import (or update) the database connection to Superset automatically.\n- `--select` / `-s`: Select specific models or paths to sync (e.g. `models/gold/`).\n- `--exclude` / `-x`: Exclude specific models from syncing.\n- `--metrics` / `-m`: Select specific metrics to sync (comma-separated).\n- `--merge-metadata`: Update Superset configurations based on dbt metadata while preserving Superset-only metrics.\n- `--preserve-metadata`: Completely preserve existing column and metric configurations defined in Superset.\n- `--disallow-edits`: Mark resources as managed externally to prevent users from editing them in the Superset UI.\n- `--max-workers`: Control the number of parallel workers used for processing Semantic Layer metrics (default: 3).\n\n---\n\n## 🛠️ Development Guide\n\nIf you wish to contribute or customize this tool, understanding the highly flattened, modular architecture is critical.\n\n### Project Structure Overview\n\n```text\nsuperset-sdk-python/\n├── pyproject.toml      # Modern Python build system metadata \u0026 dependencies\n├── setup.cfg           # Core packaging configs and entry point definitions\n├── README.md           # Documentation\n├── tests/              # Pytest unit tests (run using `pytest -v`)\n└── src/                # The core source code\n```\n\n### Module Guide (`src/`)\n\nThe core codebase is divided into specialized layers to handle API interactions, authentication, and CLI orchestration.\n\n#### 1. CLI Execution Layer (`src/cli/`)\nThis is where the orchestration of the synchronization happens.\n- **`command.py`**: The main entry point. Defines the Click CLI arguments, sets up logging, parses inputs, and directs the overall sync flow.\n- **`databases.py`**: Handles importing or updating the main database connection inside Superset using credentials found in dbt's `profiles.yml`.\n- **`datasets.py`**: The heavy lifter for models. Syncs physical tables/views into Superset Datasets, applies columns, data types, descriptions, and metrics.\n- **`metrics.py`**: Extracts traditional dbt metric definitions, parses their aggregations, and structures them for Superset.\n- **`metricflow.py`**: Specifically handles advanced MetricFlow / dbt Semantic Layer definitions (e.g. derived metrics, ratios) using concurrent workers.\n- **`relations.py`**: Manages entity relationships, generating CTEs and complex join conditions based on Semantic Layer constraints.\n- **`lib.py`**: Shared CLI utility functions, such as profile loading and model filtering logic.\n\n#### 2. API Interaction Layer (`src/api/`)\nContains code for interpreting files and talking to external APIs.\n- **`clients/superset.py`**: A robust REST API client for Superset. Performs paginated GETs, POSTs, and PUTs to manipulate databases and datasets.\n- **`clients/dbt.py`**: Uses `marshmallow` schemas to safely and cleanly parse the massive dbt `manifest.json` files.\n- **`operators.py`**: Simple filtering operators (e.g. `Equal`, `OneToMany`) used to build queries for the Superset API.\n\n#### 3. Authentication Layer (`src/auth/`)\nHandles the nuances of logging in and maintaining sessions.\n- **`main.py`**: A generic base class that automatically intercepts `401 Unauthorized` responses and re-authenticates.\n- **`superset.py`**: Implements Superset-specific login flows, including Username/Password login (which fetches a CSRF token) and JWT Token login.\n- **`token.py`**: A lightweight implementation for simple Bearer token injection.\n\n#### 4. Shared Utilities (`src/`)\n- **`lib.py`**: Generic helpers like logging initialization and error payload validation (SIP-40 compliance).\n- **`exceptions.py`**: Custom exception classes (`CLIError`, `SupersetError`, `DatabaseNotFoundError`) for clean error handling.\n\n### Running Tests\n\nWe use `pytest` for unit testing. The test suite is fast and covers metric parsing, dataset sync logic, and API serialization.\n\n```bash\n# Run all tests with verbosity\npytest -v\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftmph2003%2Fsuperset-sdk-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftmph2003%2Fsuperset-sdk-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftmph2003%2Fsuperset-sdk-python/lists"}