{"id":29926770,"url":"https://github.com/cmpadden/dagster-databricks-components-demo","last_synced_at":"2025-08-02T12:43:58.008Z","repository":{"id":305951402,"uuid":"1019754218","full_name":"cmpadden/dagster-databricks-components-demo","owner":"cmpadden","description":null,"archived":false,"fork":false,"pushed_at":"2025-07-22T21:14:54.000Z","size":163,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-22T21:24:23.631Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmpadden.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-14T20:21:48.000Z","updated_at":"2025-07-22T21:14:58.000Z","dependencies_parsed_at":"2025-07-22T21:35:33.840Z","dependency_job_id":null,"html_url":"https://github.com/cmpadden/dagster-databricks-components-demo","commit_stats":null,"previous_names":["cmpadden/dagster-databricks-components-demo"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/cmpadden/dagster-databricks-components-demo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpadden%2Fdagster-databricks-components-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpadden%2Fdagster-databricks-components-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpadden%2Fdagster-databricks-components-demo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpadden%2Fdagster-databricks-components-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmpadden","download_url":"https://codeload.github.com/cmpadden/dagster-databricks-components-demo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpadden%2Fdagster-databricks-components-demo/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268392230,"owners_count":24243298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-02T12:43:11.209Z","updated_at":"2025-08-02T12:43:57.963Z","avatar_url":"https://github.com/cmpadden.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cimg width=\"1400\" height=\"225\" alt=\"Frame 21\" src=\"https://github.com/user-attachments/assets/6c3e4596-b1ea-4196-bc36-965a34f3d3f1\" /\u003e\n\n# Dagster Databricks Components Demo\n\nThis project demonstrates how to use Dagster Components to interface with Databricks and create a unified view of your data platform. It showcases how components make it easy to orchestrate Databricks jobs while maintaining full visibility and lineage tracking within Dagster's single pane of glass.\n\n## Overview\n\nThe demo includes:\n\n- **Custom Databricks Job Component**: A reusable component that wraps Databricks jobs as Dagster assets\n- **Asset Specifications**: Declarative asset definitions with proper lineage and metadata\n- **Cross-Platform Integration**: Seamless connection between Dagster orchestration and Databricks execution\n- **Unified Monitoring**: View all your data assets and their dependencies in one place\n\n## Architecture\n\n```\n┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐\n│   Dagster UI    │    │   Databricks     │    │   Data Assets   │\n│                 │    │   Workspace      │    │                 │\n│ • Asset Lineage │◄──►│ • Job Execution  │◄──►│ • S3 Buckets    │\n│ • Monitoring    │    │ • Compute        │    │ • Tables        │\n│ • Scheduling    │    │ • Notebooks      │    │ • Reports       │\n└─────────────────┘    └──────────────────┘    └─────────────────┘\n```\n\n## Features\n\n- **Databricks Job Integration**: Execute Databricks jobs directly from Dagster with full parameter passing\n- **Asset Lineage**: Track data dependencies across your entire pipeline\n- **Metadata Enrichment**: Automatically capture job run information, timing, and parameters\n- **Environment Configuration**: Secure credential management using environment variables\n- **Declarative Components**: Define your data pipelines using YAML configuration\n\n## Project Structure\n\n```\ndagster-databricks-components-demo/\n├── src/\n│   └── dagster_databricks_components_demo/\n│       ├── components/\n│       │   └── databricks_job_component.py    # Custom Databricks component\n│       ├── defs/\n│       │   └── databricks_job/\n│       │       └── defs.yaml                  # Component configuration\n│       └── definitions.py                     # Main Dagster definitions\n├── pyproject.toml                             # Project dependencies\n└── README.md                                  # This file\n```\n\n## Quick Start\n\n### Prerequisites\n\n- Python 3.9-3.13.3\n- uv package manager\n- Databricks workspace access\n- Databricks job ID and credentials\n\n### Installation\n\n1. **Clone and navigate to the project:**\n   ```bash\n   cd dagster-databricks-components-demo\n   ```\n\n2. **Create and activate a virtual environment:**\n   ```bash\n   uv venv\n   source .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n   ```\n\n3. **Install dependencies:**\n   ```bash\n   uv sync\n   ```\n\n### Configuration\n\n4. **Set up environment variables:**\n   ```bash\n   export DATABRICKS_HOST=\"https://your-workspace.cloud.databricks.com\"\n   export DATABRICKS_TOKEN=\"your-databricks-token\"\n   ```\n\n### Running the Demo\n\n5. **Start the Dagster development server:**\n   ```bash\n   dg dev\n   ```\n\n6. **Access the Dagster UI:**\n   Open your browser to `http://localhost:3000` to explore the asset lineage, trigger materializations, and monitor your Databricks jobs.\n\n## Component Configuration\n\nThe demo uses a YAML-based configuration in `src/dagster_databricks_components_demo/defs/databricks_job/defs.yaml`:\n\n```yaml\ntype: dagster_databricks_components_demo.components.databricks_job_component.DatabricksJobComponent\n\nattributes:\n  job_id: 1000180891217799  # Your Databricks job ID\n  job_parameters:\n    source_file_prefix: \"s3://acme-analytics/raw\"\n    destination_file_prefix: \"s3://acme-analytics/reports\"\n  \n  workspace_config:\n    host: \"{{ env.DATABRICKS_HOST }}\"\n    token: \"{{ env.DATABRICKS_TOKEN }}\"\n  \n  assets:\n    - key: account_performance\n      owners: [\"alice@acme.com\"]\n      deps: [prepared_accounts, prepared_customers]\n      kinds: [parquet]\n```\n\n## Key Benefits\n\n- **Unified Orchestration**: Manage both Dagster and Databricks workloads from a single interface\n- **Complete Lineage**: Track data flow from raw sources through Databricks transformations to final outputs\n- **Operational Excellence**: Monitor job health, performance, and data quality in one place\n- **Developer Experience**: Write infrastructure as code with type-safe, declarative components\n- **Scalability**: Leverage Databricks' compute power while maintaining Dagster's orchestration capabilities\n\n## Next Steps\n\n- Customize the `job_id` and `job_parameters` in the YAML configuration for your Databricks jobs\n- Add additional asset specifications to match your data pipeline\n- Explore scheduling and sensor capabilities for automated pipeline execution\n- Integrate with your existing CI/CD workflows\n\nFor more information about Dagster Components, visit the [official documentation](https://docs.dagster.io/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmpadden%2Fdagster-databricks-components-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmpadden%2Fdagster-databricks-components-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmpadden%2Fdagster-databricks-components-demo/lists"}