{"id":38702704,"url":"https://github.com/arookieds/dagster-deployment","last_synced_at":"2026-01-17T10:51:39.933Z","repository":{"id":328673745,"uuid":"1115763245","full_name":"arookieds/dagster-deployment","owner":"arookieds","description":"Dagster instance deployment to kubernetes","archived":false,"fork":false,"pushed_at":"2026-01-10T21:28:49.000Z","size":22,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-11T06:34:53.820Z","etag":null,"topics":["dagster","deployment","helm","helm-charts","kubectl","kubernetes","kubeseal","sealed-secrets","sops"],"latest_commit_sha":null,"homepage":"","language":"Nushell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arookieds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-13T14:00:36.000Z","updated_at":"2026-01-10T21:28:44.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/arookieds/dagster-deployment","commit_stats":null,"previous_names":["arookieds/dagster-deployment"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arookieds/dagster-deployment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arookieds%2Fdagster-deployment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arookieds%2Fdagster-deployment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/arookieds%2Fdagster-deployment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arookieds%2Fdagster-deployment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arookieds","download_url":"https://codeload.github.com/arookieds/dagster-deployment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arookieds%2Fdagster-deployment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28506593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T10:25:30.148Z","status":"ssl_error","status_checked_at":"2026-01-17T10:25:29.718Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dagster","deployment","helm","helm-charts","kubectl","kubernetes","kubeseal","sealed-secrets","sops"],"created_at":"2026-01-17T10:51:38.131Z","updated_at":"2026-01-17T10:51:39.911Z","avatar_url":"https://github.com/arookieds.png","language":"Nushell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dagster Deployment for Kubernetes\n\nProduction-ready Dagster deployment using Helm + Kustomize for Kubernetes clusters. 
This repository provides a complete deployment setup for running Dagster as an orchestration platform for data pipelines.\n\n**Live Example:** Deployed at `dagster.homelab.lan` | **Status:** ✅ Running | **Version:** Dagster 1.12.6\n\n---\n\n## 🎯 Features\n\n- **GitOps-Ready**: Kustomize-based deployment with Helm chart integration\n- **Secure by Default**: Sealed Secrets for credential management\n- **Production Architecture**: Separation of Dagster instance and code locations\n- **Scalable Design**: Supports multiple code locations and horizontal scaling\n- **Battle-Tested**: Includes real-world troubleshooting guides and operational procedures\n\n---\n\n## 🏗️ Architecture\n\n### Data Pipeline Flow\n\n```mermaid\nflowchart LR\n    subgraph \"Public Internet\"\n        API1(Binance)\n        API2(ByBit)\n        API3(Gate.io)\n    end\n    subgraph \"Private Network\"\n        subgraph Kubernetes\n            subgraph Dagster\n                E(Extract Job)\n                T(Transform Job)\n            end\n            PS(PostgreSQL)\n            SS(SuperSet)\n            DS(Dashboard)\n        end\n        subgraph LXC\n            M(MinIO)\n        end\n    end\n    \n    API1 --\u003e E --\u003e M\n    API2 --\u003e E --\u003e M\n    API3 --\u003e E --\u003e M\n    M --\u003e T --\u003e PS --\u003e SS --\u003e DS\n```\n\n### Internal Communication\n\n```mermaid\nflowchart TD\n    subgraph Configuration\n        HelmValues[\"Helm Values / Kustomization\u003cbr/\u003e(Defines code locations)\"]\n    end\n\n    subgraph K8s_Dagster[\"Kubernetes Namespace: dagster\"]\n        direction TB\n        subgraph Control_Plane[\"Control Plane\"]\n            style D_Web fill:#e1f5fe,stroke:#01579b\n            style D_Daemon fill:#e1f5fe,stroke:#01579b\n            \n            D_Web[\"Dagster Webserver\u003cbr/\u003e(UI \u0026 API)\"]\n            D_Daemon[\"Dagster Daemon\u003cbr/\u003e(Scheduler)\"]\n        end\n\n        subgraph Code_Exec[\"Code Execution\"]\n      
      style U_Code fill:#f3e5f5,stroke:#4a148c\n            U_Code[\"User Code Pod\u003cbr/\u003e(gRPC: 3030)\"]\n            Py_Defs[\"Python Code\u003cbr/\u003e(Assets/Jobs)\"]\n        end\n        \n        Service[\"Code Location Service\u003cbr/\u003eClusterIP: 3030\"]\n        \n        HelmValues -.-\u003e|Configures| D_Web\n        HelmValues -.-\u003e|Configures| D_Daemon\n        \n        D_Web -- \"gRPC\" --\u003e Service\n        D_Daemon -- \"gRPC\" --\u003e Service\n        \n        Service --\u003e U_Code\n        U_Code --\u003e Py_Defs\n    end\n    \n    subgraph Database_NS[\"Database Namespace\"]\n        DB[(\"PostgreSQL\")]\n    end\n\n    D_Daemon -- \"Run State\" --\u003e DB\n    D_Web -- \"Run History\" --\u003e DB\n```\n\n**Key Design Decisions:**\n- **Stateless Dagster Instance**: No persistent volumes required\n- **Separate Code Locations**: Jobs run in isolated pods from control plane\n- **External Dependencies**: PostgreSQL for metadata, MinIO for raw data storage\n- **gRPC Communication**: Webserver/Daemon communicate with code locations via gRPC (port 3030)\n\n---\n\n## 📋 Prerequisites\n\n### Infrastructure Requirements\n\n| Component | Version | Purpose |\n|-----------|---------|---------|\n| Kubernetes | 1.24+ | Container orchestration |\n| PostgreSQL | 17.6.0+ | Dagster metadata storage |\n| MinIO (optional) | Latest | Object storage for raw data |\n| MetalLB (bare-metal) | Latest | LoadBalancer service support |\n| Traefik | Latest | Ingress controller |\n\n### Tools Required\n\n- `kubectl` - Kubernetes CLI\n- `kustomize` (v5.0.0+) - Manifest management\n- `kubeseal` - Sealed Secrets encryption\n- `helm` (optional) - Helm chart management\n\n---\n\n## 🚀 Quick Start\n\n### 1. Clone Repository\n\n```bash\ngit clone https://github.com/arookieds/dagster-deployment.git\ncd dagster-deployment\n```\n\n### 2. Create Namespace\n\n```bash\nkubectl apply -f base/namespace.yaml\n```\n\n### 3. 
Configure Secrets\n\nCreate sealed secrets for PostgreSQL credentials:\n\n```bash\n# Create plain secret (DO NOT COMMIT)\nkubectl create secret generic postgres-secrets \\\n  --from-literal=postgresql-password='your-password-here' \\\n  --namespace dagster \\\n  --dry-run=client -o yaml \u003e secret.yaml\n\n# Seal the secret\nkubeseal -o yaml \u003c secret.yaml \u003e overlays/prod/sealed-secret.yaml\n\n# Clean up plain secret\nrm secret.yaml\n```\n\n### 4. Update Configuration\n\nEdit `base/kustomization.yaml` to configure:\n- Code location servers (workspace.servers)\n- PostgreSQL connection details\n- Resource limits\n\n### 5. Deploy\n\n```bash\n# Deploy using Kustomize\nkubectl apply -k overlays/prod\n\n# Verify deployment\nkubectl get pods -n dagster\nkubectl get svc -n dagster\n```\n\n### 6. Access Dagster UI\n\n**Option A: Port Forward (Testing)**\n```bash\nkubectl port-forward -n dagster svc/dagster-dagster-webserver 3000:80\n# Open: http://localhost:3000\n```\n\n**Option B: Ingress (Production)**\n```bash\n# Access via configured domain\ncurl http://dagster.homelab.lan\n```\n\n---\n\n## ⚙️ Configuration\n\n### Helm Values (via Kustomize)\n\nThe `kustomization.yaml` includes inline Helm values for the Dagster chart:\n\n```yaml\nhelmCharts:\n  - name: dagster\n    repo: https://dagster-io.github.io/helm\n    version: 1.12.6\n    valuesInline:\n      # PostgreSQL connection\n      postgresql:\n        enabled: false\n        postgresqlHost: postgresql.database.svc.cluster.local\n        postgresqlDatabase: dagster\n        postgresqlUsername: dagster\n      \n      # Code locations (user deployments)\n      dagster-webserver:\n        workspace:\n          servers:\n            - host: \"trading-data\"\n              port: 3030\n              name: \"trading-data\"\n```\n\n### Common Customizations\n\n**Add More Code Locations:**\n```yaml\nworkspace:\n  servers:\n    - host: \"crypto-extract\"\n      port: 3030\n      name: \"crypto-extract\"\n    - host: 
\"crypto-transform\"\n      port: 3030\n      name: \"crypto-transform\"\n```\n\n**Enable High Availability:**\n```yaml\ndagster-webserver:\n  replicaCount: 3\n```\n\n**Adjust Resource Limits:**\n```yaml\ndagster-webserver:\n  resources:\n    limits:\n      cpu: 1000m\n      memory: 1Gi\n    requests:\n      cpu: 250m\n      memory: 256Mi\n```\n\n---\n\n## 📂 Repository Structure\n\n```\ndagster-deployment/\n├── README.md                          # This file\n├── DEPLOYMENT.md                      # Full deployment documentation\n├── base/                              # Base Kubernetes resources\n│   ├── kustomization.yaml            # Helm chart + base config\n│   ├── namespace.yaml                # Namespace definition\n│   └── ingressroute.yaml             # Traefik ingress (optional)\n└── overlays/\n    └── prod/                          # Production environment\n        ├── kustomization.yaml        # Production patches\n        └── sealed-secret.yaml        # Encrypted secrets\n```\n\n**Note**: _overlays_ will be added at a later stage.\n\n---\n\n## 🔧 Troubleshooting\n\n### Issue: Pods Not Starting\n\n**Symptoms:** Pods in `Pending` or `CrashLoopBackOff` state\n\n**Check:**\n```bash\n# View pod status\nkubectl get pods -n dagster\n\n# Check logs\nkubectl logs -n dagster \u003cpod-name\u003e\n\n# Check events\nkubectl describe pod -n dagster \u003cpod-name\u003e\n```\n\n**Common Causes:**\n- Missing secrets: Ensure `postgres-secrets` sealed secret exists\n- PostgreSQL unreachable: Verify PostgreSQL pod running in `database` namespace\n- Resource limits: Check if pod is OOMKilled due to memory limits\n\n### Issue: Cannot Access UI\n\n**Symptoms:** `curl http://dagster.homelab.lan` returns connection refused or 404\n\n**Diagnosis:**\n```bash\n# Find actual service name created by Helm\nkubectl get svc -n dagster\n\n# Expected: dagster-dagster-webserver\n```\n\n**Fix:** Update `ingressroute.yaml` to use correct service name:\n```yaml\nservices:\n  - name: 
dagster-dagster-webserver  # Not just \"dagster\"\n    port: 80\n```\n\n**Helm naming convention:** `{releaseName}-{chartName}-{componentName}`\n\n### Issue: Code Location Not Loading\n\n**Symptoms:** Dagster UI shows \"Code location unavailable\"\n\n**Check gRPC connectivity:**\n```bash\n# Verify code location pod running\nkubectl get pods -n dagster -l component=user-code\n\n# Check webserver can reach code location\nkubectl exec -n dagster \u003cwebserver-pod\u003e -- \\\n  nc -zv \u003ccode-location-service\u003e 3030\n```\n\n**Common Causes:**\n- Service name mismatch in `workspace.servers` configuration\n- Code location pod not running\n- gRPC port 3030 not exposed in code location service\n\n---\n\n## 📖 Full Documentation\n\nFor comprehensive deployment documentation including:\n- Detailed architecture explanations\n- Backup and restore procedures\n- Monitoring and alerting setup\n- Migration paths and scaling strategies\n- Complete troubleshooting guide\n\nSee [DEPLOYMENT.md](./DEPLOYMENT.md)\n\n---\n\n## 🎯 Use Cases\n\nThis deployment is designed for:\n\n- **Data Engineering Pipelines**: ETL/ELT workflows for batch processing\n- **Financial Data Processing**: Crypto market data extraction and transformation\n- **ML Pipeline Orchestration**: Scheduling model training and inference\n- **Multi-tenant Deployments**: Separate code locations per team/project\n\n**Not suitable for:**\n- Real-time streaming (use Kafka/Flink for high-frequency data)\n- Extremely high-throughput (\u003e10k jobs/minute)\n- Windows-based deployments (Linux containers only)\n\n---\n\n## 🔐 Security Considerations\n\n- **Sealed Secrets**: All credentials encrypted using Sealed Secrets controller\n- **No External Exposure**: Dagster UI accessible only within cluster network or via VPN\n- **Namespace Isolation**: Runs in dedicated `dagster` namespace\n- **Minimal Privileges**: Service accounts follow principle of least privilege\n\n**For Production:**\n- Enable authentication (OAuth2, 
LDAP, SAML)\n- Implement Network Policies for namespace isolation\n- Use separate PostgreSQL instance (not shared)\n- Enable TLS for gRPC communication\n\n---\n\n## 🤝 Contributing\n\nContributions welcome! Please follow these guidelines:\n\n1. **Fork** the repository\n2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)\n3. **Commit** your changes (`git commit -m 'Add amazing feature'`)\n4. **Push** to the branch (`git push origin feature/amazing-feature`)\n5. **Open** a Pull Request\n\n**Please include:**\n- Description of changes\n- Rationale for the change\n- Testing performed (include kubectl commands and output)\n- Documentation updates (if applicable)\n\n---\n\n## 📝 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## 🙏 Acknowledgments\n\n- **Dagster Team** - For the excellent orchestration framework\n- **Bitnami** - For well-maintained Helm charts\n- **Kubernetes Community** - For robust container orchestration\n\n---\n\n## 📞 Support\n\n- **Issues**: [GitHub Issues](https://github.com/arookieds/dagster-deployment/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/arookieds/dagster-deployment/discussions)\n- **Dagster Slack**: [dagster.slack.com](https://dagster.slack.com)\n\n---\n\n## 🗓️ Changelog\n\n| Date | Version | Changes |\n|------|---------|---------|\n| 2025-12-14 | 1.0.0 | Initial public release |\n| 2025-12-12 | 0.9.0 | Internal deployment and testing |\n\n---\n\n**⭐ If this repository helped you, please consider giving it a star!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farookieds%2Fdagster-deployment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farookieds%2Fdagster-deployment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farookieds%2Fdagster-deployment/lists"}