{"id":20162734,"url":"https://github.com/airscholar/kubernetes-for-dataengineering","last_synced_at":"2025-07-17T12:38:19.148Z","repository":{"id":218961139,"uuid":"747774027","full_name":"airscholar/Kubernetes-For-DataEngineering","owner":"airscholar","description":"This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering environment using Kubernetes and Apache Airflow","archived":false,"fork":false,"pushed_at":"2024-01-26T18:13:49.000Z","size":8,"stargazers_count":18,"open_issues_count":0,"forks_count":14,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-24T02:22:01.833Z","etag":null,"topics":["apache-airflow","data-engineering","kubernetes"],"latest_commit_sha":null,"homepage":"https://youtu.be/ISftrpAImHA","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/airscholar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-24T16:04:32.000Z","updated_at":"2025-02-13T15:52:30.000Z","dependencies_parsed_at":"2024-01-24T18:50:40.856Z","dependency_job_id":"2fde0f49-f37a-4866-b1a2-d90584b568fb","html_url":"https://github.com/airscholar/Kubernetes-For-DataEngineering","commit_stats":null,"previous_names":["airscholar/kubernetes-for-dataengineering"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airscholar%2FKubernetes-For-DataEngineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airscholar%2FKubernetes-For-DataEngineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airscholar%2FKubernetes-For-DataEngineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airscholar%2FKubernetes-For-DataEngineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/airscholar","download_url":"https://codeload.github.com/airscholar/Kubernetes-For-DataEngineering/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137973,"owners_count":21053771,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-airflow","data-engineering","kubernetes"],"created_at":"2024-11-14T00:26:39.815Z","updated_at":"2025-04-10T00:36:12.396Z","avatar_url":"https://github.com/airscholar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kubernetes for Data Engineering\n\nThis repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering environment using Kubernetes and Apache Airflow. It includes the setup for the Kubernetes Dashboard, which provides a user-friendly web interface for managing Kubernetes clusters, and Apache Airflow, a platform to programmatically author, schedule, and monitor workflows.\n\n## Repository Structure\n\nThe repository is organized as follows:\n\n```\n.\n├── dags\n│   ├── fetch_and_preview.py\n│   └── hello.py\n└── k8s\n    ├── dashboard-adminuser.yaml\n    ├── dashboard-clusterrole.yaml\n    ├── dashboard-secret.yaml\n    ├── recommended-dashboard.yaml\n    └── values.yaml\n```\n\n### DAGs\n\n- `fetch_and_preview.py`: A DAG for fetching data and providing a preview.\n- `hello.py`: A simple example DAG to demonstrate basic Airflow concepts.\n\n### Kubernetes (k8s) Configuration\n\n- `dashboard-adminuser.yaml`: YAML file for setting up an admin user for the Kubernetes Dashboard.\n- `dashboard-clusterrole.yaml`: YAML file defining the cluster role for the Kubernetes Dashboard.\n- `dashboard-secret.yaml`: YAML file for managing secrets used by the Kubernetes Dashboard.\n- `recommended-dashboard.yaml`: YAML file for deploying the recommended Kubernetes Dashboard setup.\n- `values.yaml`: YAML file containing values for customizing the Kubernetes setup.\n\n## Getting Started\n\n### Prerequisites\n\n- A Kubernetes cluster\n- `kubectl` installed and configured\n- Helm (optional, but recommended for managing Kubernetes applications)\n\n### Setup\n\n1. **Deploy the Kubernetes Dashboard:**\n\n   To deploy the Kubernetes Dashboard, apply the YAML files in the `k8s` directory:\n\n   ```bash\n   kubectl apply -f k8s/\n   ```\n\n   This will set up the Kubernetes Dashboard with the necessary roles and permissions.\n\n2. **Accessing the Kubernetes Dashboard:**\n\n   To access the Dashboard, you may need to start a proxy server:\n\n   ```bash\n   kubectl proxy\n   ```\n\n   Then, access the Dashboard at: `http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/`.\n\n   Use the token generated for the admin user to log in (see `dashboard-secret.yaml`).\n\n3. **Deploy Apache Airflow:**\n\n   You can deploy Apache Airflow using Helm or by applying custom YAML files. For Helm:\n\n   ```bash\n   helm repo add apache-airflow https://airflow.apache.org\n   helm install airflow apache-airflow/airflow -f k8s/values.yaml\n   ```\n\n   This will deploy Airflow with the settings defined in `values.yaml`.\n\n4. **Adding DAGs to Airflow:**\n\n   Copy your DAG files (e.g., `fetch_and_preview.py`, `hello.py`) into the DAGs folder of your Airflow deployment. The method of copying depends on your Airflow setup (e.g., using Persistent Volume, Git-sync).\n\n### Usage\n\n- **Kubernetes Dashboard:** Use the Dashboard to monitor and manage the Kubernetes cluster.\n- **Apache Airflow:** Access the Airflow web UI to manage, schedule, and monitor workflows.\n\n## Video\nFor a complete walkthrough and practical demonstration, check out the video here: \n[![Kubernetes for Modern Data Engineering](https://img.youtube.com/vi/ISftrpAImHA/0.jpg)](https://youtu.be/ISftrpAImHA)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairscholar%2Fkubernetes-for-dataengineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fairscholar%2Fkubernetes-for-dataengineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairscholar%2Fkubernetes-for-dataengineering/lists"}