https://github.com/bedag/hellodata-be-workspace-starter
This is a the starter workspace for HelloDATA BE.
https://github.com/bedag/hellodata-be-workspace-starter
Last synced: 5 months ago
JSON representation
This is a the starter workspace for HelloDATA BE.
- Host: GitHub
- URL: https://github.com/bedag/hellodata-be-workspace-starter
- Owner: bedag
- Created: 2024-01-31T09:59:03.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-19T15:14:13.000Z (almost 2 years ago)
- Last Synced: 2024-06-20T02:33:55.965Z (almost 2 years ago)
- Language: Python
- Homepage: https://kanton-bern.github.io/hellodata-be/concepts/workspaces/
- Size: 264 KB
- Stars: 2
- Watchers: 5
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Workspace `Boiler Plate`
This is a the boiler-plate workspace that you can clone and add your requirements. This should help you to understand how to set up a project within HelloDATA.
Goal: that you can test and create your DAGs, dbt, and whatever you need within this git-repo and deploy anywhere with regular HelloDATA deployment.
We use a [Airflow DAGs](/src/dags/) that will be copied with an initContainer into central Airflow orchestrator within Business Domain.
Find the detailed documentation on [Workspaces Documentation](https://kanton-bern.github.io/hellodata-be/concepts/workspaces/).
## Deployment
- the dags you are running on HelloDATA are in [/src/dags/](/src/dags/) folder. Check [boiler-example.py](/src/dags/boiler-example.py) for an example
- add your deployments needs and requirements into [hd_env.py](deployment/local/airflow/dags/hd_env.py). This step will be integrated in your working HelloDATA environments.
- make sure to set the ENV variable (e.g. postgres local password) on `deployment/local/env`
- you can specify all your requirements in [Dockerfile](Dockerfile)
- build your image with `docker build -t hellodata-ws-boilerplate:local-build .` or create your own one. Be sure that your HelloDATA instance has access to your uploaded Docker registry.
## Airflow locally (testing)
For testing your Airflow locally. You can use above menitoned Astro CLI, which makes it very easy to run Airflow with one command ` astro dev start -e ../env`. See more details on [Airflow Readme](/src/dags/airflow/README.md).
Requirements:
- Local Docker installed (either native or Docker-Desktop)
- **Linux/MacOS**: add `127.0.0.1 host.docker.internal` to the `/etc/hosts` file.
- **Windows**: Enable in Docker Desktop under `Settings -> General -> Use WSL 2 based engine` the settings: `Add the *.docker.internal names to the host's etc/hosts file (Requires password)`
- Make sure Docker-Desktop entered it correctly in `C:\Windows\System32\drivers\etc\hosts`. There were some [cases](https://github.com/kanton-bern/hellodata-be/issues/21#issuecomment-1913578206) where it added wrongly. It should look something like:
```sh
# Added by Docker Desktop
host.docker.internal
gateway.docker.internal
# To allow the same kube context to work on the host and the container:
127.0.0.1 kubernetes.docker.internal
# End of section
```
- make sure Kubernetes is enabled
- copy you local kube-file to astro: `cp ~/.kube/config src/dags/airflow/include/.kube/`
- **Windows**: The file is under `%USERPROFILE%\.kube\config` (C:\Users\[YourIdHere]\.kube)
- install `KubernetesPodOperator` locally with pip (not with astro) install `pip install apache-airflow-providers-cncf-kubernetes`.
- if you store data from one task to the next, you need a volume-mount (see `pvc.yaml`) and configure `volumes` and `volume_mounts` in Airflow (see example)