a quick airflow + dbt + astronomer-cosmos in docker demo

https://github.com/sajonaro/airflow-docker

## About

This is a small PoC demonstrating the use of Airflow + dbt + Cosmos in Docker.

![diagram](./high-level-diagram.png)

[dbt](https://www.getdbt.com/) provides:
- a well-structured transformation pipeline
- a model-first approach, i.e. each transformation step is defined as a 'model'
- testing
- version control
- modularization & centralization of (analytics) code


[Cosmos](https://www.astronomer.io/cosmos/) provides:
- automatic conversion of dbt projects into DAGs
- dependency visualization
- the ability to run any number of dbt jobs (virtually for free, since the only dependency is dbt Core)

[Airflow](https://airflow.apache.org/) provides:
- an extensible automation/orchestration platform to run dbt steps as [DAGs](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#dags)
- a web UI
- a CLI
- ...and more

## 101
* start up the stack
```bash
$ ./start.sh
```
... which does the following
```bash
$ docker compose up airflow-init
$ sleep 5
$ docker compose up
```
* configure 'connections' (so that dbt can use them) via the Airflow UI
  - open http://localhost:8080 (user/password -> airflow/airflow; check docker-compose.yml)
  - navigate to Admin / Connections / add new
  - e.g. we need to create `ex1_db` and `ex2_db` (see ./dags/include.profiles.py to see why)
  - connection details are postgresql://airflow:airflow@northwind:5454/ex1 (check docker-compose.yml)
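As an alternative to clicking through the UI, connections can also be created with the Airflow CLI. A sketch, assuming the `airflow-cli` service from the stock Airflow docker-compose file is available in this stack:

```bash
# create the ex1_db connection from the command line
# (`airflow-cli` is the debug-profile service in the standard Airflow compose file)
docker compose run --rm airflow-cli airflow connections add ex1_db \
    --conn-uri 'postgresql://airflow:airflow@northwind:5454/ex1'

# verify it was stored
docker compose run --rm airflow-cli airflow connections get ex1_db
```

Repeat with `ex2_db` and its URI for the second database.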

* to use the dbt CLI (via docker compose):
```bash
# run dbt commands (e.g. init, run, seed, test)
# and then follow the command prompts
$ docker compose run --rm dbt-cli dbt init
```
... or alternatively (via a virtual environment)
```bash
# 1 - create a virtual environment named `dbt`
# 2 - activate it
# 3 - install the dbt libs

$ python3 -m venv dbt \
&& source dbt/bin/activate \
&& pip install dbt-core dbt-postgres \
&& deactivate
```
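Note that the one-liner above ends with `deactivate`, so re-activate the environment before using dbt. A quick sanity check:

```bash
# re-activate the venv created above, then confirm dbt is on the PATH
source dbt/bin/activate
dbt --version
```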

### Some useful commands
* use postgres cli
```bash
# connect to the database from the host, e.g.
# user = airflow, db = ex1, port = 5454 (see docker-compose.yml)
$ psql -U airflow -h localhost -p 5454 ex1

-- 'out' is the schema name
SELECT * FROM pg_catalog.pg_tables WHERE schemaname = 'out';

-- assuming there is a table `customers` in schema `out`
SELECT * FROM out.customers;
```
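The same checks can also be run non-interactively; a sketch, assuming the credentials above from docker-compose.yml:

```bash
# run a single query without opening an interactive session
# (PGPASSWORD avoids the password prompt)
PGPASSWORD=airflow psql -U airflow -h localhost -p 5454 -d ex1 \
    -c "SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname = 'out';"
```

This form is handy for quickly checking that a `dbt run` actually materialized its models.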

* if on Linux

```bash
# create directories
$ mkdir -p ./dags ./logs ./plugins ./dbt
# configure AIRFLOW_UID and AIRFLOW_GID
$ echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
```
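After generating `.env`, it's worth confirming it holds the values the compose file expects:

```bash
# sanity-check the generated .env
# should print AIRFLOW_UID=<your uid> and AIRFLOW_GID=0
grep -E '^AIRFLOW_(UID|GID)=' .env
```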