git push your data stack with Airbyte, Airflow, and dbt - 2022 Airflow Summit
https://github.com/airbytehq/airflow-summit-airbyte-2022
- Host: GitHub
- URL: https://github.com/airbytehq/airflow-summit-airbyte-2022
- Owner: airbytehq
- Created: 2022-05-23T23:54:23.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-12T19:25:36.000Z (over 2 years ago)
- Last Synced: 2025-04-23T21:38:10.393Z (6 months ago)
- Topics: airbyte, airflow, data-engineering, elt
- Language: Shell
- Homepage: https://www.airbyte.com
- Size: 35.2 KB
- Stars: 53
- Watchers: 13
- Forks: 20
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# `git push` your data stack with Airbyte, Airflow and dbt
_For Airflow Summit 2022, by [@marcosmarxm](https://github.com/marcosmarxm) and [@evantahler](https://github.com/evantahler) from [Airbyte](https://github.com/airbytehq)_
[Build status (`data-pipeline` workflow)](https://github.com/airbytehq/airflow-summit-airbyte-2022/actions/workflows/data-pipeline.yml)
## What:
This project configures a sample data stack orchestrated by Airflow, using Airbyte to Extract and Load data, and dbt to Transform it.

Links:
- [Slides](https://docs.google.com/presentation/d/17TuHlzgF3x_Q2NtkOq0O7SmR9e0MGqXMqlwVqJyUoKI)
- [Speaker Page](https://airflowsummit.org/sessions/2022/git-push-your-data-stack-with-airbyte-airflow-and-dbt/)
## Running Locally:
0. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) and Python 3 (if you are on macOS, you already have Python 3).
1. Create the file `{HOME}/.octavia` and add the following credentials for the local Postgres database managed by Docker:
```
POSTGRES_HOST=host.docker.internal
POSTGRES_PASSWORD=password
POSTGRES_USERNAME=demo_user
POSTGRES_DATABASE=postgres
```
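If you prefer to do this from the command line, here is a minimal sketch (assuming `{HOME}` above means your home directory, i.e. `$HOME`):
```
# Write the Octavia credentials file into your home directory.
# These values match the Dockerized Postgres instance used by this demo.
cat > "$HOME/.octavia" <<'EOF'
POSTGRES_HOST=host.docker.internal
POSTGRES_PASSWORD=password
POSTGRES_USERNAME=demo_user
POSTGRES_DATABASE=postgres
EOF
```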
2. Create the dbt profile in `{HOME}/.dbt/profiles.yaml`:
```
config:
  partial_parse: true
  printer_width: 120
  send_anonymous_usage_stats: false
  use_colors: true
normalize:
  outputs:
    prod:
      dbname: postgres
      host: host.docker.internal
      pass: password
      port: 5432
      schema: public
      threads: 8
      type: postgres
      user: demo_user
  target: prod
```
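Optionally, if you have dbt with the Postgres adapter installed locally, you can sanity-check the profile before starting the stack. This is not one of the original steps (the demo otherwise runs dbt for you via Docker), and it assumes you run it from the directory containing the repo's dbt project:
```
# Install the dbt Postgres adapter, then verify that the
# "normalize" profile defined above can connect to the database.
pip install dbt-postgres
dbt debug --profile normalize --target prod
```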
3. Run the whole data stack using `./tools/start.sh`. This will install local requirements (PyYAML) and run everything through Docker. The script will exit when complete, but the Docker containers will remain running.
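Once the script exits, you can confirm the containers are still up and the UIs are reachable. A quick check, assuming `curl` is available (Airflow's webserver exposes a `/health` endpoint; expecting a 200 from the Airbyte UI root is an assumption here):
```
# List the running containers for the stack.
docker ps

# Expect HTTP 200 from both UIs.
curl -s -o /dev/null -w "Airflow: %{http_code}\n" http://localhost:8080/health
curl -s -o /dev/null -w "Airbyte: %{http_code}\n" http://localhost:8000/
```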
In your browser:
- Visit http://localhost:8080/ to see the Airflow UI (user: `airflow`, password: `airflow`) and your completed DAG.
- Visit http://localhost:8000/ to see the Airbyte UI and your completed Sync.
- Connect to your local Postgres database (`localhost:5432`) with `username=demo_user` and `password=password` to see the staged and transformed data (see the `psql` example after this list).
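For example, with the `psql` client installed (credentials are the ones from step 1; the exact table names depend on the Airbyte sync and dbt models, so listing the `public` schema is just an illustration):
```
# Connect to the demo database and list the tables in the public schema.
# PGPASSWORD avoids an interactive password prompt.
PGPASSWORD=password psql -h localhost -p 5432 -U demo_user -d postgres -c '\dt public.*'
```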
## Shut it down
Run `./tools/stop.sh` to stop the Docker containers.
## Testing
This repository is tested using GitHub Actions (see the `data-pipeline` workflow linked in the badge above).