Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ericbutera/airflow-instance
Apache Airflow demo environment. Includes a couple of example DAGs.
https://github.com/ericbutera/airflow-instance
Last synced: 8 days ago
JSON representation
Apache Airflow demo environment. Includes a couple of example DAGs.
- Host: GitHub
- URL: https://github.com/ericbutera/airflow-instance
- Owner: ericbutera
- Created: 2022-08-18T00:06:38.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-20T14:36:13.000Z (about 1 year ago)
- Last Synced: 2024-11-13T03:18:11.464Z (2 months ago)
- Language: Python
- Homepage:
- Size: 205 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Local Airflow Setup
Local development environment for Airflow.
## Setup
[install docs](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html)
Run `make setup` then `make run`
Navigate to . Creds: `airflow:airflow`
## Resources
- [Data Engineering w/ GCP](https://github.com/PacktPublishing/Data-Engineering-with-Google-Cloud-Platform)
- [Data Pipelines w/ Airflow](https://github.com/BasPH/data-pipelines-with-apache-airflow)
- [Docs](https://airflow.apache.org/docs/)## Research
- [Kubernetes](https://airflow.apache.org/docs/apache-airflow/stable/kubernetes.html)
- [Querying Cloud Storage data](https://cloud.google.com/bigquery/docs/external-data-cloud-storage)
- [federated queries](https://cloud.google.com/bigquery/docs/federated-queries-intro)
- [see ETL, EL, and ELT in What is Google BigQuery](https://www.oreilly.com/library/view/google-bigquery-the/9781492044451/ch01.html#what_is_google_bigqueryquestion_mark)
- [Loading JSON data from Cloud Storage](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#loading_json_data_with_schema_auto-detection)
- [Cloud storage transfers](https://cloud.google.com/bigquery/docs/cloud-storage-transfer)
- [Load data into BigQuery](https://cloud.google.com/bigquery/docs/loading-data)
- [Cloud storage triggers](https://cloud.google.com/functions/docs/calling/storage)
- JSONLines or something like [avro](https://avro.apache.org/), [capacitor](https://cloud.google.com/blog/products/bigquery/inside-capacitor-bigquerys-next-generation-columnar-storage-format)
- [PubSub](https://cloud.google.com/pubsub/architecture)
- _send over 500 million messages per second_
- [Running tasks in containers 10.3.2](https://livebook.manning.com/book/data-pipelines-with-apache-airflow/chapter-10/v-4/)
- easier dep management
- uniform approach for different tasks
- improved testability
- [airflow go client](https://github.com/apache/airflow-client-go)
- if running in container, any language is possible
- from scratch extremely small binaries!
- can integrate with airflow via OpenAPI spec## Operators
- [airflow.providers.google.cloud.transfers.gcs_to_bigquery](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/transfers/gcs_to_bigquery/index.html)
## Hello World Go DAG
In order to get docker operator working on Mac, some work-arounds were employed. The docker socket needed to be [wrapped in a service](https://onedevblog.com/how-to-fix-a-permission-denied-when-using-dockeroperator-in-airflow/).
![Successful Run](/docs/hello_world_go_run.png)