{"id":23238454,"url":"https://github.com/zkan/data-pipelines-with-airflow","last_synced_at":"2025-08-19T23:32:52.552Z","repository":{"id":144820392,"uuid":"410478945","full_name":"zkan/data-pipelines-with-airflow","owner":"zkan","description":"Skooldio: Data Pipelines with Airflow","archived":false,"fork":false,"pushed_at":"2023-12-30T08:07:47.000Z","size":1997,"stargazers_count":19,"open_issues_count":0,"forks_count":10,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-14T13:48:42.489Z","etag":null,"topics":["apache-airflow","data-engineering","data-pipeline"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zkan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-09-26T07:13:52.000Z","updated_at":"2024-03-23T00:06:23.000Z","dependencies_parsed_at":"2023-12-26T10:38:45.770Z","dependency_job_id":null,"html_url":"https://github.com/zkan/data-pipelines-with-airflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkan%2Fdata-pipelines-with-airflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkan%2Fdata-pipelines-with-airflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkan%2Fdata-pipelines-with-airflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkan%2Fdata-pipelines-with-airflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zkan","download_url
":"https://codeload.github.com/zkan/data-pipelines-with-airflow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230374249,"owners_count":18216044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-airflow","data-engineering","data-pipeline"],"created_at":"2024-12-19T04:17:53.590Z","updated_at":"2024-12-19T04:17:54.184Z","avatar_url":"https://github.com/zkan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Pipelines with Airflow\n\n## Contents\n\n- Prerequisites\n  - [Installing Docker Desktop](./docs/installing-docker-desktop.md)\n  - [Installing Visual Studio Code](./docs/installing-vscode.md)\n- [Data Source](#data-source)\n- [Starting Airflow](#starting-airflow)\n- [Project Instruction](./docs/project-instruction.md)\n- [Airflow S3 Connection to MinIO](#airflow-s3-connection-to-minio)\n- [Running Tests](#running-tests)\n- [References](#references)\n\n## Data Source\n\n- [CCXT - CryptoCurrency eXchange Trading Library](https://github.com/ccxt/ccxt)\n- [Hello, CCXT!](https://github.com/zkan/hello-ccxt)\n\n## Starting Airflow\n\nBefore we run Airflow, let's create these folders below first. 
Please note that if you're using Windows, you can skip this step.\n\n```sh\nmkdir -p mnt/dags mnt/logs mnt/plugins mnt/tests\n```\n\nOn **Linux**, make sure to set the Airflow user ID for Docker Compose:\n\n```sh\necho -e \"AIRFLOW_UID=$(id -u)\" \u003e .env\n```\n\nWith `LocalExecutor`\n\n```sh\ndocker compose build\ndocker compose up\n```\n\nWith `CeleryExecutor`\n\n```sh\ndocker compose -f docker-compose-celery.yml build\ndocker compose -f docker-compose-celery.yml up\n```\n\nWith `SequentialExecutor` (NOT recommended for production use)\n\n```sh\ndocker compose -f docker-compose-sequential.yml build\ndocker compose -f docker-compose-sequential.yml up\n```\n\nTo clean up the project, press Ctrl+C then run:\n\n```sh\ndocker compose down\n```\n\n## Airflow S3 Connection to MinIO\n\nSince MinIO offers S3-compatible object storage, we can set the connection type to \"Amazon Web Services\". However, we'll need to set an extra option so that Airflow connects to MinIO instead of S3.\n\n- Connection Name: `minio` or any name you like\n- Connection Type: Amazon Web Services\n- AWS Access Key ID: `\u003creplace_here_with_your_minio_access_key\u003e`\n- AWS Secret Access Key: `\u003creplace_here_with_your_minio_secret_key\u003e`\n- Extra: a JSON object with the following properties:\n  ```json\n  {\n    \"host\": \"http://minio:9000\"\n  }\n  ```\n\nSee the example below:\n\n![Airflow Connection to MinIO](./docs/images/airflow-connection-to-minio.png)\n\n**Note:** If you are connecting to AWS S3 itself rather than MinIO, you don't need to specify the host in the Extra field.\n\n## Running Tests\n\nFirst we need to install pytest:\n\n```sh\npip install pytest\n```\n\nRun tests:\n\n```sh\nexport PYTHONPATH=/opt/airflow/plugins\npytest\n```\n\n## References\n\n- [Running Airflow in Docker](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html)\n- [MinIO Docker Quickstart Guide](https://docs.min.io/docs/minio-docker-quickstart-guide.html)\n- [Deploy MinIO on Docker 
Compose](https://docs.min.io/docs/deploy-minio-on-docker-compose)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzkan%2Fdata-pipelines-with-airflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzkan%2Fdata-pipelines-with-airflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzkan%2Fdata-pipelines-with-airflow/lists"}