{"id":20143990,"url":"https://github.com/bytemedirk/apache-airflow-data-engineer","last_synced_at":"2026-04-09T21:03:23.279Z","repository":{"id":232607829,"uuid":"784698280","full_name":"ByteMeDirk/apache-airflow-data-engineer","owner":"ByteMeDirk","description":"Apache Airflow For Data Engineers Tutorial","archived":false,"fork":false,"pushed_at":"2024-04-19T14:38:17.000Z","size":7017,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-13T10:49:59.816Z","etag":null,"topics":["airflow","airflow-docker","data-engineering","docker","pandas","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ByteMeDirk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-10T11:28:57.000Z","updated_at":"2024-08-08T17:31:16.000Z","dependencies_parsed_at":"2024-04-19T15:59:04.143Z","dependency_job_id":null,"html_url":"https://github.com/ByteMeDirk/apache-airflow-data-engineer","commit_stats":null,"previous_names":["bytemedirk/apache-airflow-data-engineer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteMeDirk%2Fapache-airflow-data-engineer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteMeDirk%2Fapache-airflow-data-engineer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteMeDirk%2Fapache-airflow-data-engineer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteMeDirk%2Fapache-airflow-data-engineer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ByteMeDirk","download_url":"https://codeload.github.com/ByteMeDirk/apache-airflow-data-engineer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241587914,"owners_count":19986627,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airflow-docker","data-engineering","docker","pandas","python"],"created_at":"2024-11-13T22:08:19.447Z","updated_at":"2026-04-09T21:03:18.258Z","avatar_url":"https://github.com/ByteMeDirk.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# apache-airflow-data-engineer\n\nMastering Apache Airflow for Data Engineers: A Comprehensive Guide to Key Features and Functionalities\n\nYou can find the link to the tutorial [here](https://medium.com/@bytemedirk/apache-airflow-for-data-engineers-ca39cc897070).\n\n# Airflow Environment Setup\n\nThis project uses Apache Airflow to manage and schedule data pipelines. The project is containerized using Docker and\norchestrated using Docker Compose.\n\n## Prerequisites\n\n- Docker\n- Docker Compose\n\n## Setup\n\n1. Clone the repository to your local machine.\n\n2. Navigate to the project directory.\n\n3. Build the Docker images:\n\n```bash\ndocker-compose build\n```\n\n4. Start the Airflow services:\n\n```bash\ndocker-compose up\n```\n\n## Configuration\n\nThe `docker-compose.yaml` file contains the configuration for the Airflow services. The following environment variables\nare used:\n\n- `AIRFLOW__CORE__EXECUTOR`: The executor to use for Airflow. In this project, we use the `CeleryExecutor`.\n- `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`: The connection string for the Airflow metadata database.\n- `AIRFLOW__CELERY__RESULT_BACKEND`: The connection string for the backend that Celery uses for storing results.\n- `AIRFLOW__CELERY__BROKER_URL`: The connection string for the message broker that Celery uses for sending tasks.\n- `AIRFLOW__CORE__FERNET_KEY`: The Fernet key used for encrypting passwords in the connection configuration.\n- `AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION`: Whether to pause DAGs when they are created.\n- `AIRFLOW__CORE__LOAD_EXAMPLES`: Whether to load the example DAGs that come with Airflow.\n- `AIRFLOW__API__AUTH_BACKENDS`: The authentication backends to use for the Airflow API.\n\n## Running the Airflow Webserver\n\nOnce the services are up and running, you can access the Airflow webserver at `http://localhost:8080`.\n\n## DAGs\n\nThe DAGs are defined in Python files in the `dags` directory.\n\n## Data\n\nThe data for the DAGs is stored in CSV files in the `datasets` directory.\n\n## Logs\n\nThe logs for the Airflow tasks are stored in the `logs` directory.\n\n## Plugins\n\nAny Airflow plugins can be added to the `plugins` directory.\n\n## Stopping the Services\n\nTo stop the Airflow services, run:\n\n```bash\ndocker-compose down\n```\n\n## Additional Information\n\nFor more information on Apache Airflow, see the [official documentation](https://airflow.apache.org/docs/). For more\ninformation on Docker and Docker Compose, see the [Docker documentation](https://docs.docker.com/) and\nthe [Docker Compose documentation](https://docs.docker.com/compose/).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytemedirk%2Fapache-airflow-data-engineer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytemedirk%2Fapache-airflow-data-engineer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytemedirk%2Fapache-airflow-data-engineer/lists"}