{"id":24157704,"url":"https://github.com/bdbao/airflow-docker","last_synced_at":"2026-04-11T09:05:40.204Z","repository":{"id":271882347,"uuid":"865926669","full_name":"bdbao/Airflow-docker","owner":"bdbao","description":"This project automates ETL workflows using Apache Airflow on Docker containers to ingest data from CSV, Excel, API sources into PostgreSQL, MySQL, Microsoft SQL Server, MongoDB, and Google Cloud Platform; performed data transfers between DBMS.","archived":false,"fork":false,"pushed_at":"2025-02-17T13:00:52.000Z","size":7705,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-17T13:38:51.210Z","etag":null,"topics":["apache-airflow","data-pipeline","docker","google-cloud-platform","mongodb","mssql","mysql","postgresql","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bdbao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-01T11:06:50.000Z","updated_at":"2025-02-17T13:00:56.000Z","dependencies_parsed_at":"2025-01-10T14:34:00.785Z","dependency_job_id":"d24ad8f6-74eb-4ec3-95dc-46c4043f035b","html_url":"https://github.com/bdbao/Airflow-docker","commit_stats":null,"previous_names":["bdbao/airflow-docker"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdbao%2FAirflow-docker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdbao%2FAirflow-docker/tags","releases_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdbao%2FAirflow-docker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bdbao%2FAirflow-docker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bdbao","download_url":"https://codeload.github.com/bdbao/Airflow-docker/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241445942,"owners_count":19964076,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-airflow","data-pipeline","docker","google-cloud-platform","mongodb","mssql","mysql","postgresql","python"],"created_at":"2025-01-12T14:17:22.089Z","updated_at":"2026-04-11T09:05:40.179Z","avatar_url":"https://github.com/bdbao.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Demonstration tasks using Airflow on Docker\n---\n- [Demonstration tasks using Airflow on Docker](#demonstration-tasks-using-airflow-on-docker)\n- [Quick Start](#quick-start)\n- [Build from scratch](#build-from-scratch)\n  - [Update Library](#update-library)\n  - [Host Database server](#host-database-server)\n    - [PostgreSQL server](#postgresql-server)\n    - [MySQL server](#mysql-server)\n    - [MongoDB server](#mongodb-server)\n  - [Data Migration from MSSQL to Gooogle Cloud Platform](#data-migration-from-mssql-to-gooogle-cloud-platform)\n  - [Manipulate on GUI](#manipulate-on-gui)\n    - [Airflow UI](#airflow-ui)\n    - [View database on DBeaver](#view-database-on-dbeaver)\n\n# Quick Start\n- Open **Docker 
Desktop**.\n```bash\ngit clone https://github.com/bdbao/Airflow-docker\ncd Airflow-docker\n\nmkdir -p ./logs ./plugins ./config\necho -e \"AIRFLOW_UID=$(id -u) \\nAIRFLOW_GID=0\" \u003e .env\n\n# If you update requirements.txt: run `docker compose down` -\u003e delete all related images (command is below) -\u003e run this command again.\ndocker build --no-cache . --tag extending_airflow:latest \n\nbrew install postgresql@16 # or: postgresql\nbrew install mysql\nmake start\n\nexport LC_ALL=\"en_US.UTF-8\"\nexport LC_CTYPE=\"en_US.UTF-8\"\ninitdb YOUR_ARBITRARY_PATH/postgresDB\npsql postgres\n    CREATE DATABASE db_airflow;\n    CREATE USER user_airflow WITH PASSWORD '1234';\n    ALTER USER user_airflow WITH SUPERUSER;\n    \\q\nmysql -u root\n    CREATE DATABASE db_airflow;\n    CREATE USER 'user_airflow'@'localhost' IDENTIFIED BY 'admin@123';\n    GRANT ALL PRIVILEGES ON *.* TO 'user_airflow'@'localhost' WITH GRANT OPTION;\n    FLUSH PRIVILEGES;\n    \\q\n```\n- Open **http://localhost:8080**. 
(Default account was created with: User: **airflow** / Password: **airflow**)\n- Open **DBeaver** to view databases.\n  \nStop all services with `make stop`.\\\nDelete all Airflow-related images: `docker images | grep \"airflow\" | awk '{print $3}' | xargs docker rmi -f \u0026\u0026 docker image prune -f \u0026\u0026 docker rmi -f postgres:13 redis`.\n\n# Build from scratch\n```bash\nmkdir Airflow-docker \u0026\u0026 cd Airflow-docker\n```\n- Fetch `docker-compose.yaml`:\n```bash\ncurl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.2/docker-compose.yaml'\n```\n- Initialize environment:\n```bash\nmkdir -p ./dags ./logs ./plugins ./config\necho -e \"AIRFLOW_UID=$(id -u) \\nAIRFLOW_GID=0\" \u003e .env\n```\n- Open **Docker Desktop**.\n- Initialize the database:\n```bash\ndocker compose up airflow-init\n```\n- Run Airflow:\n```bash\ndocker compose up -d\n```\nOpen **http://localhost:8080**.\n(Default account was created with: User: **airflow** / Password: **airflow**)\n\n## Update Library\n- Create a `Dockerfile`.\n- Edit `docker-compose.yaml` as follows:\n    ```\n    # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.6.2}\n    build: .\n    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'\n    ```\n- Build the extended image:\n```bash\ndocker build . 
--tag extending_airflow:latest\ndocker-compose up -d --no-deps --build airflow-webserver airflow-scheduler\n```\n- Modify the `volumes` key to include any additional paths you want to mount (at line 79):\\\n`- ${AIRFLOW_PROJ_DIR:-.}/data:/opt/airflow/data`\n- Run the full set of Docker services:\n```bash\ndocker compose down\ndocker compose up -d\n\ndocker ps\ndocker exec -it CONTAINER_ID bash\n```\n\n## Host Database server\n### PostgreSQL server\n```bash\nbrew install postgresql@16 # or: postgresql\nbrew services start postgresql@16 # (or postgresql)\n# to stop the service: brew services stop postgresql@16\nbrew services list\n\nexport LC_ALL=\"en_US.UTF-8\"\nexport LC_CTYPE=\"en_US.UTF-8\"\ninitdb YOUR_ARBITRARY_PATH/postgresDB\n\npsql postgres\n    \\l # list all db\n\n    # create db\n    CREATE DATABASE db_airflow;\n    \n    # delete db\n    SELECT pg_terminate_backend(pg_stat_activity.pid)\n    FROM pg_stat_activity\n    WHERE pg_stat_activity.datname = 'your_db';\n    DROP DATABASE your_db;\n\n    # list all users\n    \\du \n\n    # add new user\n    CREATE USER user_airflow WITH PASSWORD '1234';\n    ALTER USER user_airflow WITH SUPERUSER; # (optional)\n    \n    # delete user\n    SELECT pg_terminate_backend(pg_stat_activity.pid)\n    FROM pg_stat_activity\n    WHERE pg_stat_activity.usename = 'username_to_delete';\n    DROP USER username_to_delete;\n\n    # change password\n    ALTER USER your_username WITH PASSWORD 'new_password';\n\n    \\q # quit psql postgres\n```\n\n### MySQL server\n```bash\nbrew install mysql\nbrew services start mysql\n# to stop the service: brew services stop mysql\nbrew services list\n\nmysql -u root\nmysql -u root -p # if you’ve set a password\n    # list all db\n    SHOW DATABASES; \n    \n    # create db\n    CREATE DATABASE db_airflow;\n\n    USE db_airflow;\n    SHOW TABLES;\n    SELECT * FROM table_name;\n\n    # delete db\n    DROP DATABASE db_airflow;\n\n    # list all users\n    SELECT User, Host FROM mysql.user;\n    SHOW GRANTS FOR 'root'@'localhost'; # show user privileges\n \n    # add new 
user\n    CREATE USER 'user_airflow'@'localhost' IDENTIFIED BY 'admin@123'; # (use % for any host)\n    GRANT ALL PRIVILEGES ON *.* TO 'user_airflow'@'localhost' WITH GRANT OPTION;\n    FLUSH PRIVILEGES; # apply changes\n    SHOW GRANTS FOR 'user_airflow'@'localhost'; # use the same host as in CREATE USER\n\n    # delete user\n    DROP USER 'user_airflow'@'localhost';\n\n    # change password\n    ALTER USER 'username'@'host' IDENTIFIED BY 'new_password';\n    FLUSH PRIVILEGES;\n\n    \\q # quit mysql\n```\n### MongoDB server\nUse [MongoDB Compass](https://www.mongodb.com/products/tools/compass).\n```bash\nshow dbs\n\nuse db_airflow\ndb[\"Invoice_fromMySQL\"].find()\n\n# use admin\n# db.grantRolesToUser(\"user_airflow\", [{ role: \"readWrite\", db: \"db_airflow\" }])\n```\n\n## Data Migration from MSSQL to Gooogle Cloud Platform\n1. Create a Google Cloud Project\n- Go to [Google Cloud Console](https://console.cloud.google.com).\n- Click **Select a Project** \u003e **New Project**.\n- Provide a project name and click **Create**.\n2. Enable Required APIs\n- In the **Cloud Console**, navigate to **APIs \u0026 Services \u003e Library**.\n- Enable the following APIs: **BigQuery API** and **Cloud Storage API**.\n3. Create a Service Account (User Principal)\n- Navigate to **IAM \u0026 Admin \u003e Service Accounts**.\n- Click **+ Create Service Account**.\n- Provide a name (for example: *techdata-cloud @ zinc-union-443512-p7.iam.gserviceaccount.com*) for the service account and click **Create**.\n- Assign the role **BigQuery Admin** (for full BigQuery management access).\n- Click **Done**.\n4. Generate and Save Service Account Key (JSON)\n- Go back to **IAM \u0026 Admin \u003e Service Accounts**.\n- Select the service account you just created.\n- Click **Keys \u003e Add Key \u003e Create New Key**.\n- Select **JSON** format and download the file.\n- Save the downloaded JSON file as: `./config/edtech.json`.\n5. 
(Optional) Open a bash shell in the Airflow containers:\n    ```bash\n    docker exec -u root -it airflow-docker-airflow-worker-1 bash # similarly with: webserver-1, scheduler-1\n        apt-get install -y libgeos-dev\n        # some more libs\n    ```\n\n## Manipulate on GUI\n### Airflow UI\n- Click **CSV_to_Postgres_Pipeline** in Airflow, then navigate to **Graph** -\u003e click a **node** -\u003e **Log** to view the console output.\n- Re-run the DAG (click the **Play** button, \"Trigger DAG\") after editing scripts in `dags/`.\n\n### View database on DBeaver\nOpen **DBeaver** to browse the PostgreSQL database (as `user_airflow`) and the MySQL database (as `root` or `user_airflow`).\n\n- Fix for a **DBeaver issue**: viewing a MySQL database as a user other than `root` \\\n    You need to enable public key retrieval by changing the connection settings.\n    1. Open DBeaver and go to your MySQL connection.\n    2. Click on **Edit Connection**.\n    3. Go to the **Driver Properties** tab.\n    4. Find or add the property **allowPublicKeyRetrieval** and set its value to **TRUE**.\n\n\u003c!-- \n## More more\n- Reinstall requirements.txt:\n```bash\ndocker compose down --volumes\ndocker compose build\ndocker compose up -d\n\ndocker exec -it airflow-docker-airflow-webserver-1 bash\n    python -c \"import pymongo; print('pymongo is installed successfully')\"\n``` \n--\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbdbao%2Fairflow-docker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbdbao%2Fairflow-docker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbdbao%2Fairflow-docker/lists"}