{"id":27105123,"url":"https://github.com/ahmedfarag9/airbyte-airflow-api-intergration","last_synced_at":"2026-04-14T05:33:48.584Z","repository":{"id":169527943,"uuid":"604377591","full_name":"ahmedfarag9/airbyte-airflow-api-intergration","owner":"ahmedfarag9","description":"Api integration using Airbyte/Airflow Mega Docker Cluster - Airbyte handles the Api connection \u0026 Airflow does the orchestration.","archived":false,"fork":false,"pushed_at":"2023-07-12T18:31:06.000Z","size":829,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-13T20:39:36.559Z","etag":null,"topics":["airbyte","airflow","api","aws-ec2","celery","dags","docker","docker-compose","linux","makefile","python","redis","shell-script","ssh"],"latest_commit_sha":null,"homepage":"","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ahmedfarag9.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-20T23:45:50.000Z","updated_at":"2024-12-29T20:20:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"de73bff6-c8e9-458f-8a42-ebe2bb51c066","html_url":"https://github.com/ahmedfarag9/airbyte-airflow-api-intergration","commit_stats":null,"previous_names":["ahmedfarag9/airbyte-airflow-api-intergration"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ahmedfarag9/airbyte-airflow-api-intergration","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfarag9%2Fairbyte-airflow-api-intergration","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/ahmedfarag9%2Fairbyte-airflow-api-intergration/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfarag9%2Fairbyte-airflow-api-intergration/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfarag9%2Fairbyte-airflow-api-intergration/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ahmedfarag9","download_url":"https://codeload.github.com/ahmedfarag9/airbyte-airflow-api-intergration/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfarag9%2Fairbyte-airflow-api-intergration/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31784253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T02:24:21.117Z","status":"ssl_error","status_checked_at":"2026-04-14T02:24:20.627Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airbyte","airflow","api","aws-ec2","celery","dags","docker","docker-compose","linux","makefile","python","redis","shell-script","ssh"],"created_at":"2025-04-06T18:36:41.909Z","updated_at":"2026-04-14T05:33:48.579Z","avatar_url":"https://github.com/ahmedfarag9.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# airbyte-airflow-simple-scraper \u003c!-- omit in toc --\u003e\n\n## 
Overview\n\nThis is a guide to set up an Airbyte API connection that is orchestrated by Airflow.\n\n- Airflow orchestration DAG example\n\n  ![Dag Example](./images/dag_example.png)\n\n## Setup\n\n### Step 1:\n\n- Create an AWS EC2 instance as follows\n\n  - **Name : airbyte**\n  - **AWS Machine Image : Amazon Linux 2 Kernel 5.10 / 64 bit (x86)**\n  - **Instance type : t2.large 8 GB memory**\n  - **Key pair (login) : Create a key pair to SSH login from your host**\n  - **Network settings : Create security group \u0026 ALLOW SSH traffic from your host IP address**\n  - **Configure storage : 15 GB gp2**\n\n### Step 2:\n\n- Edit security group inbound rules and expose the following ports\n\n  - 8000 → for Airbyte webapp\n  - 8080 → for Airflow webapp\n  - 5555 → for flower-monitoring\n\n### Step 3:\n\n- Log in to your AWS instance over SSH and run the following commands to install Docker and clone the GitHub repo\n\n  ```bash\n  sudo yum -y update \u0026\u0026 \\\n  sudo yum install -y git \u0026\u0026 \\\n  sudo yum install -y docker \u0026\u0026 \\\n  sudo usermod -a -G docker $USER \u0026\u0026 \\\n  git clone https://github.com/ahmedfarag9/airbyte-airflow-simple-scraper.git \u0026\u0026 \\\n  cd airbyte-airflow-simple-scraper \u0026\u0026 \\\n  newgrp docker\n  ```\n\n- Prepare the environment\n\n  ```bash\n  bash install.sh\n  ```\n\n- Build the Airflow image\n\n  ```bash\n  make build-airflow-image\n  ```\n\n- Start the Airflow/Airbyte stack\n\n  ```bash\n  make start-airflow-airbyte-stack\n  ```\n\n- Now your setup is ready!\n\n- The current terminal session will be used to stream the stack logs\n\n### Step 4:\n\n- Access Airbyte at http://localhost:8000 and set up a connection\n\n  Set up a Source\n\n  ![Airbyte Source](./images/airbyte_source.png)\n\n  Set up a Destination\n\n  ![Airbyte Destination](./images/airbyte_destination.png)\n\n  Set up a Connection\n\n  ![Airbyte Connection](./images/airbyte_connection.png)\n\n  Go to the connections page and choose your 
connection,\n  then copy the connection ID from the page URL\n\n  ![Airbyte Connection Id](./images/airbyte_connection_id.png)\n\n### Step 5:\n\n- Open a new terminal session, log in to your AWS instance over SSH, and run the following commands to set the API connection ID\n\n  ```bash\n  cd airbyte-airflow-simple-scraper/ \u0026\u0026 \\\n  bash auto_connection.sh\n  ```\n\n- Then enter the copied Airbyte connection ID when prompted\n\n### Step 6:\n\n- Access Airflow at http://localhost:8080 \u0026 enter the credentials\n\n  - USERNAME=airflow\n\n  - PASSWORD=airflow\n\n  ![Airflow Webapp](./images/airflow_webapp.png)\n\n- Activate the DAG to trigger Airbyte to fetch the API data\n\n  ![Airflow Dag Trigger](./images/airflow_dag_trigger.png)\n\n- You will then see that the task executed successfully in the Airflow task diagram\n\n  ![Airflow Task Diagram](./images/airflow_task_diagram.png)\n\n- And that the data sync was triggered successfully in the Airbyte UI\n\n  ![Airbyte Data Sync](./images/airbyte_data_sync.png)\n\n- Finally, you can access Flower to monitor the Celery workers at http://localhost:5555\n\n  ![Airflow Flower Monitoring](./images/airflow_flower_monitoring.png)\n\n---\n\n## Commands\n\n- Stop then start Airflow/Airbyte stack again\n\n  - Press Ctrl+C to stop the stack, then execute the following command to start it again\n\n    ```bash\n    make start-airflow-airbyte-stack\n    ```\n\n- Uninstall Airflow/Airbyte stack\n\n  ```bash\n  make uninstall-airflow-airbyte-stack\n  ```\n\n- Remove containers and restart stack\n\n  ```bash\n  make restart-airflow-airbyte-stack\n  ```\n\n- Purge then clean install everything\n\n  ```bash\n  make purge-then-clean-install\n  ```\n\n- Refer to the Makefile for more info\n\n---\n\n## Architecture\n\n- Airbyte --\u003e data integration engine\n\n  - UI: An easy-to-use graphical interface for interacting with the Airbyte API. (runs on port 8000)\n\n  - Server: Handles connection between UI and API. 
(runs on port 8001)\n\n  - Scheduler: The scheduler takes work requests from the API and sends them to the Temporal service to parallelize.\n\n  - Worker: The worker connects to a source connector, pulls the data and writes it to a destination.\n\n- Airflow --\u003e workflow management platform\n\n  - Airflow consists of several components:\n\n  - Postgres Database for Metadata --\u003e Contains information about the status of tasks, DAGs, Variables, connections, etc.\n\n  - Scheduler --\u003e Reads from the Metadata database and is responsible for adding the necessary tasks to the queue\n\n  - Executor --\u003e Works closely with the Scheduler to determine what resources are needed to complete the tasks as they are queued\n\n  - Web server --\u003e HTTP Server provides access to DAG/task status information\n\n- Postgres --\u003e Metadata Database\n\n- Celery Executor --\u003e The Remote Executor to scale out the number of workers.\n\n- Redis --\u003e Used as a message broker by delivering messages to the celery workers.\n\n- Flower --\u003e Celery Monitoring Tool (runs on port 5555)\n\n\u003c!---\n\n## Resources\n\nThis repo is based on the following resources so feel free to check them for documentation and more details\n\nhttps://github.com/nialloriordan/airbyte-airflow-scraper\n\nhttps://github.com/airbytehq/airbyte/tree/master/resources/examples/airflow\n\nhttps://docs.airbyte.com/operator-guides/using-the-airflow-airbyte-operator/\n\nhttps://airbyte.com/tutorials/data-scraping-with-airflow-and-beautiful-soup\n\n--\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmedfarag9%2Fairbyte-airflow-api-intergration","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahmedfarag9%2Fairbyte-airflow-api-intergration","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmedfarag9%2Fairbyte-airflow-api-intergration/lists"}