{"id":22502613,"url":"https://github.com/jleung51/foundations-dags","last_synced_at":"2025-10-04T13:51:04.266Z","repository":{"id":94897550,"uuid":"436916499","full_name":"jleung51/foundations-dags","owner":"jleung51","description":"Data ETL pipeline to clean, process, and aggregate data from Canadian housing starts.","archived":false,"fork":false,"pushed_at":"2023-01-25T07:14:47.000Z","size":111,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-08T16:02:31.157Z","etag":null,"topics":["data","data-engineering","etl","extract","housing","load","pipeline","transform"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jleung51.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-10T09:09:56.000Z","updated_at":"2023-01-25T07:14:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"f96cfdb0-1d20-4005-afc3-b6879e5f9f8d","html_url":"https://github.com/jleung51/foundations-dags","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jleung51/foundations-dags","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jleung51%2Ffoundations-dags","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jleung51%2Ffoundations-dags/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jleung51%2Ffoundations-dags/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/jleung51%2Ffoundations-dags/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jleung51","download_url":"https://codeload.github.com/jleung51/foundations-dags/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jleung51%2Ffoundations-dags/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278322147,"owners_count":25967873,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-engineering","etl","extract","housing","load","pipeline","transform"],"created_at":"2024-12-06T23:20:01.833Z","updated_at":"2025-10-04T13:51:04.238Z","avatar_url":"https://github.com/jleung51.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Foundations: Data Pipeline\n\nData ETL pipeline to clean, process, and aggregate data from Canadian housing starts.\n\nBuilt with [Apache Airflow](https://airflow.apache.org/), [dbt](https://www.getdbt.com/), and [Amazon Web Services EC2](https://aws.amazon.com/ec2/).\n\nLearn more about the project by reading the [design document](https://docs.google.com/document/d/1zan6-rcnNHz4wdBt0fvPxRnJLioCjlaFQUfi1_0EU04/edit).\n\n---\n\n## Table of Contents\n\n- [Foundations: Data Pipeline](#foundations-data-pipeline)\n  - 
[Table of Contents](#table-of-contents)\n  - [Setup Instructions](#setup-instructions)\n    - [Set up AWS EC2 (Host for the Database and Orchestrator)](#set-up-aws-ec2-host-for-the-database-and-orchestrator)\n      - [Provision an EC2 Instance](#provision-an-ec2-instance)\n      - [Connect to the EC2 Instance using AWS CloudShell](#connect-to-the-ec2-instance-using-aws-cloudshell)\n    - [Set up Docker (Containerizer)](#set-up-docker-containerizer)\n    - [Set up Airflow (Orchestrator)](#set-up-airflow-orchestrator)\n      - [Setup](#setup)\n      - [Expose Airflow Console with the EC2 Port](#expose-airflow-console-with-the-ec2-port)\n  - [Usage](#usage)\n    - [Start Airflow](#start-airflow)\n    - [Interact with Airflow (Local)](#interact-with-airflow-local)\n\n\u003csub\u003eTable of contents created with [VS Code Extension: Markdown All in One](https://marketplace.visualstudio.com/items?itemName=yzhang.markdown-all-in-one).\n\u003c/sub\u003e\n\n___\n\n\n## Setup Instructions\n\n### Set up AWS EC2 (Host for the Database and Orchestrator)\n\n#### Provision an EC2 Instance\n\n1. Navigate to [Amazon Web Services](https://aws.amazon.com/) and create an account.\n1. In the search bar at the top, search for and click **EC2**.\n1. Click the **Launch Instance** button and follow the instructions. For this example, we will be using the Amazon Linux AMI operating system.\n1. Download the `.pem` key file and keep it secure.\n1. Wait until the instance state is **Running**.\n\n#### Connect to the EC2 Instance using AWS CloudShell\n\n1. In the AWS EC2 service, in the left sidebar, select **Instances**.\n1. Select the newly created instance.\n1. At the top of the screen, click **Connect**.\n1. Choose any of the provided options to connect to the instance.\n\nAs an alternative to CloudShell, you can also use SSH from your local computer.\n\n\n### Set up Docker (Containerizer)\n\nFollow the instructions at:\n1. 
[Installing Docker to use with the AWS SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-docker.html)\n2. [Install Docker Engine](https://docs.docker.com/engine/install/)\n3. [Install Docker Compose](https://docs.docker.com/compose/install/linux/#install-using-the-repository)\n\nEnsure your user has the permissions to execute Docker:\n```shell\nsudo groupadd docker\nsudo usermod -aG docker $USER\n```\n\nLog out and back in to get permissions.\n\n\n\n### Set up Airflow (Orchestrator)\n\nThis section is based on the [guide for running Airflow using Docker Compose](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html).\n\n#### Setup\n\nIf you are on Linux, update the Airflow UID in the `.env` file with the host user ID:\n```shell\necho -e \"AIRFLOW_UID=$(id -u)\" \u003e .env\n```\n\nRun database migrations and initialize the first user account:\n```shell\ndocker compose up airflow-init\n```\n\n#### Expose Airflow Console with the EC2 Port\n\nFollow the commands under [these instructions](https://aws.amazon.com/premiumsupport/knowledge-center/connect-http-https-ec2/) to add security group rules which permit HTTP access to port 80.\n\nOnce complete, the security rules should look like this:\n\n![Security group rules](readme-img/aws-security-rules.png)\n\n---\n\n## Usage\n\n### Start Airflow\n\nStart all services:\n```shell\nsudo docker compose up\n```\n\n`sudo` is required to run the Airflow console on port 80.  
\nIf you want to avoid `sudo` or prefer another port:\n- Open `docker-compose.yaml`\n- Find the configuration for `airflow-webserver`\n- Change the port number under `ports`\n\n\nAirflow is now running on your machine.\n\nYou can stop all services with:\n```shell\ndocker compose down\n```\n\n### Interact with Airflow (Local)\n\nIf you set up Airflow and Docker locally, you can log into Airflow at http://localhost:80; otherwise, use the port on which you exposed the Airflow console.\n\nThe default username and password are both `airflow`; change the password immediately after logging in.\n\nYou can view information on the current environment:\n```shell\ndocker compose run airflow-worker airflow info\n# OR\n./bin/airflow info\n```\n\nEnter the running Docker container to execute commands:\n```shell\n./bin/airflow bash\n```\n\nStop and delete all containers, volumes, and images:\n```shell\ndocker compose down --volumes --rmi all\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjleung51%2Ffoundations-dags","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjleung51%2Ffoundations-dags","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjleung51%2Ffoundations-dags/lists"}