{"id":24021232,"url":"https://github.com/zipcodecore/dataengineering.labs.airflowproject","last_synced_at":"2026-03-05T09:01:39.436Z","repository":{"id":147837573,"uuid":"252765093","full_name":"ZipCodeCore/DataEngineering.Labs.AirflowProject","owner":"ZipCodeCore","description":null,"archived":false,"fork":false,"pushed_at":"2022-11-23T01:46:33.000Z","size":3,"stargazers_count":2,"open_issues_count":1,"forks_count":19,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-25T23:47:15.379Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZipCodeCore.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-03T15:04:27.000Z","updated_at":"2024-08-12T22:07:46.000Z","dependencies_parsed_at":"2023-06-11T10:15:38.482Z","dependency_job_id":null,"html_url":"https://github.com/ZipCodeCore/DataEngineering.Labs.AirflowProject","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ZipCodeCore/DataEngineering.Labs.AirflowProject","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FDataEngineering.Labs.AirflowProject","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FDataEngineering.Labs.AirflowProject/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FDataEngineering.Labs.AirflowProject/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipC
odeCore%2FDataEngineering.Labs.AirflowProject/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZipCodeCore","download_url":"https://codeload.github.com/ZipCodeCore/DataEngineering.Labs.AirflowProject/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FDataEngineering.Labs.AirflowProject/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30117478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T08:19:04.902Z","status":"ssl_error","status_checked_at":"2026-03-05T08:17:37.148Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-08T12:40:09.618Z","updated_at":"2026-03-05T09:01:39.289Z","avatar_url":"https://github.com/ZipCodeCore.png","language":null,"readme":"# Airflow Project\n\nIngest, clean, process and report.\n\nData sources that can be used:\n\n- Tennis tournament data\n- Spotify API on music\n- Housing data from a few sources (gov data mixed with MLS)\n- Personal history of Amazon purchases\n- Wearable data\n- Fantasy football\n- a dataset from your passion project domain\n\n## Point of the Project\n\nCreate an Airflow pipeline that ingests data from some source and applies some type of data wrangling of your choice:\n\n- ingest from a data 
source(s)\n- clean (add missing data, create a feature or two)\n- process and perform some transformations\n- report (charts, graphs, some kind of output which informs our understanding of the datasets)\n\nYou will need to\n\n- decide on a data source, anything from a CSV file to an online API\n- decide on how to transform the data in some way to clean it using Pandas\n- decide on a simple data visualization using your choice of data viz package\n- do it all in an Airflow DAG\n\n## Example\n\nMaybe you like this Kaggle DS: https://www.kaggle.com/deepcontractor/seasonal-variation-in-births\n\nAnd you'd like to see if births in Australia during the summer differ from births in France during the summer (or perhaps other countries)?\n\nYour Airflow DAG might load the CSV, filter it, format it, account for differences in populations (births per 100,000 people?)\nand generally clean any missing data. Take note of the years, perhaps consolidate by decade?\nAnd then chart something interesting about the data.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzipcodecore%2Fdataengineering.labs.airflowproject","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzipcodecore%2Fdataengineering.labs.airflowproject","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzipcodecore%2Fdataengineering.labs.airflowproject/lists"}