{"id":27261467,"url":"https://github.com/stevehoober254/dataengineer-portfolio","last_synced_at":"2026-04-18T14:05:09.082Z","repository":{"id":287213420,"uuid":"963971890","full_name":"stevehoober254/dataengineer-portfolio","owner":"stevehoober254","description":"📊 End-to-end ETL pipelines, Airflow DAGs, notebook-driven analytics \u0026 data warehousing","archived":false,"fork":false,"pushed_at":"2025-04-10T19:55:02.000Z","size":7,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-09T02:32:44.882Z","etag":null,"topics":["airflow","analytics","big-data","dagster","data-engineering","data-lake","data-pipelines","etl","python","spark"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stevehoober254.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["stevehoober254"]}},"created_at":"2025-04-10T13:48:31.000Z","updated_at":"2025-04-10T19:55:05.000Z","dependencies_parsed_at":"2025-04-10T15:51:54.042Z","dependency_job_id":"5b746270-49a7-479c-a50a-f47580bc327a","html_url":"https://github.com/stevehoober254/dataengineer-portfolio","commit_stats":null,"previous_names":["stevehoober254/dataengineer-portfolio"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/stevehoober254/dataengineer-portfolio","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
stevehoober254%2Fdataengineer-portfolio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevehoober254%2Fdataengineer-portfolio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevehoober254%2Fdataengineer-portfolio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevehoober254%2Fdataengineer-portfolio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stevehoober254","download_url":"https://codeload.github.com/stevehoober254/dataengineer-portfolio/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevehoober254%2Fdataengineer-portfolio/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31971500,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","analytics","big-data","dagster","data-engineering","data-lake","data-pipelines","etl","python","spark"],"created_at":"2025-04-11T05:32:36.446Z","updated_at":"2026-04-18T14:05:04.073Z","avatar_url":"https://github.com/stevehoober254.png","language":null,"funding_links":["https://github.com/sponsors/stevehoober254"],"categories":[],"sub_categories":[],"readme":"# 📊 Data Engineer Portfolio\n\nA practical portfolio of 
data engineering pipelines, orchestrated DAGs, and analytics notebooks. These projects demonstrate end-to-end ETL processes, real-time ingestion, data lake design, and Python-based transformations.\n\n## 📌 Highlights\n- Apache Airflow DAG orchestration\n- Batch and streaming ETL pipelines\n- Python \u0026 pandas-based data wrangling\n- Data validation and unit testing\n- Jupyter notebooks with visualizations\n\n## Project List\n\n## 1. Smart Grid IoT Data Pipeline\n\n### Problem\nPower companies in emerging markets struggle to monitor grid performance in real time.\n\n### Solution\nBuild an end-to-end pipeline that:\n- Ingests data from simulated smart meters via **Amazon Kinesis**\n- Transforms it with **AWS Glue** and **Apache Hudi**\n- Loads it into **Amazon Redshift**\n- Visualizes it in **Amazon QuickSight**\n\n### Goals\n- Stream energy usage in real time\n- Aggregate usage by time, region, and household\n- Detect anomalies and outages\n\n---\n\n## 2. Kenya Open Data Explorer\n\n### Problem\nGovernment data is publicly available but hard for citizens and journalists to analyze.\n\n### Solution\nCreate a public analytics dashboard:\n- ETL pipelines in **Apache Airflow**\n- Cleaned datasets in **BigQuery**\n- Visualizations in **Metabase**\n- A public search-and-filter frontend built with **Next.js**\n\n### Goals\n- Process and publish datasets updated monthly\n- Create visual data stories on health, education, and the environment\n- Enable CSV downloads and API access\n\n---\n\n## 3. 
Political Sentiment Analysis Pipeline\n\n### Problem\nElection stakeholders need real-time sentiment insights from social media.\n\n### Solution\nStream and analyze political tweets and comments:\n- **Apache Kafka** or **Kinesis Firehose** for ingestion\n- **Spark Structured Streaming** for processing\n- **Amazon S3** for storage and **PrestoDB** for querying\n- A dashboard built with **Apache Superset**\n\n### Goals\n- Classify sentiment as positive, neutral, or negative\n- Track sentiment by politician, region, or hashtag\n- Surface trending concerns and instances of hate speech\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevehoober254%2Fdataengineer-portfolio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevehoober254%2Fdataengineer-portfolio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevehoober254%2Fdataengineer-portfolio/lists"}