{"id":29128188,"url":"https://github.com/willie-conway/ibm-data-engineering-capstone-project","last_synced_at":"2026-04-11T06:02:39.856Z","repository":{"id":300656450,"uuid":"1002121519","full_name":"Willie-Conway/IBM-Data-Engineering-Capstone-Project","owner":"Willie-Conway","description":"End-to-end Data Engineering Capstone Project using MySQL, 🍃MongoDB, 🐘PostgreSQL, 💨Apache Airflow, ⚡️Apache Spark, and BI dashboards 📊🚀","archived":false,"fork":false,"pushed_at":"2025-06-26T06:17:39.000Z","size":26583,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-30T01:42:40.854Z","etag":null,"topics":["airflow","capstone-project","dashboards","data-engineering","db2-warehouse","etl","ibm","ibm-cognos-analytics","mongodb","mysql","piplines","postgresql","spark","sql","sqlite3"],"latest_commit_sha":null,"homepage":"https://developers.google.com/profile/u/109845255803256255656","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Willie-Conway.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-14T18:59:49.000Z","updated_at":"2025-06-26T06:17:42.000Z","dependencies_parsed_at":"2025-06-23T01:33:36.549Z","dependency_job_id":"a8cf0b7e-7356-4bbb-88ae-70eb2609a6d1","html_url":"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project","commit_stats":null,"previous_names":["willie-conway/ibm-data-engineering-capstone","willie-conway/ibm-data-engineering-capstone-project"],"tags_count":0,"template":false,"template_f
ull_name":null,"purl":"pkg:github/Willie-Conway/IBM-Data-Engineering-Capstone-Project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FIBM-Data-Engineering-Capstone-Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FIBM-Data-Engineering-Capstone-Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FIBM-Data-Engineering-Capstone-Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FIBM-Data-Engineering-Capstone-Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Willie-Conway","download_url":"https://codeload.github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FIBM-Data-Engineering-Capstone-Project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31670383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-10T17:19:37.612Z","status":"online","status_checked_at":"2026-04-11T02:00:05.776Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","capstone-project","dashboards","data-engineering","db2-warehouse","etl","ibm","ibm-cognos-analytics","mongodb","mysql","piplines","postgresql","spark","sql","sqlite3"],"creat
ed_at":"2025-06-30T01:37:44.285Z","updated_at":"2026-04-11T06:02:39.836Z","avatar_url":"https://github.com/Willie-Conway.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# 🏗️ IBM Data Engineering Capstone Project\n\n\u003cp float=\"left\"\u003e\n    \u003cimg src=\"https://images.credly.com/size/340x340/images/9ba24fc4-6d91-4840-91f9-5f03b3e32ac1/image.png\" width=\"300\" /\u003e\n    \u003cimg src=\"https://i.postimg.cc/4NLdmdMs/IBM-Cognos-Analytics.jpg\" width=\"300\" /\u003e\n\u003c/p\u003e\n\nThis capstone project showcases the practical application of key data engineering skills by simulating a real-world scenario in which I served as a Junior Data Engineer. I designed and implemented a scalable data analytics platform by working across various technologies in the data engineering lifecycle.\n\n\n---\n\n## 🚀 Project Overview\n\nThis capstone project simulates the role of a **Junior Data Engineer** tasked with designing and implementing an end-to-end **data analytics platform** using multiple data engineering tools and technologies.  
\nIt’s the final course in the [IBM Data Engineering Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-engineer), combining all prior learning into one practical project.\n\n\u003cp float=\"left\"\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Simple_Dashboard-1.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Simple_Dashboard-2.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Simple_Dashboard-3.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Simple_Dashboard-4.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Simple_Dashboard-5.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Simple_Dashboard-6.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Simple_Dashboard-7.jpg\" width=\"300\" /\u003e\n\u003c/p\u003e\n\n\u003cp float=\"left\"\u003e\n    
\u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Loyalty_%26_Sales_Performance_Dashboard.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Community_Property_Revenue_%26_Loyalty_Sales_Dashboard-1.jpg\" width=\"300\" /\u003e\n    \u003cimg src=\"https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/8d4a22f5ea8c63e9393b3d86f024c4f72dfb03e2/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/Community_Property_Revenue_%26_Loyalty_Sales_Dashboard-2.jpg\" width=\"300\" /\u003e\n\u003c/p\u003e\n\n---\n\n## 🧠 What I Learned\n\n✅ Design and build data platforms using OLTP \u0026 OLAP architectures  \n✅ Implement data pipelines with ETL processes using Python and Apache Airflow  \n✅ Query structured and unstructured data using MySQL, PostgreSQL, and MongoDB  \n✅ Perform big data analytics and ML predictions using Apache Spark  \n✅ Visualize insights via dashboards in Google Looker Studio and IBM Cognos Analytics\n\n---\n\n## 🧰 Skills \u0026 Tools Used\n\n- 🐍 Python \u0026 SQL\n- 🐘 PostgreSQL | 🐬 MySQL | 🍃 MongoDB\n- 🛠️ Apache Airflow\n- 🔍 Apache Spark (MLlib)\n- 📊 IBM Cognos Analytics | Google Looker Studio\n- 🗃️ OLTP \u0026 Data Warehousing\n- 🧱 ETL \u0026 Data Pipelines\n- 🐧 Linux Shell Scripting\n- 📂 JSON, CSV, .tar.gz, and data transformations\n\n---\n\n## 📦 Modules Breakdown\n\n| Module | Description |\n|--------|-------------|\n| 📁 **1. Data Platform Architecture \u0026 OLTP** | Designed OLTP schemas \u0026 created MySQL databases |\n| 🍃 **2. NoSQL with MongoDB** | Queried JSON documents and used MongoDB indexes |\n| 🗄️ **3. Data Warehouse** | Built dimensional models \u0026 populated warehouse tables |\n| 📈 **4. 
Data Analytics \u0026 Reporting** | Wrote complex SQL queries with `ROLLUP`, `CUBE`, and aggregations |\n| 🔁 **5. ETL \u0026 Pipelines** | Built ETL flows with Python scripts and Apache Airflow DAGs |\n| ⚡ **6. Big Data Analytics with Spark** | Trained and deployed ML models using Spark MLlib |\n| ✅ **7. Final Submission** | Delivered final reports, dashboards, and peer-reviewed projects |\n\n---\n\n## 📊 Dashboard Samples\n\n| Tool | Preview |\n|------|---------|\n| Google Looker Studio | ![Looker Dashboard](https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone-Project/blob/a62d4dabe48342884ce4b6d77f8a95c8326ae09a/Data%20Engineering%20Capstone%20Project/CheatSheet/Images/E-Commerce_Sales_Dashboard_(2020).jpg) |\n| IBM Cognos Analytics | ![Cognos Dashboard](https://github.com/Willie-Conway/IBM-Data-Engineering-Capstone/blob/c8d782d38e24a2ba26c01faf93f743b635cf07a7/Data%20Engineering%20Capstone%20Project/Labs/Dashboard%20Creation%20using%20IBM%20Cognos%20Analytics/Screenshots/E-commerce%20Sales%20Dashboard.jpg) |\n\n---\n\n## 📂 Project Assets\n\n```\n\n📁 OLTP Database Design\n📁 NoSQL Queries \u0026 Exports\n📁 Data Warehouse Scripts \u0026 CSVs\n📁 Airflow DAGs \u0026 Python Scripts\n📁 SparkML Model \u0026 Predictions\n📁 Dashboards (Google Looker, Cognos)\n\n```\n\n## 📌 Key Skills Demonstrated\n\n- 🗃️ Relational \u0026 NoSQL Database Design (MySQL, MongoDB)\n- 🏗️ Data Warehouse Modeling and Querying (PostgreSQL, IBM Db2)\n- 🔄 ETL Pipeline Development (Python, Shell, Apache Airflow)\n- 🔥 Big Data Analytics with Apache Spark\n- 📊 Data Visualization (Google Looker Studio, IBM Cognos Analytics)\n- 🐧 Linux Shell Scripting\n- 🧪 SQL queries using `ROLLUP`, `CUBE`, `GROUPING SETS`, and Materialized Query Tables (MQTs)\n\n---\n\n## 🧪 Capstone Modules \u0026 Labs Overview\n\n### 📁 Module 1: Data Platform Architecture \u0026 OLTP\n- Designed an OLTP schema and created MySQL tables.\n- Imported and exported data using SQL and shell scripts.\n- Defined primary keys and 
indexes for optimized access.\n\n### 🍃 Module 2: Querying Data in NoSQL (MongoDB)\n- Loaded product catalog data into MongoDB.\n- Performed filter queries and aggregation pipelines.\n- Exported collections using `mongoexport`.\n\n### 🏗️ Module 3: Building a Data Warehouse\n- Created a star schema with dimension and fact tables in PostgreSQL.\n- Imported e-commerce sales data.\n- Performed OLAP queries with `CUBE`, `ROLLUP`, and `GROUPING SETS`.\n\n### 📈 Module 4: Data Analytics\n- Wrote analytical SQL queries to uncover trends in sales data.\n- Used Materialized Query Tables to improve performance.\n\n### 🔁 Module 5: ETL \u0026 Data Pipelines\n- Wrote Python scripts for extract, transform, and load processes.\n- Automated the pipeline using Apache Airflow DAGs.\n- Processed and cleaned web logs into a structured format.\n\n### ⚡ Module 6: Big Data Analytics with Apache Spark\n- Used Spark to load and transform product review data.\n- Built a machine learning model using Spark MLlib.\n- Saved and reloaded the trained model for prediction tasks.\n\n### 📊 Module 7: Dashboards \u0026 Final Submission\n- Built sales dashboards using:\n  - **Google Looker Studio**: Interactive charts, filters, KPIs.\n  - **IBM Cognos Analytics**: Custom visualizations and report generation.\n- Submitted final project artifacts for peer review.\n\n---\n\n## 🧠 Summary\n\nThis project helped solidify my knowledge of:\n- Building data infrastructure from the ground up\n- Managing both structured and semi-structured data\n- Automating and scaling data workflows\n- Communicating data insights through visual tools\n\n---\n\n\n## 🏁 Outcome\n\n✅ **Proficiency in end-to-end data engineering workflows**  \n✅ **Prepared for real-world junior-level data engineering roles**\n\n---\n\n## 🧠 Reflections\n\nThis project was the culmination of weeks of learning and hands-on practice. I strengthened my data engineering foundations and became confident in building real-world data solutions end-to-end. 
🧩💡\n\n---\n\n## 💼 Ideal For\n\n- Hiring managers evaluating full-stack data engineers  \n- Recruiters seeking professionals skilled in data architecture, pipelines, and analytics  \n- Anyone interested in practical data engineering workflows\n\n---\n\n## 🔗 Looker Dashboards\n\n- [Loyalty \u0026 Sales Performance Dashboard](https://lookerstudio.google.com/s/igUfnRY4S6M)\n- [E-Commerce_Sales_Dashboard_(2020)](https://lookerstudio.google.com/s/nBv_zBtawc4)\n- [Simple_Dashboard](https://lookerstudio.google.com/s/lz8wjFuj5Nw)\n- [Community_Property_Revenue_\u0026_Loyalty_Sales_Dashboard](https://lookerstudio.google.com/s/ielt85KR3Sw)\n- [Sales \u0026 Service Dashboard](https://lookerstudio.google.com/s/mMf7a_kwdAo)\n\n---\n\n## 🏁 Let's Connect!\n\nIf you're interested in my other data projects or collaborations:  \n🌐 [My Portfolio](#) | 💼 [LinkedIn](#) | 📂 [GitHub Projects](#)\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillie-conway%2Fibm-data-engineering-capstone-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillie-conway%2Fibm-data-engineering-capstone-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillie-conway%2Fibm-data-engineering-capstone-project/lists"}