{"id":29652046,"url":"https://github.com/aymanibrahim/ecommerce","last_synced_at":"2026-04-07T08:02:09.841Z","repository":{"id":304292851,"uuid":"1018352353","full_name":"aymanibrahim/ecommerce","owner":"aymanibrahim","description":"Data engineering project simulating an e-commerce analytics platform","archived":false,"fork":false,"pushed_at":"2025-07-12T05:41:37.000Z","size":1397,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-12T04:09:22.140Z","etag":null,"topics":["airflow","analytics","automation","dashboard","data-engineering","data-pipeline","data-warehouse","database","e-commerce","etl","mongodb","mysql","nosql","oltp","postgresql","spark","tableau"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aymanibrahim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-12T04:51:55.000Z","updated_at":"2025-07-24T03:03:56.000Z","dependencies_parsed_at":"2025-07-12T07:19:33.101Z","dependency_job_id":"3a832538-48ff-47d0-a4d6-9338a8a1ed38","html_url":"https://github.com/aymanibrahim/ecommerce","commit_stats":null,"previous_names":["aymanibrahim/ecommerce"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aymanibrahim/ecommerce","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aymanibrahim%2Fecommerce","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aymanibrahim%2Fecommerce/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aymanibrahim%2Fecommerce/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aymanibrahim%2Fecommerce/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aymanibrahim","download_url":"https://codeload.github.com/aymanibrahim/ecommerce/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aymanibrahim%2Fecommerce/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31504897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","analytics","automation","dashboard","data-engineering","data-pipeline","data-warehouse","database","e-commerce","etl","mongodb","mysql","nosql","oltp","postgresql","spark","tableau"],"created_at":"2025-07-22T06:01:12.852Z","updated_at":"2026-04-07T08:02:09.825Z","avatar_url":"https://github.com/aymanibrahim.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🛒 E-commerce Data Analytics Platform\n\n![E-commerce Platform](01_oltp/images/ecommerce.png)\n\nA data engineering project simulating an e-commerce 
analytics platform with end-to-end integration of OLTP, NoSQL, data warehousing, ETL pipelines, big data analytics, and BI dashboards.\n\n\n\n## 🚀 **Project Overview**\n\nThis project demonstrates the design and implementation of a modern data platform for an e-commerce company whose online presence is driven entirely by:\n\n* **Sales transactional data** stored in **MySQL**\n* **Product catalog data** stored in **MongoDB**\n\nTo enable analytics and business intelligence:\n\n* Data is periodically extracted from these databases into a **staging data warehouse**\n* **ETL pipelines** orchestrated by **Apache Airflow** extract, transform and load the data\n* **Apache Spark** is used for big data analytics and sales forecasting\n* **Tableau dashboards** provide business insights for BI teams\n\n\n## 💼 **Business Challenge**\n\nDesign and implement a robust data platform to integrate and analyze e-commerce data from multiple sources for operational reporting, business intelligence, and machine learning use cases.\n\n\n\n## 🎯 **Project Objectives**\n\n1. Design data repositories using **MySQL (OLTP)** and **MongoDB (NoSQL)** for transactional and catalog data\n2. Build a **PostgreSQL data warehouse**, create fact and dimension tables, and perform **cube and rollup operations**\n3. Develop **Tableau dashboards** to visualize key business metrics\n4. Create **ETL pipelines with Apache Airflow** to extract, transform, and load data into the warehouse\n5. Perform **big data analytics using Apache Spark**, deploying a machine learning model for sales forecasting\n\n\n\n## 🗂️ **Project Phases**\n\n### 1. [Setup OLTP Database (MySQL)](01_oltp/01_oltp.md)\n\n![Sample OLTP Data](01_oltp/images/sampledata.png)\n\n* Design and populate the OLTP schema for sales data\n* Automate periodic data exports\n\n\n\n### 2. 
[Setup NoSQL Database (MongoDB)](02_nosql/02_nosql.md)\n\n![MongoDB Documents](02_nosql/images/first_documents.png)\n\n* Load e-commerce catalog data\n* Query and manage product information in MongoDB\n\n\n\n### 3. [Build Data Warehouse (PostgreSQL)](03_dwh/03_dwh.md)\n\n![Data Warehouse ERD](03_dwh/images/erd.png)\n\n* Design and implement the data warehouse schema\n* Create fact and dimension tables for analytical queries\n\n\n\n### 4. [Create Business Intelligence Dashboard (Tableau)](04_analytics/04_analytics.md)\n\n![Summary Dashboard](04_analytics/images/04_2_dashboard/05_Summary.png)\n\n* Load data into the data warehouse\n* Build cubes and rollups\n* Design dashboards to analyze sales performance across time, categories, and geographies\n\n\n### 5. [Create ETL Pipelines (Apache Airflow)](05_etl/05_etl.md)\n\n![Airflow DAG Graph View](05_etl/images/05_2_pipelines_airflow/07_graph_view.png)\n\n* Extract the e-commerce web server log\n* Transform the data to exclude a specific IP address\n* Load the transformed data into a tar file\n* Automate incremental data loads using Airflow DAGs\n\n\n### 6. 
[Perform Big Data Analytics (Apache Spark)](06_spark/06_spark.md)\n\n![Top Search Terms](06_spark/images/05_top5terms.png)\n\n* Analyze e-commerce search terms using Spark\n* Deploy pretrained **sales forecasting models with SparkML**\n* Predict future sales trends for business planning\n\n\n## 🛠️ **Tools \u0026 Technologies**\n\n| Purpose               | Tool           |\n| --------------------- | -------------- |\n| OLTP database         | MySQL          |\n| NoSQL database        | MongoDB        |\n| Data warehouse        | PostgreSQL     |\n| Data pipelines        | Apache Airflow |\n| Big data analytics    | Apache Spark   |\n| Business intelligence | Tableau        |\n\n---\n\n## 📊 **Data**\n\nThe datasets used in this project are **synthetic** and were programmatically generated as part of the [IBM Data Engineering Capstone Project](https://www.coursera.org/learn/data-enginering-capstone-project) within the [IBM Data Engineering Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-engineer) on Coursera.\n\n\n## 📎 **Repository Structure**\n\n```\n.\n├── 01_oltp/        # MySQL OLTP setup\n├── 02_nosql/       # MongoDB NoSQL setup\n├── 03_dwh/         # PostgreSQL Data Warehouse\n├── 04_analytics/   # Tableau Dashboards\n├── 05_etl/         # Apache Airflow ETL pipelines\n├── 06_spark/       # Apache Spark big data analytics\n└── README.md       # Project README file\n```\n\n### ⭐ **If you find this project helpful, please star the repository to support its visibility.**","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faymanibrahim%2Fecommerce","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faymanibrahim%2Fecommerce","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faymanibrahim%2Fecommerce/lists"}