{"id":18144795,"url":"https://github.com/alexuscr-27/amazon-data-etl","last_synced_at":"2026-02-13T06:07:17.309Z","repository":{"id":259301911,"uuid":"863883532","full_name":"ALEXUSCR-27/Amazon-Data-ETL","owner":"ALEXUSCR-27","description":"ETL pipeline for Amazon product sales data, using Apache Airflow for data orchestration and Supabase for storage, by containerizing the environment with Docker, the setup is scalable and easily deployable, supporting data-driven decision-making.","archived":false,"fork":false,"pushed_at":"2024-11-03T19:55:05.000Z","size":4713,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-03T17:52:46.379Z","etag":null,"topics":["airflow","airflow-docker","data-engineering","data-visualization","docker","postgresql","powerbi","python3","supabase"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ALEXUSCR-27.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-27T05:12:03.000Z","updated_at":"2024-11-03T19:55:08.000Z","dependencies_parsed_at":"2024-10-24T07:37:58.040Z","dependency_job_id":"ff0ed0d5-3abd-4f8d-a58d-a21d366ce9e7","html_url":"https://github.com/ALEXUSCR-27/Amazon-Data-ETL","commit_stats":null,"previous_names":["alexuscr-27/amazon-data-analysis","alexuscr-27/amazon-data-etl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ALEXUSCR-27/Amazon-Data-ETL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/ALEXUSCR-27%2FAmazon-Data-ETL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ALEXUSCR-27%2FAmazon-Data-ETL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ALEXUSCR-27%2FAmazon-Data-ETL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ALEXUSCR-27%2FAmazon-Data-ETL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ALEXUSCR-27","download_url":"https://codeload.github.com/ALEXUSCR-27/Amazon-Data-ETL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ALEXUSCR-27%2FAmazon-Data-ETL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281371746,"owners_count":26489526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-27T02:00:05.855Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airflow-docker","data-engineering","data-visualization","docker","postgresql","powerbi","python3","supabase"],"created_at":"2024-11-01T20:06:11.913Z","updated_at":"2025-10-28T01:32:42.414Z","avatar_url":"https://github.com/ALEXUSCR-27.png","language":"Python","readme":"\u003ca id=\"readme-top\"\u003e\u003c/a\u003e\n# Amazon Products Data ETL\n\n\u003cdiv align=\"center\"\u003e\n  \n  ![Static Badge](https://img.shields.io/badge/v3.12.3-blue?label=Python)\n  ![Static 
Badge](https://img.shields.io/badge/v2.10.2-red?label=Apache-Airflow)\n  ![Static Badge](https://img.shields.io/badge/v2.2.3-purple?label=Pandas)\n  ![Static Badge](https://img.shields.io/badge/v27.2.0-darkblue?label=Docker)\n  ![Static Badge](https://img.shields.io/badge/Google%20Looker%20Studio-white)\n\n\n\u003c/div\u003e\n\n## About the project\nThis project builds an ETL (Extract, Transform, Load) pipeline that processes and analyzes Amazon product data, using a CSV file as the source. Apache Airflow orchestrates and automates the pipeline operations, and Supabase handles database management. The processed data is organized for Business Intelligence (BI) and analysis: Google Looker Studio generates visualizations and charts from attributes such as `Product name`, `Categories`, `Subcategories`, `Prices`, `Ratings` and `Reviews`.\n\n## Features\n- Dataset: The dataset is available on Kaggle [here](https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset) and is also included in the `data/raw` folder of the repository.\n- ETL Pipeline Automation: Orchestrates the data extraction, transformation and loading processes with Apache Airflow, containerized with Docker to simplify deployment and ensure consistency across environments.\n- Data Cleansing and Transformation: Converts the raw CSV data into a structured format, cleaning and preparing attributes such as `Prices`, `Discount percentage`, `Ratings`, `Ratings count` and `Category`, and creating new columns such as `Sub-categories`.\n- Database Management: Stores and retrieves the transformed data with Supabase, which uses PostgreSQL as its database, and structures it for Business Intelligence (BI) queries.\n- Business Intelligence Visualizations: Generates dynamic charts and reports in Google Looker Studio, including visualizations of popular product categories, rating trends, pricing distributions, and discount analytics.\n\n  \u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"images/Amazon_products_report-3.png\" alt=\"amazon_products_report3\" width=\"auto\" height=\"auto\"/\u003e\n  \u003c/div\u003e\n\n## Getting started\n\n- Clone the repository\n  \n  ```sh\n  git clone https://github.com/ALEXUSCR-27/Amazon-Data-ETL.git\n  cd Amazon-Data-ETL\n  ```\n- Configure your Airflow credentials as environment variables of the `airflow-init` service in the `docker-compose` file\n  ```yaml\n  _AIRFLOW_WWW_USER_USERNAME: your_username\n  _AIRFLOW_WWW_USER_PASSWORD: your_password\n  ```\n- Build the Docker image and start the containers\n\n  ```sh\n  docker-compose up --build\n  ```\n- Access Airflow\n  \n    Open a browser, go to `http://localhost:8080` and sign in with your credentials.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexuscr-27%2Famazon-data-etl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexuscr-27%2Famazon-data-etl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexuscr-27%2Famazon-data-etl/lists"}