{"id":15208139,"url":"https://github.com/mfurmanczyk/wh-sales","last_synced_at":"2026-01-24T12:48:02.076Z","repository":{"id":252191873,"uuid":"832811255","full_name":"MFurmanczyk/wh-sales","owner":"MFurmanczyk","description":"E-commerce analytics data warehouse ETL made with Apache Spark.","archived":false,"fork":false,"pushed_at":"2024-08-08T06:56:34.000Z","size":228,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-02T01:11:22.305Z","etag":null,"topics":["airflow","data","data-engineering","data-warehouse","kotlin","python","spark"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MFurmanczyk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-23T19:13:34.000Z","updated_at":"2024-08-19T14:19:25.000Z","dependencies_parsed_at":"2024-08-08T08:34:30.946Z","dependency_job_id":"711a145f-e78c-4b15-b415-611b847cff1f","html_url":"https://github.com/MFurmanczyk/wh-sales","commit_stats":null,"previous_names":["mfurmanczyk/wh-sales"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MFurmanczyk%2Fwh-sales","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MFurmanczyk%2Fwh-sales/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MFurmanczyk%2Fwh-sales/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MFurmanczyk%2Fwh-sales/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MFurmanczyk","download_url":"https://codeload.github.com/MFurmanczyk/wh-sales/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238817431,"owners_count":19535526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","data","data-engineering","data-warehouse","kotlin","python","spark"],"created_at":"2024-09-28T07:01:02.756Z","updated_at":"2025-10-29T11:31:27.236Z","avatar_url":"https://github.com/MFurmanczyk.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# E-Commerce Data Warehouse ETL\n\nThis project implements an ETL (Extract, Transform, Load) process to migrate and transform data from an OLTP (Online Transaction Processing) system to a star schema in a data warehouse. 
The ETL process is written in Kotlin with Apache Spark and is orchestrated using Apache Airflow.\n\n## Features\n\n- **ETL Processing of MySQL tables**:\n    - **T_CATEGORY**: product categories.\n    - **T_CUSTOMER**: customer data and their addresses.\n    - **T_ORDER**: information about orders.\n    - **T_ORDER_REL**: information about products in orders.\n    - **T_PRODUCT**: product information.\n    - **T_PROMO and T_PROMO_REL**: information about promotions and the products affected by them.\n\n- **Data Transformation**:\n    - Transforms data from OLTP format to a star schema suitable for analytical queries.\n\n- **Orchestration**:\n    - Utilizes Apache Airflow for scheduling and managing the ETL workflows.\n\n## Technologies Used\n\n- **Programming Language**: Kotlin\n- **Data Processing Framework**: Apache Spark v3.3.2\n- **Workflow Orchestration**: Apache Airflow\n\n## Getting Started\n\n### Prerequisites\n\n- **Kotlin**: Ensure you have Kotlin installed. [Install Kotlin](https://kotlinlang.org/docs/tutorials/command-line.html).\n- **Apache Spark v3.3.2**: Ensure you have Apache Spark installed. [Install Spark](https://spark.apache.org/downloads.html).\n- **Apache Airflow**: Ensure you have Apache Airflow installed. [Install Airflow](https://airflow.apache.org/docs/apache-airflow/stable/start.html).\n\n### Installation\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/MFurmanczyk/wh-sales.git\ncd wh-sales\n```\n\n2. Build the project:\n```bash\n./gradlew shadowJar\n```\n\n3. Set up Apache Airflow:\n```bash\ncd airflow\ndocker-compose up\n```\n\n4. Run the ETL. Start the Apache Airflow web server and scheduler:\n```bash\ndocker-compose up -d\n```\n\n5. 
Move `dag.py` to Airflow's DAGs folder.\n\nAccess the Airflow UI at http://localhost:8080 and trigger the ETL DAG (sales_dag).\n\n## License\nThis project is licensed under the MIT License - see the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmfurmanczyk%2Fwh-sales","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmfurmanczyk%2Fwh-sales","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmfurmanczyk%2Fwh-sales/lists"}