{"id":23845772,"url":"https://github.com/fabioba/sales-analytics","last_synced_at":"2025-11-12T05:31:07.307Z","repository":{"id":270681300,"uuid":"910251823","full_name":"fabioba/sales-analytics","owner":"fabioba","description":"This is an exercises provided by ChatGPT about sales data.","archived":false,"fork":false,"pushed_at":"2025-01-02T10:51:39.000Z","size":688,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-02T20:31:58.850Z","etag":null,"topics":["airflow","bigquery","etl-pipeline","googlecloudplatform","googlecloudstorage"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fabioba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-30T19:48:15.000Z","updated_at":"2025-01-02T11:35:25.000Z","dependencies_parsed_at":"2025-01-02T11:27:55.042Z","dependency_job_id":"ce701b45-692f-41b2-b8d2-9148f5c45e69","html_url":"https://github.com/fabioba/sales-analytics","commit_stats":null,"previous_names":["fabioba/sales_etl","fabioba/sales-analytics"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fabioba%2Fsales-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fabioba%2Fsales-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fabioba%2Fsales-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fabioba%2F
sales-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fabioba","download_url":"https://codeload.github.com/fabioba/sales-analytics/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240131746,"owners_count":19752727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","bigquery","etl-pipeline","googlecloudplatform","googlecloudstorage"],"created_at":"2025-01-02T20:26:22.531Z","updated_at":"2025-11-12T05:31:07.297Z","avatar_url":"https://github.com/fabioba.png","language":"Python","readme":"# sales_etl\nThis is an exercise provided by ChatGPT about sales data.\n\n## Scenario\nYou are working for a retail company that wants to analyze sales performance. The business goal is to track and understand sales trends across different dimensions like time, products, and regions. Your task is to create a star schema to support this analysis and implement the ETL pipeline using Apache Airflow.\n\n## Tasks\n1. Gather Business Requirements\n    - Understand the key questions the business wants to answer. For example:\n        - What are the total sales by product category each month?\n        - Which region has the highest sales in a given quarter?\n        - What is the trend of sales for a specific product over time?\n\n2. Conceptual Model\n\nDefine the high-level entities and their relationships.\n\n3. Logical Model\n\nDetail the specific attributes of each entity, including keys and relationships.\n\n4. 
Physical Model\n\nTranslate the logical model into a database schema with tables, columns, and data types.\n\n5. ETL Pipeline\n\nCreate an Airflow DAG to:\n\n- Extract sales, product, and region data from source systems.\n- Transform the data to match the star schema structure.\n- Load the data into a data warehouse.\n\n\n## Data\nBelow is a data sample:\n```json\n[\n    {\n        \"sales_id\": 1234,\n        \"product_id\": 1,\n        \"product_name\": \"Laptop\",\n        \"category\": \"Electronics\",\n        \"price\": 1200.0,\n        \"quantity\": 2,\n        \"total_amount\": 2400.0,\n        \"customer_id\": 3,\n        \"customer_name\": \"Charlie\",\n        \"region\": \"East\",\n        \"payment_method\": \"Credit Card\",\n        \"sale_date\": \"2024-12-30 15:20:30\"\n    }\n]\n```\n\n## Conceptual Model\n![Diagram](documentation/img/conceptual_model.png)\n\n## Logical Model\n![Diagram](documentation/img/logical_model.png)\n\n## Physical Model\n![Diagram](documentation/img/physical_model.png)\n\n## Architecture - Data Flow\nSince the data is retrieved from a server (pull ingestion), the pipeline performs these steps:\n- EXTRACT and LOAD\n1. read data from the API\n2. store the raw data in a bucket\n3. read the raw data from the bucket and load it into the data warehouse\n4. move the raw data to hist\n\n- TRANSFORM\n1. populate the DIM tables\n2. 
populate the FCT table\n\n\n![Diagram](documentation/img/data_flow.png)\n\n\n## Architecture - Extract Load\n![Diagram](documentation/img/extract-load.png)\n\n\n### BigQuery - DDL\n\n```sql\ncreate table ace-mile-446412-j2.SALES.RAW_SALES(\n  category string,\n  customer_id string,\n  customer_name string,\n  payment_method string,\n  price string,\n  product_id string,\n  product_name string,\n  quantity string,\n  region string,\n  sale_date string,\n  sales_id string,\n  total_amount string,\n  insert_timestamp timestamp default current_timestamp()\n);\n\ncreate table ace-mile-446412-j2.SALES.EXT_RAW_SALES(\n  category string,\n  customer_id int64,\n  customer_name string,\n  payment_method string,\n  price float64,\n  product_id int64,\n  product_name string,\n  quantity int64,\n  region string,\n  sale_date timestamp,\n  sales_id int64,\n  total_amount float64,\n  insert_timestamp timestamp default current_timestamp()\n);\n\nCREATE TABLE ace-mile-446412-j2.SALES.DIM_PRODUCT (\n    product_id int64,\n    product_name STRING,\n    category STRING,\n    price float64,\n  insert_timestamp timestamp default current_timestamp()\n);\n\nCREATE TABLE ace-mile-446412-j2.SALES.DIM_CUSTOMER (\n    customer_id int64,\n    customer_name STRING,\n    region STRING,\n  insert_timestamp timestamp default current_timestamp()\n);\n\nCREATE TABLE ace-mile-446412-j2.SALES.FCT_SALES (\n    fct_id int64,\n    sales_id int64,\n    sale_date timestamp,\n    customer_id int64,\n    product_id int64,\n    payment_method STRING,\n    quantity INT64,\n    total_amount FLOAT64,\n  insert_timestamp timestamp default current_timestamp()\n);\n\n\nCREATE TABLE ace-mile-446412-j2.SALES.CFG_FLOW_MANAGER (\n    FLOW_NAME STRING,\n    LAST_VALUE timestamp,\n  insert_timestamp timestamp default 
current_timestamp()\n);\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffabioba%2Fsales-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffabioba%2Fsales-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffabioba%2Fsales-analytics/lists"}