{"id":23577445,"url":"https://github.com/lixx21/data-engineer-christmas-data","last_synced_at":"2025-11-02T08:30:35.744Z","repository":{"id":269417530,"uuid":"906508150","full_name":"lixx21/data-engineer-christmas-data","owner":"lixx21","description":"The Christmas Project is a festive-themed data engineering initiative designed to integrate and analyze diverse datasets, creating a comprehensive view of Christmas-related trends. Leveraging modern cloud and data technologies, it brings together music, movies, sales, and weather data to showcase how technology can enhance the holiday spirit.","archived":false,"fork":false,"pushed_at":"2025-01-11T07:11:20.000Z","size":2645,"stargazers_count":12,"open_issues_count":0,"forks_count":6,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-11T08:20:10.639Z","etag":null,"topics":["airflow","aws","dbt","snowflake"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lixx21.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-21T05:13:21.000Z","updated_at":"2025-01-11T07:11:23.000Z","dependencies_parsed_at":"2024-12-23T12:35:33.405Z","dependency_job_id":"464dd331-cf6f-4877-8afe-20ce563062a7","html_url":"https://github.com/lixx21/data-engineer-christmas-data","commit_stats":{"total_commits":9,"total_committers":1,"mean_commits":9.0,"dds":0.0,"last_synced_commit":"c03b3e772cc63f0de798e6764be45df52cce8f38"},"previous_names":["lixx21/data-engineer-christmas-data"],"tags_count":0,"template":false,"template_full_name":null,"rep
ository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fdata-engineer-christmas-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fdata-engineer-christmas-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fdata-engineer-christmas-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fdata-engineer-christmas-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lixx21","download_url":"https://codeload.github.com/lixx21/data-engineer-christmas-data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239389576,"owners_count":19630310,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","aws","dbt","snowflake"],"created_at":"2024-12-26T22:29:13.864Z","updated_at":"2025-11-02T08:30:35.687Z","avatar_url":"https://github.com/lixx21.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎄 Christmas Project\n\nOverview\nThis project integrates data from multiple sources to create a comprehensive Christmas-themed dataset. 
Using modern ETL tools, cloud technologies, and a data transformation pipeline, the project organizes and processes data about Christmas playlists, movies, sales, and weather trends.\n\n![project architecture](images/architecture.png)\n\n![Dashboard](dashboard/dashboard.png)\n\n## Project Setup\n\n\n### Snowflake Setup\n\n- Create a Snowflake account (the free tier works) and make sure to choose AWS as your cloud provider\n- Create a new database and name it `CHRISTMAS_DATA`\n- Run this to integrate your Airflow with \n- Get the private link hostname and port for Snowflake ([reference](https://docs.snowflake.com/en/user-guide/admin-security-privatelink)) so you can connect Snowflake to your AWS Glue ETL job later\n\n```\nSELECT SYSTEM$ALLOWLIST();\n```\n\n- Create the tables `sg_christmas_playlist`, `sg_christmas_movies`, `sg_christmas_sales`, and `sg_christmas_weather` in the `CHRISTMAS_DATA` database and `PUBLIC` schema\n\n### AWS Setup\n\n- Create an IAM role in AWS with access to Secrets Manager, S3, and AWS Glue\n- Create an AWS Secrets Manager secret for Snowflake. 
Choose **Other type of secret** and enter these keys and values for the Snowflake connection\n```\nKEY= sfUser, Value= your snowflake login username\nKEY= sfPassword, Value= your snowflake login password\n```\n- Create an S3 bucket and name it `christmas-project-data`\n- Create an `AWS Glue Crawler` and an `AWS Glue Data Catalog`\n- Create an ETL job in AWS Glue that connects the AWS Glue Data Catalog (source) to Snowflake (target)\n\n### Docker Setup\n\n- In the Docker Compose file, fill in these environment variables with your keys\n\n```\nSPOTIFY_CLIENT_ID: \u003cSPOTIFY CLIENT ID\u003e\nSPOTIFY_CLIENT_SECRET: \u003cSPOTIFY CLIENT SECRET\u003e\nWEATHER_API_KEY: \u003cWEATHER API KEY\u003e\nAWS_REGION_NAME: \u003cAWS REGION NAME\u003e\nAWS_SECRET_ACCESS_KEY: \u003cAWS SECRET ACCESS KEY\u003e\nAWS_ACCESS_KEY_ID: \u003cAWS ACCESS KEY ID\u003e\n```\n\n### DBT Setup\n\n- Update the variables in [dbt_transform/profiles.yml](dbt_transform/profiles.yml) with your Snowflake database configuration\n\n```\ndbt_transform:\n  outputs:\n    dev:\n      account: \u003cyour account\u003e\n      database: \u003cyour database\u003e\n      password: \u003cyour snowflake password\u003e\n      role: \u003cyour snowflake role\u003e\n      schema: \u003cyour snowflake schema\u003e\n      threads: 1\n      type: snowflake\n      user: \u003cyour snowflake username\u003e\n      warehouse: \u003cyour snowflake warehouse name\u003e\n  target: dev\n\n```\n\n## Run Project\n\n1. Run ```docker-compose up --build -d```\n2. Then run the **christmas_data_pipeline** DAG\n3. Create and run the AWS Glue Crawler, and schedule it to run daily after the DAG so the data stays updated\n4. Create and run the AWS Glue ETL job, and schedule it to run daily after the DAG and the Crawler have finished\n5. 
Run DBT using this command\n```\ncd dbt_transform\ndocker build -t dbt-docker -f dbt-dockerfile .\ndocker run dbt-docker dbt run\n```\n\n![aws_glue_etl](images/aws_glue_etl.png)\n\n## Data\n\n- Weather API: https://openweathermap.org/price#weather\n- Christmas Sales Data and Trend: https://www.kaggle.com/datasets/ibikunlegabriel/christmas-sales-and-trends\n- Christmas Movies: https://www.kaggle.com/datasets/jonbown/christmas-movies\n- Playlist data: https://developer.spotify.com/documentation/web-api/reference/search ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flixx21%2Fdata-engineer-christmas-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flixx21%2Fdata-engineer-christmas-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flixx21%2Fdata-engineer-christmas-data/lists"}