{"id":19746307,"url":"https://github.com/matz1979/redshift","last_synced_at":"2026-05-14T15:33:46.382Z","repository":{"id":120439447,"uuid":"239215340","full_name":"matz1979/redshift","owner":"matz1979","description":"My AWS Redshift project","archived":false,"fork":false,"pushed_at":"2020-04-07T19:03:53.000Z","size":23,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-28T07:48:59.816Z","etag":null,"topics":["aws","python","redshift","sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matz1979.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-08T22:47:30.000Z","updated_at":"2020-04-07T20:51:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"d68546dc-ecff-46e0-9b3c-001f1af6fedd","html_url":"https://github.com/matz1979/redshift","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/matz1979/redshift","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matz1979%2Fredshift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matz1979%2Fredshift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matz1979%2Fredshift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matz1979%2Fredshift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matz1979","download_url":"https://codeload.github.com/mat
z1979/redshift/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matz1979%2Fredshift/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285809009,"owners_count":27235098,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-22T02:00:05.934Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","python","redshift","sql"],"created_at":"2024-11-12T02:14:03.874Z","updated_at":"2025-11-22T15:00:57.273Z","avatar_url":"https://github.com/matz1979.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Warehouse for a music streaming platform\n\nThe startup previously stored the data from its app locally in a SQL database.\nIt has since moved that data to AWS S3, so all app data now lives in S3 storage.\nAs the startup grows, a single database no longer makes sense, so we move to the data warehouse system\nfrom AWS and set up a Redshift Data Warehouse.\nThis way the data team can still quickly extract the data and analyze user behavior\nfrom both past and current data.\n\n## Why I chose the AWS Redshift Data Warehouse system\n\nThe app always writes the same kind of data to the S3 storage.\nNew data can be loaded easily from the AWS S3 storage into the new Data Warehouse system,\nand the data team can access and manipulate the data with simple SQL 
queries.\n\n## How to use the files\n\n* Fill in ```dwh.cfg``` with your AWS Redshift and user data.\n\n* Run ```create_tables``` to create the database and the tables, whose definitions are located in the ```sql_queries.py``` file.\n\n* Run ```etl.py``` to load the data from the S3 bucket into the SQL tables.\n\n## The Star schema of the database\n\nThe schema for the Sparkify Data Warehouse has one fact table, songplays, and four dimension tables:\n\n* songplays stores all the data and some facts\n (song_id, user_id, artist_id, ...). The Primary Key is songplay_id; the Reference Keys are the\n Primary Keys of the dimension tables.\n\n* time stores all time data (hour, day, month, ...). The Primary Key is start_time.\n\n* users stores all user data (first name, last name, ...). The Primary Key is user_id.\n\n* songs stores all song data (title, year, ...). The Primary Key is song_id.\n\n* artist stores all artist data (name, location, ...). The Primary Key is artist_id.\n\n![Sparkify Database Schema](sparkify_schem.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatz1979%2Fredshift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatz1979%2Fredshift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatz1979%2Fredshift/lists"}