{"id":22100481,"url":"https://github.com/joyceannie/data-modeling-with-postgres","last_synced_at":"2025-03-24T01:47:28.190Z","repository":{"id":37633561,"uuid":"502425339","full_name":"joyceannie/Data-Modeling-With-Postgres","owner":"joyceannie","description":"The main focus of the project is data modeling with Postgres and build an ETL pipeline using Python. The first step is to define fact and dimension tables for a star schema for a particular analytic focus. The second step is to write an ETL pipeline that transfers data from files in different directories into these tables in Postgres using Python and SQL.","archived":false,"fork":false,"pushed_at":"2022-06-22T02:39:36.000Z","size":407,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-05T08:51:08.517Z","etag":null,"topics":["data-engineering","data-modeling","postgresql","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/joyceannie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-11T18:23:31.000Z","updated_at":"2024-06-29T16:27:45.000Z","dependencies_parsed_at":"2022-09-02T09:22:47.491Z","dependency_job_id":null,"html_url":"https://github.com/joyceannie/Data-Modeling-With-Postgres","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joyceannie%2FData-Modeling-With-Postgres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joyceannie%2FData-Modeling-With-Postgres/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joyceannie%2FData-Modeling-With-Postgres/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joyceannie%2FData-Modeling-With-Postgres/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/joyceannie","download_url":"https://codeload.github.com/joyceannie/Data-Modeling-With-Postgres/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245195916,"owners_count":20575936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","data-modeling","postgresql","python"],"created_at":"2024-12-01T05:14:20.327Z","updated_at":"2025-03-24T01:47:28.170Z","avatar_url":"https://github.com/joyceannie.png","language":"Python","readme":"# Overview\n\nA startup called Sparkify wants to analyze the data they've been collecting on songs and user activity on their new music streaming app. The analytics team is particularly interested in understanding what songs users are listening to. Currently, they don't have an easy way to query their data. The objective of the project is to create a database schema and ETL pipeline for this analysis.\n\n# Description\n\nThe main focus of the project is data modeling with Postgres and building an ETL pipeline using Python. The first step is to define fact and dimension tables for a star schema for a particular analytic focus. 
The second step is to write an ETL pipeline that transfers data from files in different directories into these tables in Postgres using Python and SQL.\n\n# Dataset\n\n## Song Dataset\n\nThe first dataset is a subset of real data from the [Million Song Dataset](http://millionsongdataset.com/). Each file is in JSON format and contains metadata about a song and the artist of that song. \n\nSample Song Record:\n\n\u003e {\"num_songs\": 1, \"artist_id\": \"ARJIE2Y1187B994AB7\", \"artist_latitude\": null, \"artist_longitude\": null, \"artist_location\": \"\", \"artist_name\": \"Line Renaud\", \"song_id\": \"SOUPIRU12A6D4FA1E1\", \"title\": \"Der Kleine Dompfaff\", \"duration\": 152.92036, \"year\": 0}\n\n## Log Dataset\n\nThe second dataset consists of log files in JSON format generated by this [event simulator](https://github.com/Interana/eventsim) based on the song dataset. These simulate activity logs from a music streaming app based on specified configurations.\n\n# Database Schema\n\nWe use a star schema for the data model. The database consists of 5 tables: one fact table and four dimension tables.\n\n## Fact Table\n\n* songplays: Records of log data associated with song plays by users. \n\n## Dimension Tables\n\n* users: Users of the Sparkify app.\n\n* songs: Songs in the dataset.\n\n* artists: Artists in the dataset.\n\n* time: Timestamps of the user activities.\n\n# Project Files\n\n* sql_queries.py: contains SQL queries for dropping and creating all the tables. 
It also contains the insert query template.\n\n* create_tables.py: sets up the sparkify database with all the tables.\n\n* etl.ipynb: a Jupyter notebook for analyzing the dataset before loading.\n\n* etl.py: reads and processes files from song_data and log_data and loads them into the tables.\n\n* test.ipynb: a notebook to perform data validation.\n\n# Environment\n\nPython 3.6 or above\n\nPostgreSQL 9.5 or above\n\npsycopg2 - PostgreSQL database adapter for Python\n\n# How to Run\n\nRun the Python scripts in the order given below.\n\n```\npython create_tables.py \npython etl.py \n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoyceannie%2Fdata-modeling-with-postgres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoyceannie%2Fdata-modeling-with-postgres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoyceannie%2Fdata-modeling-with-postgres/lists"}