{"id":20924552,"url":"https://github.com/alkasaliss/data_modeling_postgres","last_synced_at":"2026-04-15T16:05:02.957Z","repository":{"id":97738784,"uuid":"294224915","full_name":"AlkaSaliss/data_modeling_postgres","owner":"AlkaSaliss","description":"Project on Data Modeling with Postgresql from Udacity Data Engineering Nanodegree","archived":false,"fork":false,"pushed_at":"2020-09-10T20:36:05.000Z","size":615,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-19T17:54:37.815Z","etag":null,"topics":["data-engineering","etl-pipeline","postgresql","python","sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlkaSaliss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-09T20:41:29.000Z","updated_at":"2020-09-10T20:36:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"2f79e28b-2081-4b0f-b8a9-ac1c53b5eb0d","html_url":"https://github.com/AlkaSaliss/data_modeling_postgres","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlkaSaliss%2Fdata_modeling_postgres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlkaSaliss%2Fdata_modeling_postgres/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlkaSaliss%2Fdata_modeling_postgres/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/AlkaSaliss%2Fdata_modeling_postgres/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlkaSaliss","download_url":"https://codeload.github.com/AlkaSaliss/data_modeling_postgres/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243318745,"owners_count":20272139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","etl-pipeline","postgresql","python","sql"],"created_at":"2024-11-18T20:23:25.494Z","updated_at":"2025-12-27T21:06:34.659Z","avatar_url":"https://github.com/AlkaSaliss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sparkify: Your Provider for the World's Best Music Library\n\nThis repository contains scripts for **E**xtracting music data from JSON files, **T**ransforming this data to match the **sparkifydb** database schema, and **L**oading the transformed data into the database.\n\nThe following diagram shows the database schema, with 1 fact table `songplays` and 4 dimension tables `users`, `time`, `songs` and `artists`:\n\n![db schema](assests/sparkifydb_schema.png)\n\nThe database design follows a `star schema` to help our analyst team, **The Sparkalysts**, in their mission to answer the questions running through the head of our CEO, **The Big Spark**, such as:\n\n1. Which songs did the user `Lily` `Koch` listen to?\n2. In which year did our users listen to music the most?\n3. 
...\n\n## Project Structure\n\nThe project is structured as follows:\n\n* A folder named `data`, containing two subdirectories as provided by the project template: `log_data` (user activity data) and `song_data` (song and artist data)\n* A script `sql_queries.py`, which contains all the SQL queries for creating the `sparkifydb` database and its tables, along with some other queries\n* A script `create_tables.py`, which creates the database and the defined tables\n* A script `etl.py`, which extracts and transforms the log and song data before loading the processed data into the tables created by `create_tables.py`\n\n## Project Setup\n\nIn addition to the files listed in the previous section, the setup requires a JSON configuration file, `config.json`, located in the same directory as the scripts. This file contains the credentials for connecting to the database. A sample configuration file looks like this:\n\n```json\n{\n\t\"host\": \"127.0.0.1\",\n\t\"dbname\": \"studentdb\",\n\t\"user\": \"student\",\n\t\"password\": \"student\"\n}\n```\n\nThe `dbname` field is the name of the default database.\n\nTo set up the project, follow these steps in order:\n\n* Run the `create_tables.py` script. It connects to the default database specified in `config.json`, drops the `udacitydb` database if it exists and recreates it, then connects to the newly created database, drops any existing tables and recreates them.\n  \n```bash\npython create_tables.py\n```\n\n* Run the `etl.py` script, which processes the JSON files and populates the database.\n\n```bash\npython etl.py\n```\n\n## Example queries\n\nHere are the results of the example queries listed in the introduction:\n\n1. Which songs did the user `Lily` `Koch` listen to?\n![query1](assests/q1.png)\n\n2. 
In which year did our users listen to music the most?\n\n![query2](assests/q2.png)\n\n## TO-DO List\n\n* [ ] Add an analytics dashboard for easier interaction by non-experts\n* [ ] Add more data quality checks\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falkasaliss%2Fdata_modeling_postgres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falkasaliss%2Fdata_modeling_postgres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falkasaliss%2Fdata_modeling_postgres/lists"}