{"id":19303373,"url":"https://github.com/kenhanscombe/project-postgres","last_synced_at":"2025-08-16T01:20:41.266Z","repository":{"id":201940744,"uuid":"220013706","full_name":"kenhanscombe/project-postgres","owner":"kenhanscombe","description":"Udacity data engineering nanodegree project","archived":false,"fork":false,"pushed_at":"2019-11-08T17:36:41.000Z","size":33,"stargazers_count":22,"open_issues_count":0,"forks_count":30,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-22T11:41:28.570Z","etag":null,"topics":["data-engineering","docker-image","postgres-database","python3","udacity-nanodegree"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kenhanscombe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-11-06T14:18:48.000Z","updated_at":"2025-02-26T11:50:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"538e3331-a155-48c5-9c81-da309b156c8a","html_url":"https://github.com/kenhanscombe/project-postgres","commit_stats":null,"previous_names":["kenhanscombe/project-postgres"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kenhanscombe/project-postgres","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenhanscombe%2Fproject-postgres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenhanscombe%2Fproject-postgres/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenhanscombe%2Fproject-postgres/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenhanscombe%2Fproj
ect-postgres/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kenhanscombe","download_url":"https://codeload.github.com/kenhanscombe/project-postgres/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenhanscombe%2Fproject-postgres/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270654149,"owners_count":24622909,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-15T02:00:12.559Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","docker-image","postgres-database","python3","udacity-nanodegree"],"created_at":"2024-11-09T23:26:11.147Z","updated_at":"2025-08-16T01:20:41.243Z","avatar_url":"https://github.com/kenhanscombe.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Project 1: Data modeling with Postgres\n\nThis **Udacity Data Engineering nanodegree** project creates a postgres database `sparkifydb` for a music app, *Sparkify*. The purpose of the database is to model song and log datasets (originally stored in JSON format) with a star schema optimised for queries on song play analysis.\n\n\u003e **Note:** The whole exercise can be run in a docker container. 
See instructions below.\n\n## Schema design and ETL pipeline\n\nThe star schema has one *fact* table (songplays) and four *dimension* tables (users, songs, artists, time). `DROP`, `CREATE`, `INSERT`, and `SELECT` queries are defined in **sql_queries.py**. **create_tables.py** uses the functions `create_database`, `drop_tables`, and `create_tables` to create the database sparkifydb and the required tables.\n\n![](sparkify_erd.png?raw=true)\n\nThe extract, transform, load (ETL) processes in **etl.py** populate the **songs** and **artists** tables with data derived from the JSON song files in `data/song_data`. Processed data derived from the JSON log files in `data/log_data` is used to populate the **time** and **users** tables. A `SELECT` query collects song and artist IDs from the **songs** and **artists** tables and combines them with data derived from the log files to populate the **songplays** fact table.\n\n## Song play example queries\n\nSimple queries might include the number of users at each membership level.\n\n`SELECT level, COUNT(*) FROM users GROUP BY level;`\n\nThe day of the week on which music is most frequently listened to.\n\n`SELECT weekday, COUNT(*) FROM time GROUP BY weekday ORDER BY COUNT(*) DESC;`\n\nOr, the hour of the day at which music is most often listened to.\n\n`SELECT hour, COUNT(*) FROM time GROUP BY hour ORDER BY COUNT(*) DESC;`\n\n\u003cbr\u003e\n\n### **A docker image**\n\nI've created a docker image **postgres-student-image** on Docker Hub, from which you can run a container with user 'student', password 'student', and database **studentdb** (the starting point for the exercise). You do not need to install postgres (it runs in the container).\n\nTo download the image, install [docker](https://docs.docker.com/), which requires you to create a username and password. 
In a terminal, log into docker hub (you'll be prompted for your docker username and password)\n\n```\ndocker login docker.io\n```\n\nPull the image\n\n```\ndocker pull onekenken/postgres-student-image\n```\n\nRun the container\n\n```\ndocker run -d --name postgres-student-container -p 5432:5432 onekenken/postgres-student-image\n```\n\nThe **create_tables.py** pre-defined connection `conn = psycopg2.connect(\"host=127.0.0.1 dbname=studentdb user=student password=student\")` will now connect to the container.\n\nTo stop and remove the container after the exercise\n\n```\ndocker stop postgres-student-container\ndocker rm postgres-student-container\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenhanscombe%2Fproject-postgres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkenhanscombe%2Fproject-postgres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenhanscombe%2Fproject-postgres/lists"}