{"id":22463522,"url":"https://github.com/renanfmoises/songplays-data-modeling-postgres","last_synced_at":"2026-05-11T06:02:17.020Z","repository":{"id":135393295,"uuid":"439076777","full_name":"renanfmoises/songplays-data-modeling-postgres","owner":"renanfmoises","description":"This project covers a simple ETL process with PostgresSQL for storing data from the fictitious company Sparkify, a music streaming application.","archived":false,"fork":false,"pushed_at":"2022-04-04T13:04:31.000Z","size":3504,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-01T18:42:44.099Z","etag":null,"topics":["database","python","sql"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/renanfmoises.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-16T17:39:41.000Z","updated_at":"2022-04-04T12:45:03.000Z","dependencies_parsed_at":"2024-07-20T17:31:29.858Z","dependency_job_id":null,"html_url":"https://github.com/renanfmoises/songplays-data-modeling-postgres","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/renanfmoises%2Fsongplays-data-modeling-postgres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/renanfmoises%2Fsongplays-data-modeling-postgres/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/renanfmoises%2Fsongplays-data-modeling-postgres/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/renanfmoises%2Fsongplays-data-modeling-postgres/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/renanfmoises","download_url":"https://codeload.github.com/renanfmoises/songplays-data-modeling-postgres/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245862052,"owners_count":20684618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","python","sql"],"created_at":"2024-12-06T09:13:19.746Z","updated_at":"2026-05-11T06:02:11.993Z","avatar_url":"https://github.com/renanfmoises.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cpicture\u003e\n    \u003cimg src=\"/repo/img/data-modeling-and-etl.png\" width=\"900\"\u003e\n\u003c/picture\u003e\n\n# Simple Data Modeling and ETL with Postgres and Python\n\n\u003csmall\u003eThis repository stores the code for the first hands-on project of Udacity's Data Engineering Nanodegree.\u003c/small\u003e\n\n***\n\n## Summary\n**This project covers a simple ETL process for storing data from Sparkify, a fictitious music streaming company.**\n\nThe database is created and populated by `python` scripts that use `psycopg2` to run `SQL` queries against PostgreSQL. 
After running the scripts, you will end up with a database fully populated with data relevant to Sparkify's operations.\n\n## Installation\n\n### Python version\n\nI used Python 3.8.6 for this project.\n\nOther libraries and their respective versions are listed in `requirements.txt`.\n\n### A step-by-step guide to running the ETL\n\n1. Download and install the PostgreSQL app;\n    - Create and start a new server;\n\n2. Install the **psycopg2** library:\n   - `pip install psycopg2`;\n\n3. Run `create_tables.py` from your terminal;\n\n4. Run the cells in `notebooks/test.ipynb` to test the ETL process.\n\n**If everything works, you should see the query results as outputs in the notebook.**\n\n## The Files\n\n### `sql_queries.py`: The Tables\n\nThe queries for creating and populating the database are in the `sql_queries.py` file.\n\nThe tables created and populated by these queries are:\n\n- Songplays;\n- Users;\n- Songs;\n- Artists;\n- Time.\n\n### `create_tables.py`: The Database Set-up\n\nFor the **ETL** to work properly, `create_tables.py` must be run first.\n\nThis script creates the database and tables, brings the database up to date with the latest schema, and populates the tables with the data. The queries it runs are imported from `sql_queries.py`.\n\n`create_tables.py` also works as a reset for the project: every time it runs, the database is dropped and recreated, so any previous data, the tables, and the database itself are discarded. 
This helps avoid errors caused by duplicated tables and data.\n\nThe code is fully documented; please check it out.\n\n### `notebooks/etl.ipynb`: The ETL Sandbox\n\nThis Jupyter Notebook documents the process of designing the best flow for the ETL.\n\nThe notebook is divided into sections, each covering a different step of the ETL process.\n\nThe sections are also documented, and they are worth reading for a better understanding of the process automated in the `etl.py` file.\n\n### `etl.py`: The ETL Process\n\nThis script processes the data stored in the `song_data` and `log_data` JSON files, consolidating the work developed in the previous scripts and notebooks.\n\n**PLEASE NOTE**: \u003cu\u003eEven though this project was built using fake data, the data files are not included in this repo, in line with data governance best practices.\u003c/u\u003e\n\n### `notebooks/test.ipynb`: The Checks and Tests\n\nThis notebook works as a simple interface for connecting to the database and checking whether the queries work as expected.\n\nIt is very ad hoc: if the code runs and the queries return the expected results, the data is displayed immediately. This serves as a simple test of the ETL process.\n\n***\n\n### Contact \u0026 PR\n\nFeel free to reach out with suggestions on how to make it even simpler if you know any tricks I may have missed. PRs are also welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frenanfmoises%2Fsongplays-data-modeling-postgres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frenanfmoises%2Fsongplays-data-modeling-postgres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frenanfmoises%2Fsongplays-data-modeling-postgres/lists"}