https://github.com/ovsundal/sparkify-sql-data-modelling

Data modelling with postgres
data-modeling etl-pipeline pandas postgresql psycopg2 python

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/ovsundal/sparkify-sql-data-modelling
Owner: ovsundal
Created: 2019-04-06T21:15:19.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2019-04-18T11:52:58.000Z (over 7 years ago)
Last Synced: 2025-04-19T19:05:04.579Z (over 1 year ago)
Topics: data-modeling, etl-pipeline, pandas, postgresql, psycopg2, python
Language: Jupyter Notebook
Homepage:
Size: 431 KB
Stars: 2
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Sparkify-data-modelling

This is project 1/5 of Udacity's Data Engineering Nanodegree. In this project, a database for storing
music and artist records is created. Data is then extracted from the source, transformed using Pandas DataFrames, and loaded into the database. Two sets of data are used in the ETL process: song data and log data. Song data provides song and artist information, while log data is more extensive, covering songs, artists, and some metadata about each.
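
The extract, transform, and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in `etl.py`: the sample record, its field names (`song_id`, `title`, `artist_id`, `artist_name`, `year`, `duration`), and the helper names `extract`, `transform`, and `load` are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical song record, one JSON object per line. The field names are
# assumptions for illustration and are not taken from the repository.
SAMPLE_SONG_JSON = (
    '{"song_id": "S1", "title": "Example Song", "artist_id": "A1", '
    '"artist_name": "Example Artist", "year": 2001, "duration": 215.5}\n'
)


def extract(source) -> pd.DataFrame:
    """Extract: read line-delimited JSON into a DataFrame."""
    return pd.read_json(source, lines=True)


def transform(df: pd.DataFrame, columns: list) -> list:
    """Transform: keep the chosen columns, drop duplicate rows, return tuples."""
    subset = df[columns].drop_duplicates()
    return list(subset.itertuples(index=False, name=None))


def load(cursor, insert_sql: str, rows: list) -> None:
    """Load: run one parameterized INSERT per row on a DB-API cursor."""
    for row in rows:
        cursor.execute(insert_sql, row)


df = extract(io.StringIO(SAMPLE_SONG_JSON))
song_rows = transform(df, ["song_id", "title", "artist_id", "year", "duration"])
artist_rows = transform(df, ["artist_id", "artist_name"])
```

`load` only needs an object with an `execute(sql, params)` method, so a psycopg2 cursor can be passed in once a connection exists. Keeping the three steps as separate functions lets the transform be checked without a running database.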

#### Prerequisites for running the program
Prerequisites for running the project are Python 3.x and Postgres with a default database named "studentdb" available.
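
A quick way to confirm both prerequisites before running anything might look like the sketch below. Only the database name "studentdb" comes from this README; the host, user, and password defaults are placeholders that need to match the local Postgres setup, and `build_dsn` and `check_prerequisites` are hypothetical helper names.

```python
import sys


def build_dsn(dbname="studentdb", host="127.0.0.1",
              user="student", password="student"):
    """Build a libpq-style connection string.

    Only the database name is taken from the README; host, user, and
    password are placeholder assumptions.
    """
    return f"host={host} dbname={dbname} user={user} password={password}"


def check_prerequisites(dsn):
    """Raise if Python is older than 3.x or the database is unreachable."""
    if sys.version_info < (3,):
        raise RuntimeError("Python 3.x is required")
    # Imported here so build_dsn can be used without the driver installed.
    import psycopg2
    conn = psycopg2.connect(dsn)
    conn.close()
```

Calling `check_prerequisites(build_dsn())` raises an exception if the interpreter is too old, if psycopg2 is missing, or if the connection to "studentdb" fails.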

#### Starting the program
1. Execute "create_tables.py". This will create a fresh instance of the sparkifydb with empty tables.
2. Execute "etl.py". This will load the data into the tables.
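
The two steps above could also be run in sequence from a small wrapper. This is a sketch under the assumption that both scripts sit in the current working directory; `build_commands` and `run_all` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys

# Script names come from the steps above. They run in this order because
# etl.py loads data into the tables that create_tables.py creates.
SCRIPTS = ["create_tables.py", "etl.py"]


def build_commands(python=sys.executable, scripts=SCRIPTS):
    """Return one command per script, using the given Python interpreter."""
    return [[python, script] for script in scripts]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(build_commands())` from the repository root executes both scripts. With `check=True`, a failure in "create_tables.py" raises `subprocess.CalledProcessError`, so "etl.py" is not run against missing tables.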
