https://github.com/danieldacosta/rdbms-data-modeling
ETL pipeline for building a database star schema.
- Host: GitHub
- URL: https://github.com/danieldacosta/rdbms-data-modeling
- Owner: DanielDaCosta
- License: apache-2.0
- Created: 2020-09-27T01:16:04.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-07-20T18:16:01.000Z (over 1 year ago)
- Last Synced: 2024-07-20T19:31:29.508Z (over 1 year ago)
- Topics: etl, postgresql, rdbms
- Language: Python
- Homepage:
- Size: 521 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# RDBMS Data Modeling
## Database Schema
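The repository README shows the star schema as a diagram at this point. Purely as an illustrative sketch of the layout a star schema uses, the snippet below defines a hypothetical fact table surrounded by a dimension table it references; the table and column names are made up and the real definitions live in `sql_queries.py`.

```
# Hypothetical star schema sketch (illustrative only; see sql_queries.py
# for the project's actual tables). A star schema has one central fact
# table whose foreign keys point at the surrounding dimension tables.

create_dim_users = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    level      VARCHAR
);
"""

create_fact_songplays = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT REFERENCES users (user_id),
    session_id  INT,
    location    VARCHAR,
    user_agent  VARCHAR
);
"""
```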

## Files
- *database.ini*: database credentials (*host, database, user, password, port*)
- *create_tables.py*: run this script to create the database tables shown in the schema above (a rough sketch of the workflow follows this list)
- *etl.py*: run this script to transfer all the data from the `data` folder into the database
- *sql_queries.py*: all queries
- *etl.ipynb*: step-by-step tutorial on how to prepare the data for insertion
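As a minimal sketch of how these files typically fit together (not the repository's exact code), `create_tables.py` can read the credentials from `database.ini` with `configparser`, open a `psycopg2` connection, and execute the CREATE statements kept in `sql_queries.py`. The `[postgresql]` section name and the `create_table_queries` list are assumptions for the example.

```
# Illustrative sketch only; the real create_tables.py may differ in detail.
from configparser import ConfigParser

import psycopg2


def read_db_config(path="database.ini", section="postgresql"):
    """Return host/database/user/password/port from database.ini as a dict."""
    parser = ConfigParser()
    parser.read(path)
    return dict(parser.items(section))


def create_tables(queries):
    """Run each CREATE TABLE statement against the configured database."""
    conn = psycopg2.connect(**read_db_config())
    conn.autocommit = True
    cur = conn.cursor()
    for query in queries:
        cur.execute(query)
    cur.close()
    conn.close()


if __name__ == "__main__":
    # "create_table_queries" stands in for whatever list of CREATE TABLE
    # statements sql_queries.py exposes; a placeholder is used here.
    create_table_queries = ["CREATE TABLE IF NOT EXISTS example (id INT);"]
    create_tables(create_table_queries)
```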
## Usage
1. Start a local PostgreSQL instance:
```
docker-compose up
```
You can edit your database credentials in `database.ini`.
2. Create the database and its tables by running:
```
python create_tables.py
```
3. (Optional) Run `etl.ipynb` to walk through the full pipeline step by step.
4. Run `etl.py`: it reads and processes all the files from `song_data` and `log_data` and loads them into the database (a rough sketch of this step follows below).
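A minimal sketch of that step, assuming the files sit under `data/song_data` and `data/log_data` and that a `process_file` callable holds the per-file parsing and INSERT logic (the real `etl.py` is the authoritative version):

```
# Illustrative only: walk the data folders and hand each file to a
# processing function that parses it and inserts rows into the database.
import os


def collect_files(root):
    """Return all file paths found under the given root directory."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in filenames)
    return paths


def run_etl(cur, process_file):
    """Process every file in both datasets with the given cursor."""
    for root in ("data/song_data", "data/log_data"):
        for path in collect_files(root):
            process_file(cur, path)  # parse the file and execute the INSERTs
```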
## Details
The data insertion is done with three methods from the `psycopg2` library, compared in the sketch after this list:
- `copy_from`
- `execute` (simple INSERT statements)
- `executemany`
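As a small, self-contained sketch of how the three approaches differ (the connection parameters and `example_table` are made up for the example and the table is assumed to already exist):

```
# Illustrative comparison of the three psycopg2 insertion styles.
import io

import psycopg2

conn = psycopg2.connect(host="localhost", database="example",
                        user="postgres", password="postgres")
cur = conn.cursor()

rows = [(1, "a"), (2, "b"), (3, "c")]

# 1. copy_from: stream a file-like object through PostgreSQL's COPY command.
#    Usually the fastest option for bulk loads.
buffer = io.StringIO("\n".join(f"{i}\t{v}" for i, v in rows))
cur.copy_from(buffer, "example_table", sep="\t", columns=("id", "value"))

# 2. execute: one INSERT statement per row.
for row in rows:
    cur.execute("INSERT INTO example_table (id, value) VALUES (%s, %s)", row)

# 3. executemany: the same INSERT statement run for a whole sequence of rows.
cur.executemany("INSERT INTO example_table (id, value) VALUES (%s, %s)", rows)

conn.commit()
cur.close()
conn.close()
```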