
https://github.com/fabioba/udacity-dwh-etl

This project is an example of populating a star schema on Amazon Redshift by ingesting data from Amazon S3.

# UDACITY-DWH-ETL

## Business Requirements
A music streaming startup, Sparkify, has grown its user base and song database and wants to move its processes and data onto the cloud.

## Scope
The data resides in S3: one directory holds JSON logs of user activity in the app, and another holds JSON metadata on the songs in the app.

## Getting Started
The entire workflow consists of two steps:
* create tables
* etl process

To start it from a terminal, run the following script:
```
python main.py
```
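
The contents of `main.py` are not shown in this README. As a minimal sketch, assuming it simply runs the two steps in order, an entry point could look like the one below. The function names `create_tables` and `run_etl`, the table names, and the SQL statements are hypothetical placeholders, and an in-memory SQLite database stands in for Redshift so that the sketch runs locally.

```python
import sqlite3

# Hypothetical statements; the real project defines its own DDL and inserts.
CREATE_TABLE_QUERIES = [
    "CREATE TABLE IF NOT EXISTS staging_songs (song_id TEXT, title TEXT)",
    "CREATE TABLE IF NOT EXISTS songs (song_id TEXT PRIMARY KEY, title TEXT)",
]
ETL_QUERIES = [
    "INSERT INTO songs SELECT DISTINCT song_id, title FROM staging_songs",
]


def create_tables(conn):
    """Step 1: create every table the workflow needs."""
    for query in CREATE_TABLE_QUERIES:
        conn.execute(query)
    conn.commit()


def run_etl(conn):
    """Step 2: move rows from the staging table into the target table."""
    for query in ETL_QUERIES:
        conn.execute(query)
    conn.commit()


def main(conn):
    """Run both steps in the order listed above."""
    create_tables(conn)
    run_etl(conn)


if __name__ == "__main__":
    # SQLite is only a local stand-in for a Redshift connection.
    connection = sqlite3.connect(":memory:")
    main(connection)
    connection.close()
```

Keeping each step in its own function makes it possible to rerun only the ETL step once the tables exist.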

## Project Structure
* `dwh.cfg` contains config data
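
The layout of `dwh.cfg` is not documented here. Assuming it is an INI-style file, it could be read with Python's standard `configparser` module, as in the sketch below. The section names, keys, and values are illustrative assumptions, not the project's actual settings.

```python
import configparser

# Hypothetical contents; the real dwh.cfg may use different sections and keys.
SAMPLE_CFG = """
[CLUSTER]
HOST = example-cluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME = dev
DB_PORT = 5439

[S3]
SONG_DATA = s3://example-bucket/song_data
"""


def load_config(text):
    """Parse INI-style configuration text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


config = load_config(SAMPLE_CFG)
port = config.getint("CLUSTER", "DB_PORT")  # parsed as an integer
song_path = config.get("S3", "SONG_DATA")   # returned as a string
print(port, song_path)
```

For a file on disk, `parser.read("dwh.cfg")` would replace `read_string`.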

## Source Data
* song_data
```json
{"num_songs": 1, "artist_id": "ARJIE2Y1187B994AB7", "artist_latitude": null, "artist_longitude": null, "artist_location": "", "artist_name": "Line Renaud", "song_id": "SOUPIRU12A6D4FA1E1", "title": "Der Kleine Dompfaff", "duration": 152.92036, "year": 0}
```

* log_data
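
Redshift commonly ingests JSON files like these from S3 with the `COPY` command. The sketch below only builds such a statement as a string; whether this project issues it in exactly this form is an assumption. The helper `build_copy_statement`, the table name `staging_songs`, the bucket path, and the IAM role ARN are all hypothetical placeholders.

```python
def build_copy_statement(table, s3_path, iam_role_arn, json_option="auto"):
    """Return a Redshift COPY statement that loads JSON files from S3.

    With json_option="auto", Redshift matches JSON field names to column names.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS JSON '{json_option}';"
    )


# Placeholder values for illustration only.
statement = build_copy_statement(
    table="staging_songs",
    s3_path="s3://example-bucket/song_data",
    iam_role_arn="arn:aws:iam::123456789012:role/example-role",
)
print(statement)
```

If the JSON field names do not match the column names of the target table, the S3 path of a JSONPaths file can be passed as `json_option` instead of `auto`.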

## Star Schema Data Model
![image](images/er.drawio.png)

## Workflow
![image](images/workflow.drawio.png)