https://github.com/fabioba/udacity-dwh-etl
This project refers to an example of populating a star schema on AWS - Redshift ingesting data from AWS - S3.
- Host: GitHub
- URL: https://github.com/fabioba/udacity-dwh-etl
- Owner: fabioba
- Created: 2022-11-09T16:19:57.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-24T15:20:35.000Z (almost 3 years ago)
- Last Synced: 2025-01-28T09:08:12.481Z (8 months ago)
- Topics: aws, datawarehouse, etl
- Language: Python
- Homepage:
- Size: 525 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# UDACITY-DWH-ETL
## Business Requirements
A music streaming startup, Sparkify, has grown its user base and song database and wants to move its processes and data onto the cloud.

## Scope
Their data resides in S3: a directory of JSON logs of user activity on the app, alongside a directory of JSON metadata about the songs in the app.

## Getting Started
From the terminal, run the following script to start the entire workflow, which:
* creates the tables
* runs the ETL process
```
python main.py
```
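For orientation, the sketch below shows one way `main.py` could chain those two steps against Redshift. The helper names, query lists, and `dwh.cfg` section/key names are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical sketch of main.py: create the tables, then run the ETL.
# create_tables, run_etl, and the query lists are assumed names, not the repo's code.
import configparser

import psycopg2

CREATE_TABLE_QUERIES = []  # DDL for staging and star-schema tables
ETL_QUERIES = []           # COPY and INSERT ... SELECT statements


def create_tables(cur, conn):
    """Create the staging and star-schema tables on Redshift."""
    for query in CREATE_TABLE_QUERIES:
        cur.execute(query)
        conn.commit()


def run_etl(cur, conn):
    """Stage the S3 JSON data, then populate the star schema."""
    for query in ETL_QUERIES:
        cur.execute(query)
        conn.commit()


def main():
    # Connection parameters are read from dwh.cfg (section/key names assumed).
    config = configparser.ConfigParser()
    config.read("dwh.cfg")

    conn = psycopg2.connect(
        host=config.get("CLUSTER", "HOST"),
        dbname=config.get("CLUSTER", "DB_NAME"),
        user=config.get("CLUSTER", "DB_USER"),
        password=config.get("CLUSTER", "DB_PASSWORD"),
        port=config.get("CLUSTER", "DB_PORT"),
    )
    cur = conn.cursor()

    create_tables(cur, conn)
    run_etl(cur, conn)

    conn.close()


if __name__ == "__main__":
    main()
```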
## Project Structure
* `dwh.cfg` contains config data (see the sketch below)
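The configuration file itself is not reproduced in the README; the snippet below shows how such a file is typically read with `configparser`. The `[IAM_ROLE]` and `[S3]` sections and their keys are assumptions, not the actual contents of the repository's `dwh.cfg`.

```python
# Hypothetical reader for dwh.cfg; all section and key names are assumptions.
import configparser

config = configparser.ConfigParser()
config.read("dwh.cfg")

# IAM role that Redshift assumes when COPYing from S3
iam_role_arn = config.get("IAM_ROLE", "ARN")

# S3 locations of the raw JSON logs and song metadata
log_data_path = config.get("S3", "LOG_DATA")
song_data_path = config.get("S3", "SONG_DATA")
```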
## Source Data
* song_data
```json
{"num_songs": 1, "artist_id": "ARJIE2Y1187B994AB7", "artist_latitude": null, "artist_longitude": null, "artist_location": "", "artist_name": "Line Renaud", "song_id": "SOUPIRU12A6D4FA1E1", "title": "Der Kleine Dompfaff", "duration": 152.92036, "year": 0}```
* log_data
## Star Schema Data Model
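The repository leaves this section empty. As a rough sketch, a star schema over this data would typically center on a songplays fact table (one row per play event from the logs) surrounded by song, artist, user, and time dimensions. The DDL below, written as Python query strings, is an assumed layout inferred from the source-data fields above, not the project's actual model.

```python
# Hypothetical star-schema DDL; table and column names are assumptions.
# Only the fact table and one dimension are shown; artist, user, and time
# dimensions would follow the same pattern.

CREATE_SONGPLAYS = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id BIGINT IDENTITY(0, 1) PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT NOT NULL,
    level       VARCHAR,
    song_id     VARCHAR,
    artist_id   VARCHAR,
    session_id  INT,
    location    VARCHAR,
    user_agent  VARCHAR
);
"""

CREATE_SONGS = """
CREATE TABLE IF NOT EXISTS songs (
    song_id   VARCHAR PRIMARY KEY,
    title     VARCHAR,
    artist_id VARCHAR,
    year      INT,
    duration  FLOAT
);
"""
```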
## Workflow
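This section is also empty in the README. A common workflow for this kind of project is to bulk-load the raw JSON from S3 into staging tables with Redshift `COPY`, then populate the star schema with `INSERT ... SELECT`. The bucket path, IAM role handling, and table names in the sketch below are assumptions.

```python
# Hypothetical ETL workflow: stage raw JSON from S3, then load the star schema.
# The S3 path, IAM role placeholder, and table/column names are assumptions.

# Step 1: Redshift COPY bulk-loads the JSON song files from S3 into a staging table.
COPY_STAGING_SONGS = """
COPY staging_songs
FROM 's3://udacity-dend/song_data'
IAM_ROLE '{iam_role}'
FORMAT AS JSON 'auto'
REGION 'us-west-2';
"""

# Step 2: INSERT ... SELECT moves de-duplicated rows from staging into the star schema.
INSERT_SONGS = """
INSERT INTO songs (song_id, title, artist_id, year, duration)
SELECT DISTINCT song_id, title, artist_id, year, duration
FROM staging_songs
WHERE song_id IS NOT NULL;
"""


def load_star_schema(cur, iam_role_arn):
    """Run the staging COPY and the star-schema INSERT on an open Redshift cursor."""
    cur.execute(COPY_STAGING_SONGS.format(iam_role=iam_role_arn))
    cur.execute(INSERT_SONGS)
```

Staging first keeps the raw S3 data queryable inside Redshift, so the star-schema inserts can de-duplicate and validate rows before they reach the dimensional model.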
