https://github.com/abdelrahman13-coder/data-modeling-with-apache-cassandra

Modeling event data to create a non-relational database and ETL pipeline for a music streaming app.
apache-cassandra data-engineering etl-pipeline nosql-database

Last synced: 6 months ago

Host: GitHub
URL: https://github.com/abdelrahman13-coder/data-modeling-with-apache-cassandra
Owner: Abdelrahman13-coder
Created: 2022-08-22T10:55:21.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2023-04-24T11:19:45.000Z (about 3 years ago)
Last Synced: 2025-03-14T07:45:49.811Z (over 1 year ago)
Topics: apache-cassandra, data-engineering, etl-pipeline, nosql-database
Language: Jupyter Notebook
Homepage:
Size: 1.51 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Data-Modeling-with-Apache-Cassandra

### **Project Overview**
> A startup called Sparkify wants to analyze the data they've been collecting on songs and user activity on their new music streaming app. The analysis team is particularly interested in understanding what songs users are listening to. Currently, there is no easy way to query the data to generate the results, since the data reside in a directory of CSV files on user activity on the app.

> Create an Apache Cassandra database that supports queries on song play data to answer the questions posed by the project.

> Build an ETL pipeline.
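
In Cassandra, tables are usually designed around the queries they must serve, so each question gets its own table with a matching primary key. The following is a minimal sketch of that idea, not the notebook's actual schema: the helper `create_table_cql`, the table name `songs_by_session`, and every column name are hypothetical and chosen only for illustration.

```python
def create_table_cql(table, columns, partition_key, clustering_key=()):
    """Build a CREATE TABLE statement for one query-specific table.

    columns        -- list of (name, cql_type) pairs
    partition_key  -- column names that decide which node stores a row
    clustering_key -- column names that order rows inside a partition
    """
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns)
    key = "(" + ", ".join(partition_key) + ")"
    if clustering_key:
        key += ", " + ", ".join(clustering_key)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"({column_defs}, PRIMARY KEY ({key}))"
    )


# Hypothetical example: look up what was played at a given position
# within a listening session. All names below are assumptions.
songs_by_session = create_table_cql(
    table="songs_by_session",
    columns=[
        ("session_id", "int"),
        ("item_in_session", "int"),
        ("artist", "text"),
        ("song", "text"),
    ],
    partition_key=["session_id"],
    clustering_key=["item_in_session"],
)
print(songs_by_session)
```

The generated statement could then be passed to a driver session's `execute` method, for example one obtained from the DataStax Python driver. A query that filters on `session_id` and `item_in_session` would then read from a single partition.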

### **Project structure**
1. `event_data`: a folder at the root of the project that holds all of the source data.
2. `Project_1B_ Project_Template.ipynb`: the notebook containing the project code.
3. `event_datafile_new.csv`: a smaller event data CSV file used to insert data into the Apache Cassandra tables.
4. `images`: a folder with a screenshot of how the denormalized data should look in `event_datafile_new.csv`.
5. `README.md`: this file, which describes the project.

### **Datasets**
The workspace includes one dataset, `event_data`, a directory of CSV files partitioned by date. Here are examples of the file paths to two files in the dataset:

`event_data/2018-11-08-events.csv`

`event_data/2018-11-09-events.csv`
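
The first ETL step is to merge the daily files into a single CSV that can be loaded into Cassandra. Below is a minimal sketch of that step, assuming every daily file starts with the same header row. The function name `combine_event_files` is hypothetical, and unlike the notebook, which produces a smaller file, this sketch copies all columns unchanged.

```python
import csv
import glob
import os


def combine_event_files(input_dir, output_path):
    """Merge every CSV in input_dir into one CSV at output_path.

    The header of the first file is written once; headers of later
    files and blank rows are skipped. Returns the number of data rows
    written.
    """
    # File names start with the date, so sorting gives date order.
    file_paths = sorted(glob.glob(os.path.join(input_dir, "*.csv")))

    rows_written = 0
    header_written = False
    with open(output_path, "w", encoding="utf8", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in file_paths:
            with open(path, "r", encoding="utf8", newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    if not any(cell.strip() for cell in row):
                        continue  # skip rows with no values
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling `combine_event_files("event_data", "event_datafile_new.csv")` from the project root would write the combined file next to the notebook.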

### **Resources**
To download the resources, first compress them from the workspace terminal with the following commands:

```bash
zip -r event_data.zip event_data

zip -r images.zip images
```
