Data Modeling with Apache Cassandra
https://github.com/abdallahqoutbali/data-modeling-with-apache-cassandra

apache-cassandra aws data-engineering python


# Data-Engineer-Nanodegree-Projects-Udacity
Projects completed as part of the [Data Engineer Nanodegree by Udacity.com](https://www.udacity.com/course/data-engineer-nanodegree--nd027)

## Course 1: Data Modeling

### Introduction to Data Modeling
- Understand the purpose of data modeling
- Identify the strengths and weaknesses of different types of databases and data storage techniques
- Create a table in Postgres and Apache Cassandra

### Relational Data Models
- Understand when to use a relational database
- Understand the difference between OLAP and OLTP databases
- Create normalized data tables
- Implement denormalized schemas (e.g. STAR, Snowflake)

### NoSQL Data Models
- Understand when to use NoSQL databases and how they differ from relational databases
- Select the appropriate primary key and clustering columns for a given use case
- Create a NoSQL database in Apache Cassandra

# Project
#### The image below is a screenshot of what the denormalized data should look like in `event_datafile_new.csv`

![image](https://user-images.githubusercontent.com/47276503/218794760-3c216787-ee1d-4277-97a0-bb713591ad43.png)


The business asked for the following query: give me the artist, song title, and song's length in the music app history that was heard during sessionId = 338, and itemInSession =


First, create the table `song_info`, choosing only the columns the query asks for:



![image](https://user-images.githubusercontent.com/47276503/219933894-0d5e8a93-083e-4320-93c1-ab1b5f458944.png)



Note: for the query above, the `session_id` and `item_session` columns make up the primary key. They were chosen for the partition key because the query filters on them, and clustering columns help make up a unique key for each row.
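
The table definition itself only appears in the screenshot, so the following is a minimal sketch of what such a statement could look like, not the project's exact code. The names `session_id` and `item_session` come from the note above; the other column names, the column types, and the choice of `session_id` as partition key with `item_session` as clustering column are assumptions. `session` is assumed to be a cassandra-driver `Session` that already has a keyspace set.

```python
# Sketch only: artist, song_title and song_length are hypothetical column names;
# session_id and item_session are taken from the note above.
CREATE_SONG_INFO = """
CREATE TABLE IF NOT EXISTS song_info (
    session_id   int,
    item_session int,
    artist       text,
    song_title   text,
    song_length  float,
    PRIMARY KEY (session_id, item_session)
)
"""


def create_song_info(session):
    """Run the CREATE TABLE statement on an already connected session."""
    session.execute(CREATE_SONG_INFO)
```

With this key, `session_id` decides which partition a row lives in and `item_session` orders rows inside that partition, so a query that fixes both columns reads a single row.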

The query below loads the data into the `song_info` table:

![image](https://user-images.githubusercontent.com/47276503/218798640-3c3de9c2-7cd2-4390-8cf4-676fdb1160b2.png)
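
As a rough illustration of the loading step, the sketch below reads CSV rows and issues one parameterized `INSERT` per row. It assumes the CSV has a header row containing `sessionId`, `itemInSession`, `artist`, `song` and `length`; only the first two names appear in the text above, the rest are assumptions, as are the target column names. `session` is again assumed to be a cassandra-driver `Session`.

```python
import csv

# Hypothetical target columns; %s placeholders are filled in by the driver.
INSERT_SONG_INFO = (
    "INSERT INTO song_info "
    "(session_id, item_session, artist, song_title, song_length) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def row_to_params(row):
    """Convert one CSV row (a dict keyed by header name) into typed parameters."""
    return (
        int(row["sessionId"]),
        int(row["itemInSession"]),
        row["artist"],
        row["song"],      # assumed header name
        float(row["length"]),  # assumed header name
    )


def load_song_info(session, lines):
    """Insert every CSV row from `lines` (an open file or list of text lines)."""
    count = 0
    for row in csv.DictReader(lines):
        session.execute(INSERT_SONG_INFO, row_to_params(row))
        count += 1
    return count
```

A typical call would open the file with `open("event_datafile_new.csv", encoding="utf8", newline="")` and pass the file object as `lines`. Converting the values explicitly matters because the CSV reader returns every field as a string.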

Run a `SELECT` to verify that the data has been inserted into the table:

![image](https://user-images.githubusercontent.com/47276503/218798817-a8c3f13a-5ffb-4934-9d36-d0186748ba5a.png)

![image](https://user-images.githubusercontent.com/47276503/224638536-06653681-04e9-4e99-842c-5a3dedb54ab8.png)
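
The verification query can be sketched along the same lines. The column names other than `session_id` and `item_session` are assumptions, and the function takes both filter values as arguments rather than hard-coding them. It relies on the driver's default behaviour of returning rows whose columns are accessible as attributes.

```python
# Hypothetical column names; both key columns are filtered with equality.
SELECT_SONG_INFO = (
    "SELECT artist, song_title, song_length FROM song_info "
    "WHERE session_id = %s AND item_session = %s"
)


def fetch_song_info(session, session_id, item_session):
    """Return (artist, song_title, song_length) tuples for the given keys."""
    rows = session.execute(SELECT_SONG_INFO, (session_id, item_session))
    return [(row.artist, row.song_title, row.song_length) for row in rows]
```

For the business question above, `session_id` would be 338 and `item_session` the requested itemInSession value. Because both columns of the key are fixed, the query does not need `ALLOW FILTERING`.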

Thanks!

Email : [email protected]

Data Engineer