https://github.com/abdallahqoutbali/data-modeling-with-apache-cassandra

Data Modeling with Apache Cassandra

apache-cassandra aws data-engineering python


Host: GitHub
URL: https://github.com/abdallahqoutbali/data-modeling-with-apache-cassandra
Owner: AbdallahQoutbAli
Created: 2023-02-14T15:52:58.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-03-13T07:48:41.000Z (over 2 years ago)
Last Synced: 2025-03-14T07:45:53.634Z (4 months ago)
Topics: apache-cassandra, aws, data-engineering, python
Language: Jupyter Notebook
Homepage:
Size: 851 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Data-Engineer-Nanodegree-Projects-Udacity
Projects done in the [Data Engineer Nanodegree by Udacity.com](https://www.udacity.com/course/data-engineer-nanodegree--nd027)

## Course 1: Data Modeling

### Introduction to Data Modeling
- Understand the purpose of data modeling
- Identify the strengths and weaknesses of different types of databases and data storage techniques
- Create a table in Postgres and Apache Cassandra

### Relational Data Models
- Understand when to use a relational database
- Understand the difference between OLAP and OLTP databases
- Create normalized data tables
- Implement denormalized schemas (e.g. star and snowflake schemas)

### NoSQL Data Models
- Understand when to use NoSQL databases and how they differ from relational databases
- Select the appropriate primary key and clustering columns for a given use case
- Create a NoSQL database in Apache Cassandra

# Project
#### The image below is a screenshot of what the denormalized data in event_datafile_new.csv should look like

![image](https://user-images.githubusercontent.com/47276503/218794760-3c216787-ee1d-4277-97a0-bb713591ad43.png)
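A minimal sketch of reading the denormalized file with Python's standard `csv` module, assuming it has a header row that includes the columns named in the business query (`sessionId`, `itemInSession`, `artist`, `song`, `length`). The sample row below is made up for illustration and is not taken from the real dataset. Reading by column name with `csv.DictReader` avoids depending on column positions and handles quoted fields that contain commas.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_song_rows(handle: TextIO) -> Iterator[Tuple[int, int, str, str, float]]:
    """Yield (session_id, item_in_session, artist, song, length) per CSV record."""
    for record in csv.DictReader(handle):
        yield (
            int(record["sessionId"]),
            int(record["itemInSession"]),
            record["artist"],
            record["song"],
            float(record["length"]),
        )


# Illustrative input only; the header mirrors an assumed layout of event_datafile_new.csv.
sample = io.StringIO(
    "artist,firstName,gender,itemInSession,lastName,length,level,location,sessionId,song,userId\n"
    'Example Artist,Ann,F,2,Doe,200.5,free,"Somewhere, XX",10,Example Song,7\n'
)
rows = list(read_song_rows(sample))
print(rows)
```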

The business asked for the following query: give me the artist, song title and song length in the music app history
that was heard during sessionId = 338 and itemInSession =

First, create the table song_info, choosing only the columns the query asks for.

![image](https://user-images.githubusercontent.com/47276503/219933894-0d5e8a93-083e-4320-93c1-ab1b5f458944.png)
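The exact statement is in the screenshot above; the following is only a sketch of what such a table definition might look like. The column types (`int`, `text`, `float`) and the primary key `(session_id, item_session)` are assumptions, and `session` is any object with an `execute(query)` method, such as a cassandra-driver `Session`.

```python
# Hypothetical definition of song_info: only the columns the query needs.
CREATE_SONG_INFO = """
CREATE TABLE IF NOT EXISTS song_info (
    session_id   int,
    item_session int,
    artist       text,
    song         text,
    length       float,
    PRIMARY KEY (session_id, item_session)
)
"""


def create_song_info(session) -> None:
    """Run the CREATE TABLE statement on an already-connected session."""
    session.execute(CREATE_SONG_INFO)
```

With a local cluster, the session could come from `Cluster(["127.0.0.1"]).connect("your_keyspace")` (imported via `from cassandra.cluster import Cluster`), where the keyspace name is a placeholder. `IF NOT EXISTS` lets the notebook cell be re-run without raising an error.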

Note:
For the query above, session_id and item_session make up the primary key because the query filters on both of them,
and together they uniquely identify each row.
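To illustrate why both columns belong in the key, here is a toy in-memory model, assuming session_id acts as the partition key and item_session as the clustering column. It is a simplification for intuition only, not how Cassandra actually stores or distributes data. It also shows Cassandra's upsert behaviour: writing a row with an existing primary key replaces the old row, so a key on session_id alone would keep only one song per session.

```python
from collections import defaultdict


class ToyTable:
    """Simplified model of a table with PRIMARY KEY (partition_key, clustering_key).

    Mimics only the lookup path and upsert semantics, not real storage.
    """

    def __init__(self):
        # partition key -> {clustering key -> row}
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_key, row):
        # Writing an existing primary key overwrites the previous row.
        self._partitions[partition_key][clustering_key] = row

    def select(self, partition_key, clustering_key=None):
        part = self._partitions.get(partition_key, {})
        if clustering_key is None:
            # Whole partition, returned in clustering-key order.
            return [part[ck] for ck in sorted(part)]
        return [part[clustering_key]] if clustering_key in part else []


# Made-up (session_id, item_in_session, song) events.
events = [(10, 0, "Song A"), (10, 1, "Song B"), (10, 2, "Song C"), (11, 0, "Song D")]

table = ToyTable()
for session_id, item, song in events:
    table.upsert(session_id, item, {"song": song})

# Keyed on session_id alone, later items in a session overwrite earlier ones.
session_only = {session_id: song for session_id, item, song in events}
```

Here `table.select(10, 1)` finds exactly one row, while `session_only` keeps just two of the four events.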

The query below loads the data into the song_info table.

![image](https://user-images.githubusercontent.com/47276503/218798640-3c3de9c2-7cd2-4390-8cf4-676fdb1160b2.png)
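A sketch of the loading step, assuming song_info has the columns session_id, item_session, artist, song and length (hypothetical names based on the note above) and that the CSV uses the column names from the business query. cassandra-driver's `Session.execute(query, parameters)` accepts `%s` placeholders with a tuple of values for simple statements; the `RecordingSession` class is a hypothetical stand-in so the example runs without a cluster.

```python
import csv
import io
from typing import TextIO

# Hypothetical column names; adjust them to the real table definition.
INSERT_SONG_INFO = (
    "INSERT INTO song_info (session_id, item_session, artist, song, length) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def load_song_info(session, handle: TextIO) -> int:
    """Insert one row per CSV record and return the number of rows sent."""
    count = 0
    for record in csv.DictReader(handle):
        # Cast each field to its assumed CQL type: int, int, text, text, float.
        params = (
            int(record["sessionId"]),
            int(record["itemInSession"]),
            record["artist"],
            record["song"],
            float(record["length"]),
        )
        session.execute(INSERT_SONG_INFO, params)
        count += 1
    return count


class RecordingSession:
    """Stand-in for a cassandra-driver Session that only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters=None):
        self.calls.append((query, parameters))


# Made-up CSV content for the demo.
csv_text = (
    "artist,itemInSession,length,sessionId,song\n"
    "Example Artist,0,180.0,10,First Song\n"
    "Example Artist,1,240.25,10,Second Song\n"
)
fake = RecordingSession()
loaded = load_song_info(fake, io.StringIO(csv_text))
```

For large files, preparing the statement once with `session.prepare(...)` (which uses `?` placeholders) and executing the prepared statement per row is a common alternative.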

Run a SELECT to verify that the data has been inserted into each table.

![image](https://user-images.githubusercontent.com/47276503/218798817-a8c3f13a-5ffb-4934-9d36-d0186748ba5a.png)

![image](https://user-images.githubusercontent.com/47276503/224638536-06653681-04e9-4e99-842c-5a3dedb54ab8.png)
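A sketch of the verification query for the business question, using the same hypothetical column names. Because the WHERE clause restricts both primary-key columns, no `ALLOW FILTERING` is needed. cassandra-driver's default row factory returns named tuples, so columns can be read as attributes; `CannedSession` is a hypothetical stand-in returning made-up rows.

```python
from collections import namedtuple

SELECT_SONG = (
    "SELECT artist, song, length FROM song_info "
    "WHERE session_id = %s AND item_session = %s"
)


def fetch_song(session, session_id: int, item_in_session: int) -> list:
    """Return (artist, song, length) tuples for one session item."""
    rows = session.execute(SELECT_SONG, (session_id, item_in_session))
    return [(row.artist, row.song, row.length) for row in rows]


Row = namedtuple("Row", ["artist", "song", "length"])


class CannedSession:
    """Stand-in for a Session: returns fixed rows and remembers the last call."""

    def __init__(self, rows):
        self.rows = rows
        self.last_call = None

    def execute(self, query, parameters=None):
        self.last_call = (query, parameters)
        return self.rows


fake = CannedSession([Row("Example Artist", "Example Song", 200.5)])
result = fetch_song(fake, 10, 2)
```

Against a real session, the call would be `fetch_song(session, 338, item)`, where `item` is the itemInSession value from the business question.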

Thanks!

Email : [email protected]

Data Engineer
