https://github.com/abdallahqoutbali/data-modeling-with-apache-cassandra
Data Modeling with Apache Cassandra
- Host: GitHub
- URL: https://github.com/abdallahqoutbali/data-modeling-with-apache-cassandra
- Owner: AbdallahQoutbAli
- Created: 2023-02-14T15:52:58.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-13T07:48:41.000Z (over 2 years ago)
- Last Synced: 2025-03-14T07:45:53.634Z (4 months ago)
- Topics: apache-cassandra, aws, data-engineering, python
- Language: Jupyter Notebook
- Size: 851 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data-Engineer-Nanodegree-Projects-Udacity
Projects done in the [Data Engineer Nanodegree by Udacity.com](https://www.udacity.com/course/data-engineer-nanodegree--nd027)

## Course 1: Data Modeling
### Introduction to Data Modeling
- Understand the purpose of data modeling
- Identify the strengths and weaknesses of different types of databases and data storage techniques
- Create a table in Postgres and Apache Cassandra

### Relational Data Models
- Understand when to use a relational database
- Understand the difference between OLAP and OLTP databases
- Create normalized data tables
- Implement denormalized schemas (e.g. STAR, Snowflake)

### NoSQL Data Models
- Understand when to use NoSQL databases and how they differ from relational databases
- Select the appropriate primary key and clustering columns for a given use case
- Create a NoSQL database in Apache Cassandra

# Project
#### The image below is a screenshot of what the denormalized data should look like in `event_datafile_new.csv`
The business asked for the following query: give me the artist, song title and song's length in the music app history that was heard during sessionId = 338, and itemInSession =

First, create the table `song_info` and choose only the columns the query asks for.
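A minimal sketch of this step in Python. The column names (`session_id`, `item_session`, `artist`, `song_title`, `song_length`), their CQL types, and the layout with `session_id` as partition key and `item_session` as clustering column are illustrative assumptions, not the project's actual code. The helper accepts any object with an `execute` method, such as a `Session` from the DataStax Python driver (`cassandra-driver`).

```python
# Hypothetical CQL for the song_info table: only the columns the business
# query needs, plus the two columns it filters on. Names and types are
# assumptions for illustration.
CREATE_SONG_INFO = """
CREATE TABLE IF NOT EXISTS song_info (
    session_id   int,
    item_session int,
    artist       text,
    song_title   text,
    song_length  float,
    PRIMARY KEY (session_id, item_session)
)
"""


def create_song_info(session):
    """Create song_info through any object exposing execute(query),
    e.g. a cassandra-driver Session connected to a keyspace."""
    session.execute(CREATE_SONG_INFO)
```

With `cassandra-driver` installed and a node running, something like `session = Cluster(["127.0.0.1"]).connect("my_keyspace")` (with `Cluster` imported from `cassandra.cluster`; `my_keyspace` is a placeholder) followed by `create_song_info(session)` would create the table.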

Note:
For the query above, session_id and item_session were used as the primary key (partition key plus clustering column) because the query filters by both of them,
and together they make up a unique key for each row.
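To illustrate the key design in the note, here is a toy in-memory model in plain Python (not Cassandra itself), assuming `session_id` is the partition key and `item_session` the clustering column. Rows are grouped by partition key, ordered by the clustering column inside each partition, and writing the same full primary key twice replaces the row, mirroring Cassandra's upsert behaviour for `INSERT`. The class name `ToyTable` is hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a table with PRIMARY KEY (session_id, item_session).
    Illustration only; it does not reproduce Cassandra's storage engine."""

    def __init__(self):
        # partition key (session_id) -> {clustering key (item_session) -> row}
        self._partitions = defaultdict(dict)

    def insert(self, session_id, item_session, **columns):
        # Same (session_id, item_session) pair => the old row is overwritten.
        self._partitions[session_id][item_session] = columns

    def select(self, session_id, item_session=None):
        # Every lookup starts from the partition key, as Cassandra locates
        # the partition first and then reads within it.
        partition = self._partitions.get(session_id, {})
        if item_session is not None:
            row = partition.get(item_session)
            return [] if row is None else [row]
        # Whole partition, ordered by the clustering column (ascending).
        return [partition[key] for key in sorted(partition)]
```

Because the pair is unique, a lookup on both `session_id` and `item_session` returns at most one row, which is exactly the shape of the business query.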
The query below loads the data into the table `song_info`.
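A hedged sketch of the loading step, assuming `event_datafile_new.csv` has a header row containing `artist`, `song`, `length`, `sessionId` and `itemInSession` (the header names and the target column names are assumptions). It uses a parameterized `INSERT` with `%s` placeholders, the positional style `cassandra-driver` accepts for simple (non-prepared) statements, and converts the CSV strings to the assumed CQL types before sending them.

```python
import csv

# Hypothetical INSERT matching an assumed song_info schema.
INSERT_SONG_INFO = (
    "INSERT INTO song_info (session_id, item_session, artist, song_title, song_length) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def load_song_info(session, csv_file):
    """Insert one row per CSV record and return the number of rows sent.

    csv_file: an open text file (or any iterable of lines) whose header
    includes artist, song, length, sessionId and itemInSession (assumed).
    """
    count = 0
    for record in csv.DictReader(csv_file):
        session.execute(
            INSERT_SONG_INFO,
            (
                int(record["sessionId"]),      # CQL int
                int(record["itemInSession"]),  # CQL int
                record["artist"],              # CQL text
                record["song"],                # CQL text
                float(record["length"]),       # CQL float
            ),
        )
        count += 1
    return count
```

A typical call would open the file with `open("event_datafile_new.csv", encoding="utf8", newline="")` and pass the file object together with a connected session.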

Run a SELECT to verify that the data has been inserted into each table.
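A minimal sketch of the verification query, which also answers the business question, assuming a `song_info` table with columns `session_id`, `item_session`, `artist`, `song_title` and `song_length` (all assumed names). Both filter values are passed as parameters; `cassandra-driver` returns named-tuple rows by default, so the fields can be read as attributes.

```python
# Hypothetical SELECT filtering on the full primary key.
SELECT_SONG_INFO = (
    "SELECT artist, song_title, song_length FROM song_info "
    "WHERE session_id = %s AND item_session = %s"
)


def fetch_song_info(session, session_id, item_session):
    """Return (artist, song_title, song_length) tuples for one session item."""
    rows = session.execute(SELECT_SONG_INFO, (session_id, item_session))
    # Default cassandra-driver rows are named tuples, so attribute access works.
    return [(row.artist, row.song_title, row.song_length) for row in rows]
```

For the business query, the call would be `fetch_song_info(session, 338, item_in_session)`, with `item_in_session` set to the itemInSession value the business specified.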


Thanks!
Data Engineer