{"id":15056776,"url":"https://github.com/abdelrahman13-coder/data-modeling-with-apache-cassandra","last_synced_at":"2026-01-02T01:56:38.749Z","repository":{"id":133682271,"uuid":"527523813","full_name":"Abdelrahman13-coder/Data-Modeling-with-Apache-Cassandra","owner":"Abdelrahman13-coder","description":"Modeling event data to create a non-relational database and ETL pipeline for a music streaming app.","archived":false,"fork":false,"pushed_at":"2023-04-24T11:19:45.000Z","size":1588,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-14T07:45:49.811Z","etag":null,"topics":["apache-cassandra","data-engineering","etl-pipeline","nosql-database"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Abdelrahman13-coder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-22T10:55:21.000Z","updated_at":"2022-08-27T14:10:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"178950d1-f914-4d00-91d6-c809940b3206","html_url":"https://github.com/Abdelrahman13-coder/Data-Modeling-with-Apache-Cassandra","commit_stats":{"total_commits":7,"total_committers":2,"mean_commits":3.5,"dds":0.4285714285714286,"last_synced_commit":"d2c6b95951e46491102aac5a6caa4ef19b60d2a8"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdelrahman13-coder%2FData-Modeling-with-Apache-Cassandra","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/Abdelrahman13-coder%2FData-Modeling-with-Apache-Cassandra/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdelrahman13-coder%2FData-Modeling-with-Apache-Cassandra/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdelrahman13-coder%2FData-Modeling-with-Apache-Cassandra/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Abdelrahman13-coder","download_url":"https://codeload.github.com/Abdelrahman13-coder/Data-Modeling-with-Apache-Cassandra/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243544665,"owners_count":20308168,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-cassandra","data-engineering","etl-pipeline","nosql-database"],"created_at":"2024-09-24T21:56:23.268Z","updated_at":"2026-01-02T01:56:38.685Z","avatar_url":"https://github.com/Abdelrahman13-coder.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data-Modeling-with-Apache-Cassandra \u003cimg align=\"left\" alt=\"codeSTACKr | songs\" width=\"70px\" src=\"https://user-images.githubusercontent.com/58150666/185989388-c6b37f59-3d5c-4c21-b571-f70cbdf7e1f0.png\"/\u003e\r\n\r\n### **Project Overview**\r\n\u003e A startup called Sparkify wants to analyze the data they've been collecting on songs and user activity on their new music streaming app. The analysis team is particularly interested in understanding what songs users are listening to. 
Currently, there is no easy way to query the data to generate the results, since the data reside in a directory of CSV files on user activity on the app.\r\n\r\n\u003e Create an Apache Cassandra database that supports queries on song play data to answer the questions brought to the project.\r\n\r\n\u003e Build an ETL pipeline.\r\n\r\n### **Project structure**\r\n1. `event_data` folder at the root of the project, where all the needed data reside.\r\n2. `Project_1B_ Project_Template.ipynb` the code itself.\r\n3. `event_datafile_new.csv` a smaller event data CSV file that is used to insert data into the Apache Cassandra tables.\r\n4. `images` a screenshot of how the denormalized data should appear in `event_datafile_new.csv`.\r\n5. `README.md` the current file, which discusses the project.\r\n\r\n\r\n### **Datasets**\r\nThe workspace includes one dataset: `event_data`.\r\nIt is a directory of CSV files partitioned by date. Here are example filepaths for two files in the dataset:\r\n\r\n`event_data/2018-11-08-events.csv`\r\n\r\n`event_data/2018-11-09-events.csv`\r\n\r\n### **Resources**\r\nYou can download the resources from the workspace terminal using the following commands:\r\n\r\n```bash\r\nzip -r event_data.zip event_data\r\n\r\nzip -r images.zip images\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdelrahman13-coder%2Fdata-modeling-with-apache-cassandra","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabdelrahman13-coder%2Fdata-modeling-with-apache-cassandra","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdelrahman13-coder%2Fdata-modeling-with-apache-cassandra/lists"}