https://github.com/mikeacosta/data-model-cassandra
Data modeling and ETL pipeline using Apache Cassandra
cassandra data-model etl jupyter-notebook python
- Host: GitHub
- URL: https://github.com/mikeacosta/data-model-cassandra
- Owner: mikeacosta
- Created: 2019-12-04T01:10:44.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2019-12-04T19:27:26.000Z (about 5 years ago)
- Last Synced: 2025-01-03T01:28:43.077Z (about 1 month ago)
- Topics: cassandra, data-model, etl, jupyter-notebook, python
- Language: Jupyter Notebook
- Homepage:
- Size: 753 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Modeling with Cassandra
## Description
The analytics team for music streaming startup Sparkify wants to analyze the song-listening activity of their users. This analysis will be based on CSV files of user activity from their mobile app.
The objective of this project is to create an Apache Cassandra database, then model and populate tables that support the queries specified by the analytics team.
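In Cassandra, tables are typically modeled around the queries they must answer, often one table per query, with the columns used for filtering placed in the primary key. The sketch below generates a CQL `CREATE TABLE` statement for one such table. The helper `build_create_table`, the table `song_by_session`, and its columns are hypothetical examples, not the schema used in the project's notebook.

```python
# A minimal sketch of query-first table design for Cassandra.
# Table name, columns, and key choices are HYPOTHETICAL, not from the project.

def build_create_table(name, columns, partition_key, clustering=()):
    """Return a CQL CREATE TABLE statement.

    columns: ordered mapping of column name -> CQL type
    partition_key: columns that decide which node stores a row
    clustering: columns that order rows within a partition
    """
    col_defs = ", ".join(f"{col} {cql_type}" for col, cql_type in columns.items())
    pk = "(" + ", ".join(partition_key) + ")"
    if clustering:
        pk += ", " + ", ".join(clustering)
    return f"CREATE TABLE IF NOT EXISTS {name} ({col_defs}, PRIMARY KEY ({pk}))"


# Hypothetical query: artist, song and length for a given session_id and
# item_in_session. Cassandra expects a WHERE clause to restrict the partition
# key (and optionally clustering columns, in order) unless ALLOW FILTERING or
# an index is used, so both filter columns go into the primary key.
cql = build_create_table(
    "song_by_session",
    {
        "session_id": "int",
        "item_in_session": "int",
        "artist": "text",
        "song": "text",
        "length": "float",
    },
    partition_key=["session_id"],
    clustering=["item_in_session"],
)
print(cql)
```

Here `session_id` is the partition key, which determines where rows are stored, and `item_in_session` is a clustering column, which orders rows within a partition. The generated string could then be run with the DataStax Python driver's `session.execute()`.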
## Getting Started
### Dependencies
- Jupyter Notebooks
- Python
- Apache Cassandra
- SQL
### Datasets
Source data consists of an `event_data` directory of CSV files partitioned by date. Each file contains records of song-listening activity by Sparkify mobile app users.
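Because the events are spread across many date-partitioned files, a first ETL step is usually to read them into one combined set of rows. This is a minimal sketch using only the Python standard library. It assumes each file has a header row and that rows with an empty `artist` field are non-song events to be dropped; the function name `load_event_rows` and that filter are assumptions, not taken from the notebook.

```python
import csv
from pathlib import Path


def load_event_rows(event_dir):
    """Read every CSV file under event_dir into one list of dict rows.

    Assumptions (not confirmed by the project): each file has a header row,
    and rows with an empty 'artist' field are non-song events to skip.
    """
    rows = []
    # sorted() makes the order deterministic across the date-partitioned files
    for path in sorted(Path(event_dir).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                if row.get("artist"):  # hypothetical filter: keep song plays only
                    rows.append(row)
    return rows
```

For example, `rows = load_event_rows("event_data")` would return the filtered rows from every CSV file in the directory tree, ordered by file path.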
### Project Template
The Jupyter Notebook file `Project_1B_Project_Template.ipynb` includes detailed steps and code for modeling the Apache Cassandra database tables and building the ETL pipeline.
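The load step of the pipeline writes each row into a query table. The sketch below assumes the hypothetical `song_by_session` table and hypothetical CSV column names `sessionId`, `itemInSession`, `artist`, `song` and `length`. `session` is assumed to be a DataStax `cassandra-driver` `Session` already connected to the project's keyspace, such as one returned by `Cluster(['127.0.0.1']).connect()`.

```python
# A minimal sketch of the ETL load step into Cassandra.
# Table and column names are HYPOTHETICAL; `session` is assumed to be a
# cassandra-driver Session (anything with an execute(query, params) method).

INSERT_SONG_BY_SESSION = (
    "INSERT INTO song_by_session "
    "(session_id, item_in_session, artist, song, length) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def to_params(row):
    """Convert one CSV row (all strings) into typed values for the INSERT."""
    return (
        int(row["sessionId"]),
        int(row["itemInSession"]),
        row["artist"],
        row["song"],
        float(row["length"]),
    )


def load_rows(session, rows):
    """Insert each row and return the number of rows written."""
    count = 0
    for row in rows:
        session.execute(INSERT_SONG_BY_SESSION, to_params(row))
        count += 1
    return count
```

Since every CSV value is read as a string, `to_params` casts the numeric fields before binding them. For large files, a prepared statement (`session.prepare()` with `?` placeholders) would avoid re-parsing the query for each row.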