https://github.com/mikeacosta/data-model-cassandra

Data modeling and ETL pipeline using Apache Cassandra

cassandra data-model etl jupyter-notebook python


# Data Modeling with Cassandra

## Description

The analytics team for music streaming startup Sparkify wants to analyze the song-listening activity of their users. The analysis is based on CSV files of user activity from their mobile app.

The objective of this project is to create an Apache Cassandra database, then model and populate tables that support the queries specified by the analytics team.
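
Cassandra tables are usually designed around the queries they serve, with the primary key chosen to match how each query filters and orders its results. As a rough sketch of that idea, the helper below generates a `CREATE TABLE` statement from a partition key and clustering columns. The helper name, table name, and column names are hypothetical and chosen only for illustration; the project's actual schema is defined in the notebook.

```python
def build_create_table(table, columns, partition_key, clustering=()):
    """Return a CQL CREATE TABLE statement for a query-specific table.

    columns: dict mapping column name to CQL type.
    partition_key: columns the query filters on with equality.
    clustering: columns that order rows within a partition.
    """
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    key = "(" + ", ".join(partition_key) + ")"
    if clustering:
        key += ", " + ", ".join(clustering)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}, PRIMARY KEY ({key}))"


# Hypothetical table for a query that looks up songs by session.
stmt = build_create_table(
    "songs_by_session",
    {"session_id": "int", "item_in_session": "int", "song": "text"},
    partition_key=["session_id"],
    clustering=["item_in_session"],
)
print(stmt)
```

The resulting string could then be passed to a Cassandra session for execution. Building one table per query in this way trades extra storage for reads that touch a single partition.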

## Getting Started

### Dependencies
- Jupyter Notebooks
- Python
- Apache Cassandra
- CQL (Cassandra Query Language, which is SQL-like)

### Datasets

Source data consists of an `event_data` directory of CSV files partitioned by date. Each file includes records of song-listening activity by Sparkify mobile app users.
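
As a minimal sketch of how these files might be read, assuming the `event_data` directory sits in the working directory and every file starts with a header row, the helper below gathers all CSV files and combines their records into one list. The function name `load_event_rows` is hypothetical and does not come from the project.

```python
import csv
from pathlib import Path


def load_event_rows(data_dir="event_data"):
    """Read every CSV file under data_dir and return all rows as dicts."""
    rows = []
    # Sort the paths so files are processed in a stable order.
    for path in sorted(Path(data_dir).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            # DictReader uses the first line of each file as column names.
            rows.extend(csv.DictReader(f))
    return rows
```

Calling `load_event_rows()` returns one dictionary per record, keyed by the column names in each file's header, which is a convenient starting shape for an ETL step.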

### Project Template

The Jupyter Notebook file `Project_1B_Project_Template.ipynb` includes detailed steps and code for modeling the Apache Cassandra database tables and building the ETL pipeline.