https://github.com/mikeacosta/data-model-cassandra

Data modeling and ETL pipeline using Apache Cassandra

Host: GitHub
URL: https://github.com/mikeacosta/data-model-cassandra
Owner: mikeacosta
Created: 2019-12-04T01:10:44.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2019-12-04T19:27:26.000Z (over 5 years ago)
Last Synced: 2025-01-10T19:43:20.298Z (6 months ago)
Topics: cassandra, data-model, etl, jupyter-notebook, python
Language: Jupyter Notebook
Homepage:
Size: 753 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Data Modeling with Cassandra

## Description

The analytics team at the music streaming startup Sparkify wants to analyze the song-listening activity of its users. The analysis is based on CSV files of user activity from the Sparkify mobile app.

The objective of this project is to create an Apache Cassandra database, then model and populate tables that support the queries specified by the analytics team.
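
Cassandra tables are usually designed around the queries they must answer, so each query from the analytics team would typically get its own table. The following is a minimal sketch of that idea, not the project's actual schema: the query ("which songs were played in a given session?"), the table name `songs_by_session`, the column names, and the helper `create_table_cql` are all hypothetical.

```python
def create_table_cql(table, columns, partition_key, clustering_keys=()):
    """Build a CQL CREATE TABLE statement for one query-specific table.

    columns: dict mapping column name -> CQL type
    partition_key: column names that decide which node stores a row
    clustering_keys: column names that order rows inside a partition
    """
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    partition = ", ".join(partition_key)
    # The partition key is wrapped in its own parentheses, followed by any
    # clustering columns.
    key_parts = [f"({partition})"] + list(clustering_keys)
    primary_key = ", ".join(key_parts)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"({column_defs}, PRIMARY KEY ({primary_key}))"
    )


# Hypothetical table for: "which songs were played in a given session?"
SONGS_BY_SESSION = create_table_cql(
    table="songs_by_session",
    columns={
        "session_id": "int",
        "item_in_session": "int",
        "artist": "text",
        "song": "text",
    },
    partition_key=["session_id"],
    clustering_keys=["item_in_session"],
)
print(SONGS_BY_SESSION)
```

Choosing `session_id` as the partition key keeps all rows for one session together, and the clustering column orders them within the session. A different query would likely need a different key and therefore a separate table.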

## Getting Started

### Dependencies
- Jupyter Notebooks
- Python
- Apache Cassandra
- CQL (Cassandra Query Language)

### Datasets

Source data consists of an `event_data` directory of CSV files partitioned by date. Each file includes records of song-listening activity by Sparkify mobile app users.
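
As a rough illustration of reading this dataset, the sketch below walks the `event_data` directory and yields every record as a dictionary. It assumes the files sit directly inside the directory, end in `.csv`, are UTF-8 encoded, and start with a header row; the function name `read_event_rows` is hypothetical and not taken from the notebook.

```python
import csv
from pathlib import Path


def read_event_rows(event_dir):
    """Yield each CSV record under event_dir as a dict keyed by header name."""
    # Sorting the paths gives a stable, name-ordered pass over the files.
    for csv_path in sorted(Path(event_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield row


if __name__ == "__main__":
    rows = list(read_event_rows("event_data"))
    print(f"Read {len(rows)} event records")
```

Using a generator avoids holding every file in memory at once, which may matter if the number of daily files grows.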

### Project Template

The Jupyter Notebook file `Project_1B_Project_Template.ipynb` includes detailed steps and code for modeling the Apache Cassandra database tables and building the ETL pipeline.
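
The load step of such a pipeline could look roughly like the sketch below. It is not the notebook's code: the table `songs_by_session`, the column names, and the helpers `to_params`, `load_rows`, and `connect` are hypothetical. The `connect` helper assumes the DataStax `cassandra-driver` package is installed and a Cassandra node is reachable at `127.0.0.1`.

```python
# Hypothetical target table and columns, chosen for illustration only.
INSERT_CQL = (
    "INSERT INTO songs_by_session (session_id, item_in_session, artist, song) "
    "VALUES (%s, %s, %s, %s)"
)


def to_params(row):
    """Convert one CSV record (all strings) into typed INSERT parameters."""
    return (
        int(row["session_id"]),
        int(row["item_in_session"]),
        row["artist"],
        row["song"],
    )


def load_rows(session, rows):
    """Insert each row through session.execute and return the number inserted."""
    count = 0
    for row in rows:
        session.execute(INSERT_CQL, to_params(row))
        count += 1
    return count


def connect(keyspace, hosts=("127.0.0.1",)):
    """Open a driver session; the import is deferred so the rest runs without it."""
    from cassandra.cluster import Cluster

    cluster = Cluster(list(hosts))
    return cluster.connect(keyspace)
```

With a running node, usage would be along the lines of `load_rows(connect("sparkify"), rows)`, where the keyspace name `sparkify` is also an assumption. Because `load_rows` only needs an object with an `execute` method, it can be exercised with a stand-in session before pointing it at a real cluster.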
