https://github.com/dimits-ts/large-scale-data

Distributed computing for data science tasks, executed on a Ubuntu server.

Host: GitHub
URL: https://github.com/dimits-ts/large-scale-data
Owner: dimits-ts
Created: 2024-02-11T17:07:08.000Z
Default Branch: master
Last Pushed: 2024-03-26T16:51:11.000Z
Last Synced: 2025-01-21T02:25:49.966Z
Topics: cassandra, kafka, map-reduce, spark, vagrant
Language: Java
Homepage:
Size: 27.9 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Large Scale Data Management

This repository consists of two projects: 

* A Java [Hadoop MapReduce application](https://github.com/dimits-exe/large-scale-data/tree/master/map-reduce) which:

  - Computes the occurrences of each word in a large file

  - Computes Spotify song statistics for each country and month

  

* A [Hadoop Spark-Cassandra application](https://github.com/dimits-exe/large-scale-data/tree/master/project-2) which:

  - Generates a configurable stream of test data and publishes it to a Kafka cluster

  - Reads and preprocesses the stream data with Spark and combines it with static data

  - Periodically writes the results to a Cassandra cluster

  - Runs CQL queries against the Cassandra cluster
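
For readers new to the map-reduce model, the word-count job can be pictured as three phases: map, shuffle and reduce. The sketch below is a dependency-free simulation of those phases in plain Java. It is not the repository's code and does not use the Hadoop `Mapper`/`Reducer` API; the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical, and the tokenization (lower-casing, splitting on non-word characters) is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, dependency-free simulation of a MapReduce word count.
// It does NOT use the Hadoop API; it only mirrors the map -> shuffle -> reduce phases.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    // Assumption: case-insensitive matching, tokens split on non-word characters.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group every emitted value by its key (sorted by key via TreeMap).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the grouped counts for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) {
                sum += v;
            }
            counts.put(g.getKey(), sum);
        }
        return counts;
    }

    // Runs all three phases over the given lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "The lazy dog", "the end");
        System.out.println(wordCount(lines));
    }
}
```

Running `main` prints `{brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}`. The per-country, per-month Spotify statistics follow the same pattern: the mapper would emit a composite key (for example `country|month`, an illustrative format, not necessarily the repository's) instead of a single word, and the reducer would aggregate the values grouped under each key.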

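The data flow of the second project (generate a stream, combine it with static data, periodically write the results out) can also be sketched without Kafka, Spark or Cassandra. In this hedged, dependency-free illustration, an in-memory list stands in for the Kafka topic, a `Map` stands in for the static dataset, and a list of batches stands in for the Cassandra writes. The record types, field names and method names (`Event`, `EnrichedRow`, `generate`, `enrich`, `process`) are hypothetical. The sketch flushes every `batchSize` rows rather than on a time interval, purely to keep it deterministic, and it drops events whose key is missing from the static data (inner-join semantics, also an assumption).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, dependency-free sketch of a generate -> join -> batch-write pipeline.
// Nothing here uses the Kafka, Spark or Cassandra APIs.
public class StreamEnrichSketch {

    // Stand-in for a record posted to Kafka by the test-data generator.
    public record Event(String key, int value) {}

    // Stand-in for a row written to Cassandra after enrichment.
    public record EnrichedRow(String key, int value, String attribute) {}

    // Generator: `count` random events over a fixed key set, seeded for reproducibility.
    public static List<Event> generate(int count, List<String> keys, long seed) {
        Random rnd = new Random(seed);
        List<Event> events = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            events.add(new Event(keys.get(rnd.nextInt(keys.size())), rnd.nextInt(100)));
        }
        return events;
    }

    // Join step: look the event's key up in the static table; null means "no match, drop it".
    public static EnrichedRow enrich(Event e, Map<String, String> staticTable) {
        String attr = staticTable.get(e.key());
        return attr == null ? null : new EnrichedRow(e.key(), e.value(), attr);
    }

    // Buffers enriched rows and "writes" them as one batch every `batchSize` rows.
    // Each inner list in the returned sink represents one write to the store.
    public static List<List<EnrichedRow>> process(List<Event> stream,
                                                  Map<String, String> staticTable,
                                                  int batchSize) {
        List<List<EnrichedRow>> sink = new ArrayList<>();
        List<EnrichedRow> buffer = new ArrayList<>();
        for (Event e : stream) {
            EnrichedRow row = enrich(e, staticTable);
            if (row == null) {
                continue; // key absent from static data: dropped
            }
            buffer.add(row);
            if (buffer.size() == batchSize) {
                sink.add(List.copyOf(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            sink.add(List.copyOf(buffer)); // flush the final partial batch
        }
        return sink;
    }

    public static void main(String[] args) {
        Map<String, String> staticTable = Map.of("a", "alpha", "b", "beta");
        List<Event> stream = generate(10, List.of("a", "b", "c"), 42L);
        for (List<EnrichedRow> batch : process(stream, staticTable, 3)) {
            System.out.println("flush " + batch.size() + " rows: " + batch);
        }
    }
}
```

In the actual stack, these roles would be filled by a Kafka producer, a Spark job (for example a stream-static join in Structured Streaming) and a Cassandra connector. The sketch only mirrors the shape of that pipeline.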