https://github.com/nashtech-labs/lambda-arch-spark

apache-spark cassandra kafka lambda-architecture spark


Host: GitHub
URL: https://github.com/nashtech-labs/lambda-arch-spark
Owner: NashTech-Labs
License: other
Created: 2017-01-31T10:14:32.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2020-06-27T17:36:03.000Z (about 5 years ago)
Last Synced: 2025-02-02T01:31:59.155Z (5 months ago)
Topics: apache-spark, cassandra, kafka, lambda-architecture, spark
Language: Scala
Size: 23.4 KB
Stars: 75
Watchers: 12
Forks: 37
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

README

# Lambda-Arch-Spark
This project analyses Twitter tweets using the Lambda architecture.

-----------------------------------------------------------------------
#### What is Lambda architecture?
-----------------------------------------------------------------------
Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.
For more details, see [Twitter's tweets analysis using Lambda Architecture](https://blog.knoldus.com/2017/01/31/twitters-tweets-analysis-using-lambda-architecture/).
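
To make the batch/stream distinction concrete, here is a minimal sketch in plain Scala (no Spark, Kafka or Cassandra). The `HashtagEvent` type, the hashtag-count aggregate and all function names are assumptions chosen for illustration; they are not taken from this project's code. The batch function recomputes a view from the full dataset, while the streaming function updates a view one event at a time.

```scala
// Hypothetical event type: one hashtag occurrence extracted from a tweet.
final case class HashtagEvent(tag: String, timestampMillis: Long)

object LambdaLayersSketch {
  // Batch style: recompute the whole view from the complete dataset.
  def batchView(master: Seq[HashtagEvent]): Map[String, Long] =
    master.groupBy(_.tag).map { case (tag, events) => tag -> events.size.toLong }

  // Stream style: update an existing view incrementally, one event at a time.
  def updateRealtimeView(view: Map[String, Long], event: HashtagEvent): Map[String, Long] =
    view.updated(event.tag, view.getOrElse(event.tag, 0L) + 1L)

  def main(args: Array[String]): Unit = {
    val events = Seq(
      HashtagEvent("spark", 1L),
      HashtagEvent("kafka", 2L),
      HashtagEvent("spark", 3L)
    )
    val batch = batchView(events)
    val streamed = events.foldLeft(Map.empty[String, Long])(updateRealtimeView)
    // Both approaches should agree on the same input.
    println(batch == streamed)
  }
}
```

In a real deployment the batch computation would run periodically over the master dataset and the incremental update would only see events that arrived since the last batch run.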

-----------------------------------------------------------------------
### Now Play
-----------------------------------------------------------------------
* Clone the project onto your local system: `$ git clone git@github.com:knoldus/Lambda-Arch-Spark.git`
* Akka requires that you have [Java 8](http://www.oracle.com/technetwork/java/javase/downloads/index.html) or later installed on your machine.
* Install SBT if you do not already have it.
* Install Kafka
* Install Cassandra
* Create a Twitter app to access real-time tweets.
* Put the Twitter app's consumerKey, consumerSecret, accessToken and accessTokenSecret into the application.conf file of this project.
* Before starting the project, start Kafka and Cassandra.
* Execute `sbt clean compile` to build the project.
* Execute `sbt run` to run the project; it will show you multiple options.
* First start **TwitterStreamApp** to fetch tweets from Twitter, then start **CassandraKafkaConsumer**, which fetches data from Kafka and writes it to the master dataset. After that, start **SparkStreamingKafkaConsumer** for the realtime view and **BatchProcessor** for the batch view. Another app, **AkkaHttpServer**, acts as the serving layer: it merges the realtime and batch views for a pre-specified query and returns the result to the web client.
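
The merge performed by the serving layer can be sketched as follows. This is an illustration only, assuming both views are per-hashtag counts that cover non-overlapping time ranges; the object name, function names and the "top hashtags" query are hypothetical, and the project's actual **AkkaHttpServer** may merge its views differently.

```scala
object ServingLayerSketch {
  // Merge per-key counts. Assumption: the batch view covers data up to the
  // last batch run and the realtime view covers only what arrived afterwards,
  // so adding the counts does not double count.
  def mergeViews(batch: Map[String, Long], realtime: Map[String, Long]): Map[String, Long] =
    realtime.foldLeft(batch) { case (acc, (key, count)) =>
      acc.updated(key, acc.getOrElse(key, 0L) + count)
    }

  // Hypothetical pre-specified query: the n most frequent hashtags across
  // both views, ties broken alphabetically.
  def topHashtags(batch: Map[String, Long], realtime: Map[String, Long], n: Int): Seq[(String, Long)] =
    mergeViews(batch, realtime).toSeq
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)

  def main(args: Array[String]): Unit = {
    val batch = Map("spark" -> 10L, "kafka" -> 4L)
    val realtime = Map("spark" -> 2L, "cassandra" -> 5L)
    topHashtags(batch, realtime, 2).foreach { case (tag, count) =>
      println(s"$tag: $count")
    }
  }
}
```

Keeping the merge a pure function of the two views means the batch view can be replaced wholesale after each batch run without changing the query code.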

-----------------------------------------------------------------------
### References
-----------------------------------------------------------------------
* [Akka HTTP](http://doc.akka.io/docs/akka/2.4.7/scala/http/index.html)
* [Scala](http://scala-lang.org/)
* [Apache Spark](http://spark.apache.org/)
* [Apache Spark Streaming](http://spark.apache.org/docs/latest/streaming-programming-guide.html)
* [Apache Cassandra](http://cassandra.apache.org/)
* [Apache Kafka](https://kafka.apache.org/)
