Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nashtech-labs/lambda-arch-spark
- Host: GitHub
- URL: https://github.com/nashtech-labs/lambda-arch-spark
- Owner: NashTech-Labs
- License: other
- Created: 2017-01-31T10:14:32.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2020-06-27T17:36:03.000Z (over 4 years ago)
- Last Synced: 2024-10-12T00:04:26.316Z (4 months ago)
- Topics: apache-spark, cassandra, kafka, lambda-architecture, spark
- Language: Scala
- Size: 23.4 KB
- Stars: 74
- Watchers: 12
- Forks: 37
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Lambda-Arch-Spark
This project analyses tweets from Twitter using the Lambda architecture.

-----------------------------------------------------------------------
#### What is the Lambda architecture?
-----------------------------------------------------------------------
Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.
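
To make the batch/stream distinction concrete, here is a minimal sketch in plain Scala. It is not code from this repository: the `Tweet` model and the function names are hypothetical, and in-memory collections stand in for Spark, Kafka and Cassandra. It only illustrates that a batch job recomputes a view from the whole dataset, while a stream job updates a view one event at a time.

```scala
// Hypothetical tweet model; the real project's schema may differ.
final case class Tweet(user: String, hashtag: String)

object BatchVsStream {
  // Batch style: recompute the whole view from the full dataset.
  def batchCounts(tweets: Seq[Tweet]): Map[String, Int] =
    tweets.groupBy(_.hashtag).map { case (tag, ts) => tag -> ts.size }

  // Stream style: update an existing view with a single new event.
  def streamUpdate(view: Map[String, Int], tweet: Tweet): Map[String, Int] =
    view.updated(tweet.hashtag, view.getOrElse(tweet.hashtag, 0) + 1)

  def main(args: Array[String]): Unit = {
    val tweets = Seq(
      Tweet("a", "#scala"),
      Tweet("b", "#spark"),
      Tweet("c", "#scala")
    )
    val batch = batchCounts(tweets)
    // Feed the same tweets through the incremental path, one by one.
    val streamed = tweets.foldLeft(Map.empty[String, Int])(streamUpdate)
    // Both paths should agree on the final counts.
    assert(streamed == batch)
    println(batch)
  }
}
```

The batch path is simple and easy to rerun from scratch. The stream path keeps results fresh between batch runs. The Lambda architecture uses both.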
For more details, please check [Twitter's tweets analysis using Lambda Architecture](https://blog.knoldus.com/2017/01/31/twitters-tweets-analysis-using-lambda-architecture/).

-----------------------------------------------------------------------
### Now Play
-----------------------------------------------------------------------
* Clone the project to your local system: `$ git clone git@github.com:knoldus/Lambda-Arch-Spark.git`
* Akka requires that you have [Java 8](http://www.oracle.com/technetwork/java/javase/downloads/index.html) or later installed on your machine.
* Install SBT if you do not already have it
* Install Kafka
* Install Cassandra
* Create a Twitter app to get access to real-time tweets.
* Put the Twitter app's `consumerKey`, `consumerSecret`, `accessToken` and `accessTokenSecret` into this project's `application.conf` file.
* Start Kafka and Cassandra before starting the project.
* Execute `sbt clean compile` to build the project.
* Execute `sbt run` to run the project; it will show you multiple options.
* Start **TwitterStreamApp** first to fetch tweets from Twitter, then start **CassandraKafkaConsumer**, which fetches data from Kafka and writes it to the master dataset. After that, start **SparkStreamingKafkaConsumer** for the realtime view and **BatchProcessor** for the batch view. A further app, **AkkaHttpServer**, acts as the serving layer: it merges the realtime and batch views for a pre-specified query and returns the result to the web client.

-----------------------------------------------------------------------
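
The serving-layer merge described in the last step above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's implementation: `HashtagEvent`, `batchView`, `realtimeView` and `query` are hypothetical names, the views are plain in-memory maps, and splitting the two views at a batch cutoff timestamp is one common way to avoid double counting that the actual **AkkaHttpServer** may or may not use.

```scala
// Hypothetical record; timestamp is in epoch milliseconds.
final case class HashtagEvent(hashtag: String, timestamp: Long)

object ServingLayerSketch {
  type View = Map[String, Long]

  private def countByHashtag(events: Seq[HashtagEvent]): View =
    events.groupBy(_.hashtag).map { case (tag, es) => tag -> es.size.toLong }

  // Batch view: everything in the master dataset up to the last batch run.
  def batchView(master: Seq[HashtagEvent], cutoff: Long): View =
    countByHashtag(master.filter(_.timestamp <= cutoff))

  // Realtime view: only events newer than the batch cutoff.
  def realtimeView(recent: Seq[HashtagEvent], cutoff: Long): View =
    countByHashtag(recent.filter(_.timestamp > cutoff))

  // Serving layer: answer a query by adding the counts from both views.
  def query(hashtag: String, batch: View, realtime: View): Long =
    batch.getOrElse(hashtag, 0L) + realtime.getOrElse(hashtag, 0L)

  def main(args: Array[String]): Unit = {
    val events = Seq(
      HashtagEvent("#spark", 100L),
      HashtagEvent("#kafka", 150L),
      HashtagEvent("#spark", 250L),
      HashtagEvent("#spark", 300L)
    )
    val cutoff = 200L // assumed time of the last completed batch run
    val total = query("#spark", batchView(events, cutoff), realtimeView(events, cutoff))
    println(s"#spark mentions: $total") // prints "#spark mentions: 3"
  }
}
```

Because each event falls on exactly one side of the cutoff, the merged answer matches a single count over all events.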
### References
-----------------------------------------------------------------------
* [Akka HTTP](http://doc.akka.io/docs/akka/2.4.7/scala/http/index.html)
* [Scala](http://scala-lang.org/)
* [Apache Spark](http://spark.apache.org/)
* [Apache Spark Streaming](http://spark.apache.org/docs/latest/streaming-programming-guide.html)
* [Apache Cassandra](http://cassandra.apache.org/)
* [Apache Kafka](https://kafka.apache.org/)