{"id":14982395,"url":"https://github.com/nashtech-labs/lambda-arch-spark","last_synced_at":"2025-10-29T12:31:36.339Z","repository":{"id":143357436,"uuid":"80509855","full_name":"NashTech-Labs/Lambda-Arch-Spark","owner":"NashTech-Labs","description":null,"archived":false,"fork":false,"pushed_at":"2020-06-27T17:36:03.000Z","size":24,"stargazers_count":75,"open_issues_count":0,"forks_count":37,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-02-02T01:31:59.155Z","etag":null,"topics":["apache-spark","cassandra","kafka","lambda-architecture","spark"],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NashTech-Labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-31T10:14:32.000Z","updated_at":"2024-12-12T05:18:05.000Z","dependencies_parsed_at":"2023-06-10T18:30:15.917Z","dependency_job_id":null,"html_url":"https://github.com/NashTech-Labs/Lambda-Arch-Spark","commit_stats":null,"previous_names":["nashtech-labs/lambda-arch-spark"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NashTech-Labs%2FLambda-Arch-Spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NashTech-Labs%2FLambda-Arch-Spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NashTech-Labs%2FLambda-Arch-Spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NashTech-Labs%2FLambda-Arch-Spark/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NashTech-Labs","download_url":"https://codeload.github.com/NashTech-Labs/Lambda-Arch-Spark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238825745,"owners_count":19537118,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","cassandra","kafka","lambda-architecture","spark"],"created_at":"2024-09-24T14:05:20.419Z","updated_at":"2025-10-29T12:31:36.005Z","avatar_url":"https://github.com/NashTech-Labs.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Lambda-Arch-Spark\nIn this project we analyse tweets from Twitter using the Lambda architecture.\n\n-----------------------------------------------------------------------\n### What is Lambda architecture?\n-----------------------------------------------------------------------\nLambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.\nFor more details, see [Twitter's tweets analysis using Lambda Architecture](https://blog.knoldus.com/2017/01/31/twitters-tweets-analysis-using-lambda-architecture/).\n\n-----------------------------------------------------------------------\n### Now Play\n-----------------------------------------------------------------------\n* Clone the project to your local system: `$ git clone git@github.com:knoldus/Lambda-Arch-Spark.git`\n* Akka requires that you have [Java 
8](http://www.oracle.com/technetwork/java/javase/downloads/index.html) or later installed on your machine.\n* Install SBT if you do not already have it.\n* Install Kafka.\n* Install Cassandra.\n* Create a Twitter app to get access to real-time tweets.\n* Put your Twitter app's `consumerKey`, `consumerSecret`, `accessToken` and `accessTokenSecret` into this project's `application.conf` file.\n* Start Kafka and Cassandra before starting the project.\n* Execute `sbt clean compile` to build the project.\n* Execute `sbt run` to run the project; sbt will list the available main classes and ask you to pick one.\n* First start **TwitterStreamApp** to fetch tweets from Twitter, then start **CassandraKafkaConsumer**, which fetches data from Kafka and stores it in the master dataset. After that, start **SparkStreamingKafkaConsumer** for the realtime view and **BatchProcessor** for the batch view. A further app, **AkkaHttpServer**, implements the serving layer: it merges the realtime and batch views for a pre-specified query and returns the result to the web client.\n\n-----------------------------------------------------------------------\n### References\n-----------------------------------------------------------------------\n* [Akka HTTP](http://doc.akka.io/docs/akka/2.4.7/scala/http/index.html)\n* [Scala](http://scala-lang.org/)\n* [Apache Spark](http://spark.apache.org/)\n* [Apache Spark Streaming](http://spark.apache.org/docs/latest/streaming-programming-guide.html)\n* [Apache Cassandra](http://cassandra.apache.org/)\n* [Apache Kafka](https://kafka.apache.org/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnashtech-labs%2Flambda-arch-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnashtech-labs%2Flambda-arch-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnashtech-labs%2Flambda-arch-spark/lists"}
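The README describes the serving layer (**AkkaHttpServer**) as merging the realtime view and the batch view for a pre-specified query. The Scala snippet that follows is a minimal, in-memory sketch of that Lambda-architecture idea, not the project's actual code. It assumes, hypothetically, that the query is "top hashtags by count" and that both views are plain `Map[String, Long]` values. In the real project the views are produced by the Spark jobs **BatchProcessor** and **SparkStreamingKafkaConsumer**. The names `ServingLayerSketch`, `batchView`, `realtimeView`, `merge` and `topHashtags` are illustrative only.

```scala
// Hypothetical sketch of a Lambda-architecture serving layer.
// Assumption: each tweet is reduced to its hashtags and the query is
// "top-n hashtags by total count"; none of these names come from the project.
object ServingLayerSketch {

  // A view maps a hashtag to how many times it was seen.
  type View = Map[String, Long]

  /** Batch view: recomputed from the whole master dataset (all stored tweets). */
  def batchView(masterDataset: Seq[Seq[String]]): View =
    masterDataset.flatten
      .groupBy(identity)
      .map { case (tag, occurrences) => tag -> occurrences.size.toLong }

  /** Realtime view: the same aggregation, applied only to tweets that
    * arrived after the last batch run (a much smaller window). */
  def realtimeView(recentTweets: Seq[Seq[String]]): View =
    batchView(recentTweets)

  /** Serving layer: add the two views key by key so the answer
    * covers both historical and recent data. */
  def merge(batch: View, realtime: View): View =
    (batch.keySet ++ realtime.keySet).iterator.map { tag =>
      tag -> (batch.getOrElse(tag, 0L) + realtime.getOrElse(tag, 0L))
    }.toMap

  /** Answer the pre-specified query: top-n hashtags, ties broken alphabetically. */
  def topHashtags(batch: View, realtime: View, n: Int): Seq[(String, Long)] =
    merge(batch, realtime).toSeq
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)
}
```

For example, with a batch view built from `Seq(Seq("#spark", "#kafka"), Seq("#spark"))` and a realtime view built from `Seq(Seq("#kafka"), Seq("#cassandra"))`, `topHashtags(batch, realtime, 2)` returns `Seq(("#kafka", 2L), ("#spark", 2L))`. Neither view alone gives that answer. The merge step is what lets the batch layer be slow and complete while the speed layer stays small and fresh.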