
# StreamingAnalytics
This project performs analytics on streaming data.

Flow Diagram of the Project

![image](https://user-images.githubusercontent.com/36558484/152830762-0b3dd11d-f54a-4d22-b7a0-76cdd81ab765.png)

**DataSimulator.py**

This Python script generates JSON messages and appends them to a file named TemperatureRecorded.txt.

**Execution Command:**

python DataSimulator.py 100

The command takes 2 command-line arguments:

Argv[0] = script file name

Argv[1] = total number of JSON messages to be generated

Once the command is executed, it generates a text file named TemperatureRecorded.txt in the current working directory.
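The exact message schema lives in DataSimulator.py; the sketch below only illustrates the general shape. The field names `sensor_id`, `temperature`, and `timestamp` are assumptions, not confirmed by this README:

```python
import json
import random
import sys
import time

def main():
    # argv[1] = total number of JSON messages to generate
    total = int(sys.argv[1])
    with open("TemperatureRecorded.txt", "a") as out:
        for _ in range(total):
            message = {
                "sensor_id": random.randint(1, 10),                   # assumed field
                "temperature": round(random.uniform(15.0, 45.0), 2),  # assumed field
                "timestamp": time.time(),                             # assumed field
            }
            # One JSON message per line, appended to the file
            out.write(json.dumps(message) + "\n")

if __name__ == "__main__":
    main()
```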

**Send the data to Kafka**

**List the active topics in the Kafka cluster**
bin/kafka-topics.sh --list --zookeeper localhost:2181

ZooKeeper is running on port 2181.

**Create a Topic with Name "SensorAnalytics"**
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic SensorAnalytics

**Topic Name** - SensorAnalytics

**Replication Factor** - 1 (each partition is replicated once)

**Partitions** - 2 (the topic has 2 partitions)

**Produce the data into the topic using the command below**
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic SensorAnalytics < TemperatureRecorded.txt

The data is now stored in the Kafka cluster under the topic, Kafka's logical storage unit.
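As an alternative to the console producer, the messages could also be published programmatically. A minimal sketch using the kafka-python library (the library choice is an assumption; this README only uses the console producer):

```python
from kafka import KafkaProducer  # pip install kafka-python (assumed library)

producer = KafkaProducer(bootstrap_servers="localhost:9092")
with open("TemperatureRecorded.txt") as f:
    for line in f:
        # Each line of the file is one JSON message; send it as the record value.
        producer.send("SensorAnalytics", line.strip().encode("utf-8"))
producer.flush()  # block until all buffered messages are delivered
```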

**Submit the Spark job using spark-submit with the command below**

spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.4.7.jar StreamingMetrics.py

StreamingMetrics.py connects to Kafka through the KafkaUtils class and creates a DStream by subscribing to the topic "SensorAnalytics".

Once the DStream is received from Kafka, RDD operations are applied to each micro-batch.
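A minimal sketch of that flow, matching the spark-streaming-kafka-0-8 assembly used above. The batch interval and the `temperature` field are assumptions for illustration; the actual logic lives in StreamingMetrics.py:

```python
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="StreamingMetrics")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches (assumed interval)

# Create a direct DStream subscribed to the SensorAnalytics topic;
# each record arrives as a (key, value) pair.
stream = KafkaUtils.createDirectStream(
    ssc, ["SensorAnalytics"], {"metadata.broker.list": "localhost:9092"})

# Parse the JSON value, then apply RDD operations per micro-batch.
temps = stream.map(lambda kv: json.loads(kv[1])).map(lambda r: r["temperature"])

def print_avg(rdd):
    if not rdd.isEmpty():
        print("average temperature:", rdd.mean())  # hypothetical metric

temps.foreachRDD(print_avg)

ssc.start()
ssc.awaitTermination()
```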

**Tech Stack Used**

1. Python
2. PySpark
3. Kafka