# Real-Time Weather Data Stream with Kafka and gRPC

This project demonstrates real-time processing of weather data, using **Apache Kafka** for stream processing and **gRPC**/**Protobuf** for efficient message serialization and exchange. It consists of a **Producer** that generates weather reports and publishes them to a Kafka topic, and a **Consumer** that processes the data and computes statistics.

## Table of Contents
- [Overview](#overview)
- [Technologies Used](#technologies-used)
- [Getting Started](#getting-started)
- [Running the Project](#running-the-project)
- [Kafka Setup](#kafka-setup)
- [Producer](#producer)
- [Consumer](#consumer)
- [Directory Structure](#directory-structure)

## Overview
In this project, weather data (temperature, station ID, and date) is produced to a Kafka topic that is split across multiple partitions. The **Producer** generates data and publishes it to the `temperatures` topic, while the **Consumer** listens to specific partitions of that topic, processes the data, and computes basic statistics such as the average temperature per station.

- **Producer**: Sends weather reports to Kafka.
- **Consumer**: Reads weather reports, calculates statistics, and stores results in JSON files.
- **Kafka**: Used as a message broker for reliable data streaming.
- **gRPC**: Used for efficient communication and serialization of messages.
- **Protobuf**: Defines the structure of the messages exchanged between Producer and Consumer (a sketch of the message definition follows this list).
- **JSON**: Output format for the per-partition statistics files.
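The repository ships a generated `report_pb2.py`, which implies a Protobuf schema along these lines. This is a hypothetical sketch: the file name `report.proto`, the message name, and the field names are assumptions inferred from the fields described above, not the repository's actual definition.

```protobuf
// report.proto -- hypothetical schema; the actual file from which
// report_pb2.py was generated may use different names or field numbers.
syntax = "proto3";

message Report {
  string date = 1;        // observation date, e.g. "2024-01-15"
  double degrees = 2;     // temperature reading
  string station_id = 3;  // also used as the Kafka message key
}
```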

## Technologies Used

| Technology | Description |
|------------|-------------|
| Kafka | A distributed event streaming platform. |
| Python | The programming language used for both the Producer and Consumer. |
| gRPC | Used for efficient communication and serialization of messages. |
| Protobuf | For defining the structure of messages exchanged between Producer and Consumer. |
| JSON | Output format for the per-partition statistics files. |

## Getting Started
Follow the steps below to get the project running on your local machine.

### Prerequisites
1. **Kafka**: You need to have **Apache Kafka** running locally. Follow the [Kafka Installation Guide](https://kafka.apache.org/quickstart) to set up Kafka on your machine.

2. **Python 3.x**: Ensure Python 3 and `pip` are installed on your system.

3. **Install the Kafka Python client**:

```bash
pip install kafka-python
```

4. **Install the gRPC and Protobuf libraries**:

```bash
pip install grpcio grpcio-tools
```
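Alternatively, the repository includes a `requirements.txt` (see the directory structure below), so the Python dependencies can be installed in one step:

```bash
pip install -r requirements.txt
```

If you need to regenerate `report_pb2.py`, `grpcio-tools` bundles the Protobuf compiler. A minimal invocation, assuming the schema file is named `report.proto` (an assumption, as noted in the sketch above):

```bash
python -m grpc_tools.protoc -I. --python_out=. report.proto
```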

### Running the Project

#### Kafka Setup
1. **Start Zookeeper** (the paths below assume a Homebrew-style Kafka install; adjust them to your setup):

```bash
zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
```

2. **Start Kafka Broker**:

```bash
kafka-server-start /usr/local/etc/kafka/server.properties
```

3. **Create a Kafka Topic**:
Run the following command to create the `temperatures` topic with 4 partitions:

```bash
kafka-topics --bootstrap-server localhost:9092 --create --topic temperatures --partitions 4 --replication-factor 1
```
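To confirm the topic was created with the expected partition count:

```bash
kafka-topics --bootstrap-server localhost:9092 --describe --topic temperatures
```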

#### Producer
The **Producer** generates weather data and sends it to Kafka.

1. **Run the Producer**:

```bash
python producer.py
```

The Producer will continuously generate weather reports and send them to Kafka, using the `station_id` as the message key so that all reports from a given station land in the same partition while different stations are spread across the topic's four partitions.
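A minimal sketch of what such a producer can look like with `kafka-python` and the generated Protobuf class. The message name (`Report`), its field names, and the random data generation are assumptions; the repository's `producer.py` may differ:

```python
import random
import time

from kafka import KafkaProducer

import report_pb2  # generated Protobuf module shipped with the repo

producer = KafkaProducer(bootstrap_servers="localhost:9092")
stations = ["StationA", "StationB", "StationC", "StationD"]

while True:
    for station_id in stations:
        # Build a Protobuf report; field names here are hypothetical.
        report = report_pb2.Report(
            date=time.strftime("%Y-%m-%d"),
            degrees=round(random.uniform(-10.0, 40.0), 1),
            station_id=station_id,
        )
        # Keying by station_id routes every report from one station
        # to the same partition of the `temperatures` topic.
        producer.send(
            "temperatures",
            key=station_id.encode("utf-8"),
            value=report.SerializeToString(),
        )
    producer.flush()
    time.sleep(1)
```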

#### Consumer
The **Consumer** reads the weather data from Kafka partitions, processes it, and calculates statistics such as the average temperature for each station.

1. **Run the Consumer**:

```bash
python consumer.py <partition> [<partition> ...]
```

Replace the `<partition>` arguments with the partition numbers this consumer should read from (e.g., `0 1`).

2. **Output**:
The Consumer will output processed statistics and save them into JSON files (`partition-0.json`, `partition-1.json`, etc.) located in the `./data` folder.
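A minimal sketch of such a consumer with `kafka-python`, assigning itself the partitions given on the command line. As with the producer sketch above, the Protobuf field names and the JSON layout are assumptions rather than the repository's actual implementation:

```python
import json
import sys

from kafka import KafkaConsumer, TopicPartition

import report_pb2  # generated Protobuf module shipped with the repo

partitions = [int(arg) for arg in sys.argv[1:]]

# Assign the requested partitions manually instead of using a consumer group,
# so each consumer process owns exactly the partitions it was given.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
consumer.assign([TopicPartition("temperatures", p) for p in partitions])

# Running sum/count per station, kept separately for each partition.
stats = {p: {} for p in partitions}

for msg in consumer:
    report = report_pb2.Report.FromString(msg.value)
    per_station = stats[msg.partition].setdefault(
        report.station_id, {"sum": 0.0, "count": 0}
    )
    per_station["sum"] += report.degrees
    per_station["count"] += 1

    # Persist the average temperature per station for this partition.
    averages = {
        station: s["sum"] / s["count"]
        for station, s in stats[msg.partition].items()
    }
    with open(f"./data/partition-{msg.partition}.json", "w") as f:
        json.dump(averages, f)
```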

## Directory Structure
```bash
├── consumer.py        # Consumer script: reads assigned partitions and computes statistics
├── producer.py        # Producer script: sends weather reports to Kafka
├── report_pb2.py      # Generated Protobuf module for the weather report message
├── data/              # JSON files with per-partition statistics
│   ├── partition-0.json
│   ├── partition-1.json
│   └── ...
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation
```