# 📡 Real-Time Environmental Sensor Data Pipeline

This project implements a real-time data pipeline using **Apache Kafka**, **Docker**, and **PostgreSQL** to simulate and process environmental sensor data from a CSV file. The pipeline mimics live IoT sensor streaming and stores the data for further analysis.

---

## 🚀 Project Overview

- 📁 Source: Kaggle Dataset – *Environmental Sensor Data (132k rows)*
- 🔄 Simulates real-time sensor data with `pandas`
- 🔌 Streams data into **Kafka topics** using a Python producer (a sketch follows this list)
- 📥 A Kafka consumer reads the stream and stores it in **PostgreSQL**
- 🐳 Fully containerized with **Docker Compose**
- 📊 Kafka-UI for topic monitoring
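
A minimal sketch of the producer loop described above, assuming `kafka-python` and a broker reachable at `kafka:9092` inside the Compose network (the broker address and variable names are assumptions, not taken from the repository):

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer

TOPIC = "kafka-topic-postgress"   # topic name used throughout this project
BROKER = "kafka:9092"             # assumed Kafka service address inside Docker Compose

producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Replay the Kaggle CSV row by row to mimic a live sensor feed.
df = pd.read_csv("dataset/iot_telemetry_data.csv")
for _, row in df.iterrows():
    message = json.loads(row.to_json())   # native Python types for clean JSON
    producer.send(TOPIC, message)
    time.sleep(5)                         # 5-second delay between readings

producer.flush()
```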

---

## 🛠️ Technologies Used

| Component | Tech Stack |
|------------------|--------------------------------|
| Message Broker | Apache Kafka |
| Kafka Client | Python (`kafka-python`) |
| Data Ingestion | Python + Pandas |
| Database | PostgreSQL |
| Orchestration | Docker + Docker Compose |
| Monitoring | Kafka-UI (ProvectusLabs) |

---

## 🗂️ Project Structure
```text
elective2/
├── dataset/
│   └── iot_telemetry_data.csv
├── producer/
│   ├── producer.py
│   ├── requirements.txt
│   └── Dockerfile
├── consumer/
│   ├── consumer.py
│   ├── requirements.txt
│   └── Dockerfile
├── db/
│   └── init.sql
├── docker-compose.yml
└── README.md
```
---

## 🧠 Kafka Topic

- **Topic Name**: `kafka-topic-postgress`
- **Created dynamically** if not already present (see the sketch below)
- Partitions: `1`
- Replication factor: `1`
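
One way the topic could be created on startup with `kafka-python`'s admin client; the broker address is an assumption, and the repository may handle topic creation differently (for example via Kafka's auto-create setting):

```python
from kafka.admin import KafkaAdminClient, NewTopic
from kafka.errors import TopicAlreadyExistsError

admin = KafkaAdminClient(bootstrap_servers="kafka:9092")  # assumed broker address

try:
    # One partition, replication factor 1, matching the values listed above.
    topic = NewTopic("kafka-topic-postgress", num_partitions=1, replication_factor=1)
    admin.create_topics([topic])
except TopicAlreadyExistsError:
    pass  # the topic is only created if it does not already exist
finally:
    admin.close()
```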

---

## 🧪 How to Run This Project

### 1. Clone the Repository

```bash
git clone https://github.com/gulomovazukhrakhon/kafka-stream-processing.git
cd kafka-stream-processing
```
### 2. Run Docker Compose
```bash
docker-compose up --build
```

This will start:
* Kafka + Zookeeper
* PostgreSQL
* Producer and Consumer
* Kafka-UI on port `8080`

### 3. Access Kafka UI
📍 Open: http://localhost:8080

🧠 Topic: `kafka-topic-postgress`

---

## 🗄️ PostgreSQL Table: `telemtry_data`
Created automatically via `init.sql`; a quick way to check that rows are arriving is sketched after the table. Structure:

| Column | Type |
|----------|------------------|
| ts | DOUBLE PRECISION |
| device | TEXT |
| co | DOUBLE PRECISION |
| humidity | DOUBLE PRECISION |
| light | BOOLEAN |
| lpg | DOUBLE PRECISION |
| motion | BOOLEAN |
| smoke | DOUBLE PRECISION |
| temp | DOUBLE PRECISION |
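
To confirm that rows are landing in the table, a quick check from the host could look like the following; the connection parameters are assumptions and should match whatever `docker-compose.yml` actually defines:

```python
import psycopg2

# Assumed connection settings; adjust to match docker-compose.yml.
conn = psycopg2.connect(
    host="localhost", port=5432,
    dbname="postgres", user="postgres", password="postgres",
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*), MAX(ts) FROM telemtry_data;")
    row_count, latest_ts = cur.fetchone()
    print(f"{row_count} rows stored, latest timestamp: {latest_ts}")

conn.close()
```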

---

## 🔁 How It Works

1. `producer.py` reads the CSV row by row
2. Sends each row as a JSON message to Kafka
3. Kafka holds the messages under the `kafka-topic-postgress` topic
4. `consumer.py` listens to the topic and inserts each message into PostgreSQL (a sketch of this step follows the list)
5. Kafka-UI displays messages in real time
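
A minimal sketch of the consumer side, assuming `kafka-python` and `psycopg2`; the broker address, database credentials, and variable names are assumptions rather than the repository's exact code:

```python
import json

import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "kafka-topic-postgress",
    bootstrap_servers="kafka:9092",   # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Assumed connection settings; adjust to match docker-compose.yml.
conn = psycopg2.connect(host="postgres", dbname="postgres",
                        user="postgres", password="postgres")

INSERT_SQL = """
    INSERT INTO telemtry_data (ts, device, co, humidity, light, lpg, motion, smoke, temp)
    VALUES (%(ts)s, %(device)s, %(co)s, %(humidity)s, %(light)s,
            %(lpg)s, %(motion)s, %(smoke)s, %(temp)s)
"""

for message in consumer:
    record = message.value            # one sensor reading as a dict
    with conn, conn.cursor() as cur:  # commits after each successful insert
        cur.execute(INSERT_SQL, record)
```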

---

## ✅ Project Highlights

* ⏱️ Simulated real-time streaming (5-second delay between messages)
* 💾 Durable storage with PostgreSQL
* 🛡️ Error handling and retry logic for resilience (illustrated after this list)
* 🔌 Modular and scalable architecture
* 🐳 Fully containerized and reproducible
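
As an illustration of the retry idea (not the repository's exact code), the consumer could wait for PostgreSQL to become reachable before it starts processing messages; connection settings are assumed:

```python
import time

import psycopg2
from psycopg2 import OperationalError


def connect_with_retry(retries: int = 10, delay: float = 3.0):
    """Keep trying to connect until PostgreSQL is ready (assumed settings)."""
    for attempt in range(1, retries + 1):
        try:
            return psycopg2.connect(host="postgres", dbname="postgres",
                                    user="postgres", password="postgres")
        except OperationalError:
            print(f"PostgreSQL not ready (attempt {attempt}/{retries}), retrying...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to PostgreSQL after several attempts")
```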

---

## 📚 Dataset Source

[Kaggle: Environmental Sensor Data (132k rows)](https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k)

---

## 👩‍💻 Author

**Zukhrahon Gulomova**

Applied Artificial Intelligence, IU International University