https://github.com/gulomovazukhrakhon/kafka-stream-processing
A real-time streaming pipeline for IoT sensor data using Kafka, Docker, and PostgreSQL. Fully containerized and simulates CSV-based sensor streaming with Python. Designed as a portfolio project.
data-pipeline docker iot kafka portfolio-project postgresql python real-time streaming
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/gulomovazukhrakhon/kafka-stream-processing
- Owner: gulomovazukhrakhon
- Created: 2025-06-26T07:03:22.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-06-26T07:09:20.000Z (3 months ago)
- Last Synced: 2025-06-26T08:23:05.014Z (3 months ago)
- Topics: data-pipeline, docker, iot, kafka, portfolio-project, postgresql, python, real-time, streaming
- Language: Python
- Homepage:
- Size: 23.2 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Real-Time Environmental Sensor Data Pipeline
This project implements a real-time data pipeline using **Apache Kafka**, **Docker**, and **PostgreSQL** to simulate and process environmental sensor data from a CSV file. The pipeline mimics live IoT sensor streaming and stores the data for further analysis.
---
## Project Overview
- Source: Kaggle dataset, *Environmental Sensor Data (132k rows)*
- Simulates real-time sensor data with `pandas`
- Streams data into **Kafka topics** using a Python producer
- A Kafka consumer reads the stream and stores it in **PostgreSQL**
- Fully containerized with **Docker Compose**
- Kafka-UI for topic monitoring
---
## Technologies Used
| Component | Tech Stack |
|------------------|--------------------------------|
| Data Streaming | Apache Kafka |
| Messaging System | Kafka-Python (`kafka-python`) |
| Data Ingestion | Python + Pandas |
| Database | PostgreSQL |
| Orchestration | Docker + Docker Compose |
| Monitoring | Kafka-UI (ProvectusLabs) |

---
## Project Structure
```bash
elective2/
├── dataset/
│   └── iot_telemetry_data.csv
├── producer/
│   ├── producer.py
│   ├── requirements.txt
│   └── Dockerfile
├── consumer/
│   ├── consumer.py
│   ├── requirements.txt
│   └── Dockerfile
├── db/
│   └── init.sql
├── docker-compose.yml
└── README.md
```
---
## Kafka Topic
- **Topic Name**: `kafka-topic-postgress`
- **Created Dynamically** if not already present
- Partitions: `1`
- Replication: `1`
---
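The "created dynamically if not already present" behaviour listed under Kafka Topic can be sketched as a small idempotent helper. This is a minimal sketch, not the repository's code: `ensure_topic` is a hypothetical name, and it assumes an admin client that exposes `list_topics()` and `create_topics()` the way `KafkaAdminClient` from `kafka-python` does. The client and the topic-spec class are passed in so the helper can be exercised without a running broker.

```python
from typing import Any, Callable


def ensure_topic(
    admin: Any,
    new_topic_cls: Callable[..., Any],
    name: str = "kafka-topic-postgress",
    partitions: int = 1,
    replication: int = 1,
) -> bool:
    """Create `name` if the broker does not list it; return True when created.

    `admin` is assumed to behave like kafka-python's KafkaAdminClient and
    `new_topic_cls` like kafka.admin.NewTopic (both are assumptions).
    """
    if name in admin.list_topics():
        return False  # topic already present, nothing to do
    spec = new_topic_cls(
        name=name,
        num_partitions=partitions,
        replication_factor=replication,
    )
    admin.create_topics([spec])
    return True
```

With `kafka-python` installed, a call might look like `ensure_topic(KafkaAdminClient(bootstrap_servers="kafka:9092"), NewTopic)`, where the broker address is an assumption about the Compose service name. Two clients starting at the same moment could still race between the check and the create, so real code may also want to tolerate a "topic already exists" error.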
## How to Run This Project
### 1. Clone the Repository
```bash
git clone https://github.com/gulomovazukhrakhon/kafka-stream-processing.git
cd kafka-stream-processing
```
### 2. Run Docker Compose
```bash
docker-compose up --build
```
This will start:
* Kafka + Zookeeper
* PostgreSQL
* Producer and Consumer
* Kafka-UI on port `8080`
### 3. Access Kafka UI
* Open: http://localhost:8080
* Topic: `kafka-topic-postgress`
---
## PostgreSQL Table: telemtry_data
Created automatically via `init.sql`. Structure:

| Column | Type |
|----------|------------------|
| ts | DOUBLE PRECISION |
| device | TEXT |
| co | DOUBLE PRECISION |
| humidity | DOUBLE PRECISION |
| light | BOOLEAN |
| lpg | DOUBLE PRECISION |
| motion | BOOLEAN |
| smoke | DOUBLE PRECISION |
| temp | DOUBLE PRECISION |

---
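To make the column order of the table above concrete, the sketch below maps one JSON message onto those columns and pairs it with a parameterized `INSERT`. It is an illustration under stated assumptions, not the project's `consumer.py`: `COLUMNS`, `INSERT_SQL`, and `to_row` are hypothetical names, the `%s` placeholders follow the psycopg2 style, and each message is assumed to carry keys named exactly like the columns.

```python
import json

# Column order taken from the table above.
COLUMNS = ("ts", "device", "co", "humidity", "light", "lpg", "motion", "smoke", "temp")
BOOLEAN_COLUMNS = {"light", "motion"}

# Parameterized statement; psycopg2-style placeholders are an assumption.
INSERT_SQL = "INSERT INTO telemtry_data ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS))
)


def _to_bool(value):
    """Accept real booleans as well as CSV-style 'true'/'false' strings."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def to_row(message: bytes) -> tuple:
    """Decode one JSON message into a tuple ordered like COLUMNS."""
    record = json.loads(message)
    row = []
    for col in COLUMNS:
        value = record[col]
        if col in BOOLEAN_COLUMNS:
            row.append(_to_bool(value))
        elif col == "device":
            row.append(str(value))
        else:
            row.append(float(value))  # remaining columns are DOUBLE PRECISION
    return tuple(row)
```

A consumer loop could then run something like `cursor.execute(INSERT_SQL, to_row(msg.value))` for each message, letting the driver handle quoting instead of building SQL strings by hand.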
## How It Works
1. `producer.py` reads the CSV file row by row
2. It sends each row as a JSON message to Kafka
3. Kafka holds the messages under the `kafka-topic-postgress` topic
4. `consumer.py` listens to the topic and inserts each message into PostgreSQL
5. Kafka UI displays the messages in real time
---
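Steps 1 and 2 of the list above can be sketched as a short loop. This is an illustrative sketch rather than the project's `producer.py`: it swaps `pandas` for the standard-library `csv` module to stay dependency-free, and `stream_rows`, `send`, and `delay_s` are hypothetical names. `send` stands in for a callable such as `KafkaProducer.send(topic, value)` from `kafka-python`, and the 5-second default assumes the delay is applied between rows.

```python
import csv
import json
import time
from typing import Callable, Iterable

TOPIC = "kafka-topic-postgress"


def stream_rows(
    lines: Iterable[str],
    send: Callable[[str, bytes], object],
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each CSV row as a UTF-8 JSON message; return the number sent."""
    sent = 0
    for row in csv.DictReader(lines):
        payload = json.dumps(row).encode("utf-8")
        send(TOPIC, payload)
        sent += 1
        sleep(delay_s)  # pause after each row to imitate a live sensor feed
    return sent
```

Opening `dataset/iot_telemetry_data.csv` with `newline=""` and passing the file object as `lines` would stream the whole dataset. Note that `csv.DictReader` yields every field as a string, so in this sketch the numeric and boolean conversion is left to the consumer side.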
## Project Highlights
* Simulated real-time streaming (5-second delay)
* Durable storage with PostgreSQL
* Error handling and retry logic for resilience
* Modular and scalable architecture
* Fully containerized and reproducible
---
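The README does not spell out what the "error handling and retry logic" highlight looks like. One common shape is a retry wrapper with exponential backoff, sketched below. Everything here is hypothetical (`retry`, `attempts`, `base_delay_s`, `retry_on`) and is meant only to illustrate the idea, for example around a database connection or a Kafka send that may fail while the containers are still starting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Double the wait after every failure: 1s, 2s, 4s, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Narrowing `retry_on` to the specific connection errors of the client library keeps genuine bugs, such as a malformed message, from being retried silently.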
## Dataset Source
[Kaggle: Environmental Sensor Data (132k rows)](https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k)
---
## Author
**Zukhrahon Gulomova**
Applied Artificial Intelligence, IU International University