https://github.com/gulomovazukhrakhon/kafka-stream-processing
A real-time streaming pipeline for IoT sensor data using Kafka, Docker, and PostgreSQL. It is fully containerized and uses Python to simulate sensor streaming from a CSV file. Designed as a portfolio project.
- Host: GitHub
- URL: https://github.com/gulomovazukhrakhon/kafka-stream-processing
- Owner: gulomovazukhrakhon
- Created: 2025-06-26T07:03:22.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-03T10:12:37.000Z (12 months ago)
- Last Synced: 2025-08-14T03:26:31.352Z (11 months ago)
- Topics: data-pipeline, docker, iot, kafka, portfolio-project, postgresql, python, real-time, streaming
- Language: Python
- Homepage:
- Size: 23.2 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Real-Time Environmental Sensor Data Pipeline
This project implements a real-time data pipeline using **Apache Kafka**, **Docker**, and **PostgreSQL** to simulate and process environmental sensor data from a CSV file. The pipeline mimics live IoT sensor streaming and stores the data for further analysis.
---
## Project Overview
- Source: Kaggle dataset, *Environmental Sensor Data (132k rows)*
- Simulates real-time sensor data with `pandas`
- Streams data into **Kafka topics** using a Python producer
- A Kafka consumer reads the stream and stores it in **PostgreSQL**
- Fully containerized with **Docker Compose**
- Kafka-UI for topic monitoring
---
## Technologies Used
| Component | Tech Stack |
|------------------|--------------------------------|
| Data Streaming | Apache Kafka |
| Messaging System | Kafka-Python (`kafka-python`) |
| Data Ingestion | Python + Pandas |
| Database | PostgreSQL |
| Orchestration | Docker + Docker Compose |
| Monitoring | Kafka-UI (ProvectusLabs) |
---
## Project Structure
```bash
elective2/
├── dataset/
│   └── iot_telemetry_data.csv
├── producer/
│   ├── producer.py
│   ├── requirements.txt
│   └── Dockerfile
├── consumer/
│   ├── consumer.py
│   ├── requirements.txt
│   └── Dockerfile
├── logs/
│   ├── consumer.log
│   └── producer.log
├── db/
│   └── init.sql
├── docker-compose.yml
└── README.md
```
---
## Kafka Topic
- **Topic Name**: `kafka-topic-postgress`
- **Creation**: created dynamically if not already present
- **Partitions**: `1`
- **Replication factor**: `1`
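
The "create if missing" behaviour can be sketched with the admin client from `kafka-python`. This is a minimal illustration, not the repository's code: the helper names `topic_spec`, `ensure_topic`, and `main`, and the broker address `kafka:9092`, are assumptions.

```python
def topic_spec(name="kafka-topic-postgress", partitions=1, replication=1):
    """Topic settings from this section, as keyword arguments for NewTopic."""
    return {
        "name": name,
        "num_partitions": partitions,
        "replication_factor": replication,
    }


def ensure_topic(admin, spec, make_topic):
    """Create the topic only if the broker does not already list it.

    `admin` needs list_topics() and create_topics(), which
    kafka.admin.KafkaAdminClient provides. Returns True when the topic
    was created and False when it already existed.
    """
    if spec["name"] in admin.list_topics():
        return False
    admin.create_topics([make_topic(**spec)])
    return True


def main(bootstrap_servers="kafka:9092"):  # assumed broker address
    # Imported here so the helpers above work without kafka-python installed.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        created = ensure_topic(admin, topic_spec(), NewTopic)
        print("created" if created else "already present")
    finally:
        admin.close()
```

Calling `main()` from a service's startup code would perform the check once. If two services start at the same moment, both may pass the existence check, so a real implementation would probably also need to tolerate an "already exists" error from the broker.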
---
## How to Run This Project
### 1. Clone the Repository
```bash
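# The repository is published as kafka-stream-processing; clone it into
# a directory named elective2 to match the project structure above
git clone https://github.com/gulomovazukhrakhon/kafka-stream-processing.git elective2
cd elective2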
```
### 2. Run Docker Compose
```bash
docker-compose up --build
```
This will start:
* Kafka + Zookeeper
* PostgreSQL
* Producer and Consumer
* Kafka-UI on port `8080`
### 3. Access Kafka UI
* Open: http://localhost:8080
* Topic: `kafka-topic-postgress`
---
## PostgreSQL Table: `telemtry_data`
Created automatically via `init.sql`. Structure:
| Column | Type |
|----------|------------------|
| ts | DOUBLE PRECISION |
| device | TEXT |
| co | DOUBLE PRECISION |
| humidity | DOUBLE PRECISION |
| light | BOOLEAN |
| lpg | DOUBLE PRECISION |
| motion | BOOLEAN |
| smoke | DOUBLE PRECISION |
| temp | DOUBLE PRECISION |
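
To show how a decoded Kafka message could be mapped onto these columns, here is a small sketch. It assumes a DB-API driver that uses `%s` placeholders (psycopg2 is one such driver, but the source does not say which one the consumer uses). The names `COLUMNS`, `INSERT_SQL`, and `to_row` are hypothetical, and the table name is copied exactly as written in the heading above.

```python
# Column order follows the table above.
COLUMNS = ("ts", "device", "co", "humidity", "light", "lpg", "motion", "smoke", "temp")
BOOLEAN_COLUMNS = {"light", "motion"}
TEXT_COLUMNS = {"device"}

# Parameterized statement; values are passed separately, never interpolated.
INSERT_SQL = "INSERT INTO telemtry_data ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS))
)


def _to_bool(value):
    """Accept real booleans as well as strings such as 'true' or 'false'."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "t", "yes"}


def to_row(message):
    """Convert a message dict into a tuple matching COLUMNS and their types."""
    row = []
    for column in COLUMNS:
        value = message[column]
        if column in BOOLEAN_COLUMNS:
            row.append(_to_bool(value))
        elif column in TEXT_COLUMNS:
            row.append(str(value))
        else:
            row.append(float(value))  # DOUBLE PRECISION columns
    return tuple(row)
```

With an open cursor, an insert would then look like `cursor.execute(INSERT_SQL, to_row(message))`, followed by a commit on the connection.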
---
## Logging and Monitoring
Both Python services log through the standard `logging` module:
- Each record includes a timestamp and a level (INFO, WARNING, ERROR)
- Records go to both the Docker container's stdout and `.log` files inside the `/logs` directory
- The logs help with debugging retries and Kafka/DB issues, and with confirming successful operations
Logs are visible via:
```bash
docker logs kafka-producer
docker logs kafka-consumer
```
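
The dual-output logging described in this section can be approximated with two handlers on one logger. The sketch below is an assumption about how such a setup might look; the function name `build_logger`, the format string, and the default `logs` directory are illustrative and not taken from `producer.py` or `consumer.py`.

```python
import logging
import sys
from pathlib import Path


def build_logger(name, log_dir="logs"):
    """Return a logger that writes to stdout and to <log_dir>/<name>.log."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:  # already configured; avoid duplicate log lines
        return logger

    Path(log_dir).mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    stream_handler = logging.StreamHandler(sys.stdout)  # seen via `docker logs`
    file_handler = logging.FileHandler(Path(log_dir) / f"{name}.log")
    for handler in (stream_handler, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A service would call `build_logger("producer")` once at startup and then use `logger.info(...)`, `logger.warning(...)`, and `logger.error(...)` as usual.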
---
## How It Works
1. `producer.py` reads the CSV row by row
2. It sends each row as a JSON message to Kafka
3. Kafka holds the messages under the `kafka-topic-postgress` topic
4. `consumer.py` listens to the topic and inserts each message into PostgreSQL
5. Kafka UI displays the messages in real time
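
Steps 1 and 2 can be sketched as a small loop. To keep the example dependency-free it uses the standard `csv` module instead of `pandas`, which the project itself uses. The function name `stream_rows`, the injected `send` callable, and the pause between rows are assumptions for illustration.

```python
import csv
import json
import time


def stream_rows(csv_file, send, delay_s=5.0, sleep=time.sleep):
    """Send each CSV row as a UTF-8 JSON message, pausing after each row.

    `csv_file` is an open text file (or any iterable of lines) with a header.
    `send` is any callable that accepts bytes.
    Returns the number of rows sent.
    """
    sent = 0
    for row in csv.DictReader(csv_file):
        send(json.dumps(row).encode("utf-8"))
        sent += 1
        sleep(delay_s)  # simulates the gap between sensor readings
    return sent


# Possible wiring with kafka-python (broker address is an assumption):
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="kafka:9092")
#   with open("dataset/iot_telemetry_data.csv", newline="") as f:
#       stream_rows(f, lambda payload: producer.send("kafka-topic-postgress", payload))
#   producer.flush()
```

Passing `send` and `sleep` in as arguments keeps the loop testable without a running broker. Note that `csv.DictReader` yields every value as a string, so type conversion would have to happen on the consumer side before the insert.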
---
## Project Highlights
* Simulated real-time streaming (5s delay)
* Durable storage with PostgreSQL
* Error handling and retry logic for resilience
* Modular and scalable architecture
* Fully containerized and reproducible
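
The retry logic mentioned above is not described in detail here, so the following is only one plausible shape for it: a generic helper with exponential backoff. The name `with_retries`, the attempt count, and the delays are all assumptions.

```python
import time


def with_retries(action, attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` is exhausted.

    Waits base_delay_s, then twice that, then four times that, and so on
    between failed attempts. Re-raises the last exception if all fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
```

A service could wrap its connection step, for example `with_retries(connect_to_postgres)` where `connect_to_postgres` is a hypothetical zero-argument function, so that it survives the window in which Kafka or PostgreSQL is still starting inside Docker Compose.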
---
## Project Links
* Dataset: https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k
* Demo Video: https://drive.google.com/file/d/1HbTVXW3YuX7w2Z4ZMJYFX2QbSTGLFdvm/
---
## Author
**Zukhrahon Gulomova**
Applied Artificial Intelligence, IU International University