https://github.com/mtholahan/kafka-mini-project
Built a streaming fraud detection system with Apache Kafka and Python. Deployed a Kafka cluster via Docker Compose, implemented a transaction generator and fraud detector using kafka-python, and routed suspicious transactions to separate topics for real-time monitoring. Demonstrates event streaming, producers, consumers, and containerization.
bootcamp consumers data-engineering docker docker-compose event-driven fraud-detection kafka producers python springboard streaming
- Host: GitHub
- URL: https://github.com/mtholahan/kafka-mini-project
- Owner: mtholahan
- Created: 2025-09-10T17:04:10.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-11-11T20:31:08.000Z (7 months ago)
- Last Synced: 2025-11-11T22:23:05.925Z (7 months ago)
- Topics: bootcamp, consumers, data-engineering, docker, docker-compose, event-driven, fraud-detection, kafka, producers, python, springboard, streaming
- Language: Python
- Size: 501 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Kafka Mini Project
## 📖 Abstract
This project implements a real-time fraud detection pipeline using Apache Kafka and Python. The system simulates financial transactions, streams them through Kafka, and applies rule-based filtering to flag suspicious activity. The goal is to gain practical experience with streaming architectures, producers, consumers, and containerized deployments.
The workflow includes:
* Running a local Kafka cluster using Docker Compose with broker and ZooKeeper services.
* Building a transaction generator that continuously produces randomized account transfers into a Kafka topic.
* Creating a fraud detector application that consumes transactions, evaluates them against business rules, and branches outputs into "legit" or "fraud" topics.
* Packaging all components with Dockerfiles, requirements.txt, and docker-compose.yml for reproducibility.
* Verifying results by consuming messages from output topics, confirming that transactions over $900 are correctly flagged as fraudulent.
Through this project, I gained hands-on skills in stream processing, Kafka topic design, producer/consumer APIs, and containerized workflow orchestration, while also exploring real-world challenges in fraud detection systems.
## 🛠 Requirements
- Docker Engine 20.x or later
- Docker Compose v2
- Ubuntu 22.04 LTS environment (tested)
- `docker-compose.yml` defining all services:
  - zookeeper (Confluent cp-zookeeper)
  - kafka broker (Confluent cp-kafka)
  - generator (Python producer app)
  - detector (Python consumer/producer app)
- Python dependency (inside app containers):
  - kafka-python
## 🧰 Setup
- Clone the repository and navigate to the `kafka-docker/` directory
- Build images: `docker-compose build --no-cache`
- Start cluster + apps: `docker-compose up -d`
- Verify broker startup logs (Kafka ready)
- Verify that the generator and detector services are running
- Inspect Kafka topics via `kafka-console-consumer` from the broker container
## 📊 Dataset
- Streaming data consists of synthetic transactions generated by the producer app
- Transaction schema includes: transaction_id, account_id, timestamp, amount, merchant, location
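
A minimal sketch of how one synthetic record with the schema above could be built and serialized for Kafka. The field names come from the schema; the value pools (`MERCHANTS`, `LOCATIONS`), the account ID format, and the amount range are illustrative assumptions, not taken from `generator_app.py`.

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical value pools; the real generator may use different ones.
MERCHANTS = ["grocer", "electronics", "travel", "fuel"]
LOCATIONS = ["NYC", "SFO", "CHI", "AUS"]


def make_transaction(rng: random.Random) -> dict:
    """Build one synthetic transaction with the six schema fields."""
    return {
        # Derive the UUID from the seeded RNG so runs are reproducible.
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "account_id": f"acct-{rng.randint(1, 50):04d}",  # assumed format
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "amount": round(rng.uniform(1.0, 1000.0), 2),  # assumed range
        "merchant": rng.choice(MERCHANTS),
        "location": rng.choice(LOCATIONS),
    }


def serialize(txn: dict) -> bytes:
    """Encode a transaction as UTF-8 JSON, a common Kafka message format."""
    return json.dumps(txn).encode("utf-8")


if __name__ == "__main__":
    print(serialize(make_transaction(random.Random(42))))
```

With kafka-python, `serialize` could be passed as the `value_serializer` of a `KafkaProducer`, so the producer loop only deals with dictionaries.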
## ⏱️ Run Steps
- Start services with: `docker-compose up -d`
- Producer (generator) writes messages into topic: `queueing.transactions`
- Consumer (detector) reads `queueing.transactions`, applies fraud detection rules, and branches to:
  - `streaming.transactions.legit`
  - `streaming.transactions.fraud`
- Verify output using `kafka-console-consumer` inside the broker container
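
The branching step can be sketched roughly as follows. This assumes the over-$900 rule mentioned in the abstract is the only rule applied and that messages are JSON; the broker address `kafka:9092` and the function names are hypothetical, and the actual `detector_app.py` may be structured differently.

```python
import json

SOURCE_TOPIC = "queueing.transactions"
LEGIT_TOPIC = "streaming.transactions.legit"
FRAUD_TOPIC = "streaming.transactions.fraud"
FRAUD_THRESHOLD = 900.0  # amounts strictly above this are flagged


def route(txn: dict) -> str:
    """Return the output topic for a transaction based on its amount."""
    if float(txn["amount"]) > FRAUD_THRESHOLD:
        return FRAUD_TOPIC
    return LEGIT_TOPIC


def run(bootstrap_servers: str = "kafka:9092") -> None:
    """Consume, classify, and re-publish. Broker address is an assumption."""
    # Imported here so route() can be used and tested without kafka-python.
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        SOURCE_TOPIC,
        bootstrap_servers=bootstrap_servers,
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
    )
    for message in consumer:
        producer.send(route(message.value), value=message.value)
```

Keeping the rule in a pure function like `route` means the classification logic can be unit-tested without a running broker.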
## 📈 Outputs
- Two Kafka topics with processed messages:
  - `streaming.transactions.legit` (valid transactions)
  - `streaming.transactions.fraud` (flagged transactions)
- Console logs showing consumed/produced records
- Demonstration of near real-time fraud detection pipeline
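
One way to check the two output topics is to collect a sample of decoded messages from each and look for records that landed on the wrong side of the threshold. The sketch below assumes the over-$900 rule is the only criterion and that each message is a dictionary with an `amount` field; `find_misrouted` is a hypothetical helper, not part of the deliverables.

```python
FRAUD_THRESHOLD = 900.0  # assumed to be the only fraud rule


def find_misrouted(legit: list, fraud: list,
                   threshold: float = FRAUD_THRESHOLD) -> list:
    """Return (topic_label, transaction) pairs that violate the rule."""
    # Anything above the threshold should not appear in the legit sample.
    wrong = [("legit", t) for t in legit if t["amount"] > threshold]
    # Anything at or below the threshold should not appear in the fraud sample.
    wrong += [("fraud", t) for t in fraud if t["amount"] <= threshold]
    return wrong


if __name__ == "__main__":
    legit_sample = [{"amount": 120.0}, {"amount": 899.99}]
    fraud_sample = [{"amount": 950.0}]
    print(find_misrouted(legit_sample, fraud_sample))
```

An empty result for a reasonably large sample is consistent with the branching working as intended.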
## 📸 Evidence

- Screenshot of Dockerized Kafka running
- Screenshot of code execution
- Screenshot of legitimate transactions
- Screenshot of fraudulent transactions
## 📎 Deliverables
- [`docker-compose.yml`](./deliverables/docker-compose.yml)
- [`detector_requirements.txt`](./deliverables/detector_requirements.txt)
- [`detector_app.py`](./deliverables/detector_app.py)
- [`generator_requirements.txt`](./deliverables/generator_requirements.txt)
- [`generator_app.py`](./deliverables/generator_app.py)
## 🛠️ Architecture
- Multi-container Docker environment
- Services:
  - Producer app → Kafka broker
  - Detector app (consumer + branching producer)
  - ZooKeeper for coordination
- Data flow:
  - generator → `queueing.transactions` → detector → (fraud or legit topics)
## 🔍 Monitoring
- Kafka CLI tools (kafka-console-consumer) to inspect topics
- Docker logs for generator and detector services
- Broker logs for message flow validation
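
For repeated topic inspection, the console-consumer invocation can be assembled by a small helper. This is a sketch only: the Compose service name `kafka` and the listener address `localhost:9092` are assumptions that must match the actual `docker-compose.yml`, and `console_consumer_cmd` is a hypothetical name.

```python
import shlex
from typing import List


def console_consumer_cmd(topic: str,
                         service: str = "kafka",             # assumed service name
                         bootstrap: str = "localhost:9092",  # assumed listener
                         max_messages: int = 10) -> List[str]:
    """Build the argv for reading a topic from inside the broker container."""
    return [
        "docker-compose", "exec", service,
        "kafka-console-consumer",
        "--bootstrap-server", bootstrap,
        "--topic", topic,
        "--from-beginning",
        "--max-messages", str(max_messages),
    ]


if __name__ == "__main__":
    for topic in ("streaming.transactions.legit", "streaming.transactions.fraud"):
        # Print a copy-pasteable shell command for each output topic.
        print(shlex.join(console_consumer_cmd(topic)))
```

The printed commands can be pasted into a shell, or the list can be handed to `subprocess.run` if the check should be scripted.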
## ♻️ Cleanup
- Stop services: `docker-compose down`
- Remove local Docker volumes for Kafka logs/state if re-running
- Delete the external Docker network if it was created manually
*Generated automatically via Python + Jinja2 + SQL Server table `tblMiniProjectProgress` on 11-11-2025 15:31:05*