https://github.com/jbloch100/scalable-analytics-pipeline
Python-based real-time data pipeline using Kafka and Spark for streaming analytics
big-data data-pipeline kafka python spark stream-processing
- Host: GitHub
- URL: https://github.com/jbloch100/scalable-analytics-pipeline
- Owner: jbloch100
- Created: 2025-07-27T00:22:10.000Z
- Default Branch: main
- Last Pushed: 2025-07-30T02:13:16.000Z
- Last Synced: 2025-08-11T11:02:18.861Z
- Topics: big-data, data-pipeline, kafka, python, spark, stream-processing
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Scalable Analytics Pipeline
## Overview
Simulates a simple analytics event processor of the kind used in backend engineering. It processes mock ad events in real time and produces aggregated metrics.
## Features
- Processes click/view/impression events
- Aggregates data in real time
- Prints a summary to the command line
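
The features above can be illustrated with a minimal sketch. It is not taken from `src/main.py`: the event shape (a dict with a `type` key) and the names `aggregate_events` and `format_summary` are assumptions made for illustration only.

```python
from typing import Dict, Iterable

# Assumed event types, matching the click/view/impression events listed above.
EVENT_TYPES = ("click", "view", "impression")


def aggregate_events(events: Iterable[dict]) -> Dict[str, int]:
    """Count events per type, updating the totals one event at a time."""
    counts = {event_type: 0 for event_type in EVENT_TYPES}
    for event in events:
        event_type = event.get("type")
        if event_type in counts:  # unknown event types are ignored
            counts[event_type] += 1
    return counts


def format_summary(counts: Dict[str, int]) -> str:
    """Render the counts as a plain-text summary, one event type per line."""
    return "\n".join(f"{name}: {counts[name]}" for name in EVENT_TYPES)


if __name__ == "__main__":
    # Hypothetical mock ad events; the real project may generate these differently.
    mock_events = [
        {"type": "impression", "ad_id": "a1"},
        {"type": "view", "ad_id": "a1"},
        {"type": "click", "ad_id": "a1"},
        {"type": "impression", "ad_id": "a2"},
    ]
    print(format_summary(aggregate_events(mock_events)))
```

Because the totals are updated per event, the same function works on a finite list or on a generator that yields events as they arrive.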
## Run
```bash
python src/main.py
```
## Test
```bash
python -m unittest tests/test_main.py
```
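
As a rough idea of what a test module run by this command could contain, here is a hedged sketch. The function `count_by_type` is a hypothetical stand-in defined inline so the example is self-contained; the actual contents of `tests/test_main.py` are not shown in this README.

```python
import unittest


def count_by_type(events):
    """Hypothetical stand-in for the function under test: count events per type."""
    counts = {}
    for event in events:
        counts[event["type"]] = counts.get(event["type"], 0) + 1
    return counts


class CountByTypeTest(unittest.TestCase):
    def test_counts_each_event_type(self):
        events = [{"type": "click"}, {"type": "view"}, {"type": "click"}]
        self.assertEqual(count_by_type(events), {"click": 2, "view": 1})

    def test_empty_input_gives_empty_counts(self):
        self.assertEqual(count_by_type([]), {})
```

`python -m unittest` collects `TestCase` subclasses from the given module, so no `unittest.main()` call is needed in the file.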
---
## Docker
### Build
```bash
docker build -t scalable-analytics-pipeline .
```
### Run
```bash
docker run --rm scalable-analytics-pipeline
```
### Test
```bash
docker run --rm scalable-analytics-pipeline python -m unittest discover -s tests
```