https://github.com/jbloch100/scalable-analytics-pipeline

Python-based real-time data pipeline using Kafka and Spark for streaming analytics

Host: GitHub
URL: https://github.com/jbloch100/scalable-analytics-pipeline
Owner: jbloch100
Created: 2025-07-27T00:22:10.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-07-30T02:13:16.000Z (11 months ago)
Last Synced: 2025-08-11T11:02:18.861Z (10 months ago)
Topics: big-data, data-pipeline, kafka, python, spark, stream-processing
Language: Python
Homepage:
Size: 3.91 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Scalable Analytics Pipeline

## Overview
Simulates a simple analytics event processor of the kind used in backend engineering. It processes mock ad events in real time and generates aggregated metrics.

## Features
- Processes click/view/impression events
- Aggregates data in real time
- Prints a summary to the CLI
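
The features above can be illustrated with a minimal sketch. This is not the code in `src/main.py`; the event shape (a dict with `type` and `ad_id` keys) and the names `aggregate`, `format_summary`, and `mock_events` are assumptions made for the example.

```python
from collections import Counter

# Assumed event types, taken from the feature list above.
EVENT_TYPES = ("click", "view", "impression")


def aggregate(events):
    """Count events per type and per (ad_id, type) pair.

    Hypothetical helper: events are assumed to be dicts with
    "type" and "ad_id" keys. Unknown types are skipped.
    """
    by_type = Counter()
    by_ad = Counter()
    for event in events:
        etype = event.get("type")
        if etype not in EVENT_TYPES:
            continue  # ignore malformed or unsupported events
        by_type[etype] += 1
        by_ad[(event.get("ad_id"), etype)] += 1
    return by_type, by_ad


def format_summary(by_type):
    """Render one 'type: count' line per known event type."""
    return "\n".join(f"{etype}: {by_type.get(etype, 0)}" for etype in EVENT_TYPES)


# Mock ad events (illustrative data only).
mock_events = [
    {"type": "click", "ad_id": "ad-1"},
    {"type": "view", "ad_id": "ad-1"},
    {"type": "impression", "ad_id": "ad-2"},
    {"type": "click", "ad_id": "ad-2"},
    {"type": "hover", "ad_id": "ad-2"},  # unsupported type, skipped
]

by_type, by_ad = aggregate(mock_events)
print(format_summary(by_type))
```

Because `aggregate` accepts any iterable, the same function could consume a generator that yields events as they arrive, which is one simple way to approximate incremental processing in a simulation.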

## Run
```bash
python src/main.py
```

## Test
```bash
python -m unittest tests/test_main.py
```
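
For orientation, a test module run this way might look like the sketch below. It is not the content of `tests/test_main.py`; `count_by_type` is a hypothetical stand-in for whatever aggregation function the project exposes.

```python
import unittest
from collections import Counter


def count_by_type(events):
    """Hypothetical stand-in for the aggregation under test."""
    return Counter(event["type"] for event in events)


class CountByTypeTest(unittest.TestCase):
    def test_counts_each_event_type(self):
        events = [{"type": "click"}, {"type": "view"}, {"type": "click"}]
        self.assertEqual(count_by_type(events), Counter(click=2, view=1))

    def test_empty_input_gives_empty_counts(self):
        self.assertEqual(count_by_type([]), Counter())
```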

---

## Docker

### Build
```bash
docker build -t scalable-analytics-pipeline .
```

### Run
```bash
docker run --rm scalable-analytics-pipeline
```

### Test
```bash
docker run --rm scalable-analytics-pipeline python -m unittest discover -s tests
```
