https://github.com/jbloch100/scalable-analytics-pipeline

Python-based real-time data pipeline using Kafka and Spark for streaming analytics

big-data data-pipeline kafka python spark stream-processing


# Scalable Analytics Pipeline

## Overview
Simulates a simple backend analytics event processor: it processes mock ad events in real time and generates aggregated metrics.
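
The README does not show how the mock events are produced. The following is a minimal sketch that assumes each event is a dict with hypothetical `ad_id`, `event_type`, and `timestamp` fields. A seeded generator yields a reproducible stream one event at a time. `AdEvent`, `generate_mock_events`, and the event weights are illustrative and not taken from the repository.

```python
import random
import time
from typing import Iterator, Sequence, TypedDict


class AdEvent(TypedDict):
    # Hypothetical event schema; the repository's real fields may differ.
    ad_id: str
    event_type: str  # "impression", "view", or "click"
    timestamp: float


EVENT_TYPES = ("impression", "view", "click")


def generate_mock_events(
    count: int,
    seed: int = 0,
    ad_ids: Sequence[str] = ("ad-1", "ad-2", "ad-3"),
) -> Iterator[AdEvent]:
    """Yield `count` pseudo-random ad events one at a time, like a live stream."""
    rng = random.Random(seed)  # seeded so the ad/event sequence is reproducible
    for _ in range(count):
        yield AdEvent(
            ad_id=rng.choice(ad_ids),
            # Illustrative weights: impressions are common, clicks are rare.
            event_type=rng.choices(EVENT_TYPES, weights=(70, 25, 5))[0],
            timestamp=time.time(),  # wall-clock time at emission
        )


if __name__ == "__main__":
    for event in generate_mock_events(5, seed=42):
        print(event)
```

Because this is a generator, a consumer can handle each event as soon as it is yielded instead of waiting for a complete batch.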

## Features
- Processes click, view, and impression events
- Aggregates metrics in real time
- Prints a summary to the command line
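
These features describe incremental aggregation: each event updates running counters as it arrives, and the CLI summary is then printed from those counters. The following is a minimal sketch that assumes events are dicts with hypothetical `ad_id` and `event_type` keys. `EventAggregator` and the click-through-rate (CTR) column are illustrative and not necessarily what `src/main.py` computes.

```python
from collections import Counter, defaultdict


class EventAggregator:
    """Hypothetical incremental aggregator for ad events."""

    def __init__(self) -> None:
        self.totals = Counter()            # event_type -> count across all ads
        self.per_ad = defaultdict(Counter)  # ad_id -> Counter of event types

    def process(self, event: dict) -> None:
        # Update running totals immediately; no re-scan of earlier events.
        self.totals[event["event_type"]] += 1
        self.per_ad[event["ad_id"]][event["event_type"]] += 1

    @staticmethod
    def ctr(counts: Counter) -> float:
        # Click-through rate = clicks / impressions (0.0 if no impressions yet).
        impressions = counts["impression"]
        return counts["click"] / impressions if impressions else 0.0

    def _row(self, label: str, c: Counter) -> str:
        return (f"{label:<8}{c['impression']:>6}{c['view']:>7}"
                f"{c['click']:>8}{self.ctr(c):>9.2%}")

    def summary(self) -> str:
        # One row per ad, sorted by ad_id, followed by a TOTAL row.
        lines = [f"{'ad_id':<8}{'impr':>6}{'views':>7}{'clicks':>8}{'CTR':>9}"]
        lines += [self._row(ad_id, self.per_ad[ad_id]) for ad_id in sorted(self.per_ad)]
        lines.append(self._row("TOTAL", self.totals))
        return "\n".join(lines)


if __name__ == "__main__":
    agg = EventAggregator()
    for e in [
        {"ad_id": "ad-1", "event_type": "impression"},
        {"ad_id": "ad-1", "event_type": "impression"},
        {"ad_id": "ad-1", "event_type": "click"},
        {"ad_id": "ad-2", "event_type": "view"},
    ]:
        agg.process(e)
    print(agg.summary())
```

Each `process` call only increments counters, so the cost per event stays constant. The summary can be printed at any point in the stream, not just at the end.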

## Run
```bash
python src/main.py
```

## Test
```bash
python -m unittest tests/test_main.py
```

---

## Docker

### Build
```bash
docker build -t scalable-analytics-pipeline .
```

### Run
```bash
docker run --rm scalable-analytics-pipeline
```

### Test
```bash
docker run --rm scalable-analytics-pipeline python -m unittest discover -s tests
```