https://github.com/frocode/real_streaming_kafka

Host: GitHub
URL: https://github.com/frocode/real_streaming_kafka
Owner: FroCode
Created: 2024-10-06T22:23:02.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-01-14T19:35:13.000Z (over 1 year ago)
Last Synced: 2025-12-31T12:45:44.413Z (7 months ago)
Language: Python
Size: 728 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Realtime Data Streaming With TCP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch
## Table of Contents
- [Introduction](#introduction)
- [System Architecture](#system-architecture)
- [Technologies](#technologies)
- [Getting Started](#getting-started)

## Introduction

This project is a guide to building an end-to-end data engineering pipeline with TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, publishing to a Kafka topic, and delivery to Elasticsearch.

## System Architecture
![System_architecture.png](assets%2FSystem_architecture.png)

The project is designed with the following components:

- **Data Source**: The pipeline uses the Yelp dataset from `yelp.com`.
- **TCP/IP Socket**: Streams the data over the network in chunks.
- **Apache Spark**: Processes the data on a cluster of master and worker nodes.
- **Confluent Kafka**: The Kafka cluster, hosted in the cloud.
- **Control Center and Schema Registry**: Provide monitoring and schema management for the Kafka streams.
- **Kafka Connect**: Connects Kafka to Elasticsearch.
- **Elasticsearch**: Indexes the data and serves queries.
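
The first hop of this architecture, streaming records over a TCP socket in chunks, can be sketched with the Python standard library alone. This is a minimal illustration and not the repository's actual code: the function names (`send_in_chunks`, `serve_once`, `receive_records`, `demo`), the newline-delimited JSON framing, and the sample review fields are all assumptions made for the example.

```python
import json
import socket
import threading


def send_in_chunks(conn, records, chunk_size=2):
    """Send records as newline-delimited JSON, chunk_size records per send."""
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        payload = "".join(json.dumps(r) + "\n" for r in chunk)
        conn.sendall(payload.encode("utf-8"))


def serve_once(server_sock, records, chunk_size=2):
    """Accept a single client, stream every record to it, then close."""
    conn, _addr = server_sock.accept()
    with conn:
        send_in_chunks(conn, records, chunk_size)


def receive_records(host, port):
    """Connect to the server and parse one JSON object per line until EOF."""
    with socket.create_connection((host, port)) as client:
        with client.makefile("r", encoding="utf-8") as lines:
            return [json.loads(line) for line in lines if line.strip()]


def demo():
    # Hypothetical review records standing in for rows of the Yelp dataset.
    reviews = [
        {"review_id": "r1", "stars": 5, "text": "Great tacos"},
        {"review_id": "r2", "stars": 2, "text": "Slow service"},
        {"review_id": "r3", "stars": 4, "text": "Nice patio"},
    ]
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server, reviews))
        worker.start()
        received = receive_records("127.0.0.1", port)
        worker.join()
    return reviews, received


if __name__ == "__main__":
    sent, got = demo()
    print(f"sent {len(sent)} records, received {len(got)}")
```

Framing each record as one line of JSON keeps record boundaries intact even when TCP splits or merges the chunks in transit, so a line-oriented consumer can read the stream without knowing the chunk size.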

## Technologies

- Python
- TCP/IP
- Confluent Kafka
- Apache Spark
- Docker
- Elasticsearch

## Getting Started

1. Clone the repository:
```bash
git clone https://github.com/FroCode/Real_Streaming_Kafka.git
```

2. Navigate to the project directory:
```bash
cd Real_Streaming_Kafka
```

3. Run Docker Compose to spin up the Spark cluster:
```bash
docker-compose up
```
