https://github.com/frocode/real_streaming_kafka
- Host: GitHub
- URL: https://github.com/frocode/real_streaming_kafka
- Owner: FroCode
- Created: 2024-10-06T22:23:02.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-14T19:35:13.000Z (over 1 year ago)
- Last Synced: 2025-12-31T12:45:44.413Z (4 months ago)
- Language: Python
- Size: 728 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Realtime Data Streaming With TCP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch
## Table of Contents
- [Introduction](#introduction)
- [System Architecture](#system-architecture)
- [Technologies](#technologies)
- [Getting Started](#getting-started)
## Introduction
This project is a guide to building an end-to-end data engineering pipeline with TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, producing to a Kafka topic, and connecting to Elasticsearch.
## System Architecture

The project is designed with the following components:
- **Data Source**: The pipeline uses the `yelp.com` dataset.
- **TCP/IP Socket**: Streams the data over the network in chunks.
- **Apache Spark**: Processes the data on a cluster of master and worker nodes.
- **Confluent Kafka**: The Kafka cluster, hosted in the cloud.
- **Control Center and Schema Registry**: Monitor the Kafka streams and manage their schemas.
- **Kafka Connect**: Connects Kafka to Elasticsearch.
- **Elasticsearch**: Indexes the data and serves queries.
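
The TCP/IP socket stage above can be sketched with Python's standard library alone. This is a minimal illustration, not the repository's actual code: the function names (`serve_records`, `read_records`), the record fields (`review_id`, `text`), and the chunk size are all assumptions made for the example. It sends records as newline-delimited JSON in fixed-size chunks, a format that a line-oriented consumer can read one record at a time.

```python
import json
import socket
import threading


def serve_records(records, host="127.0.0.1", port=0, chunk_size=2):
    """Start a one-shot TCP server that streams records in chunks.

    Hypothetical helper for illustration. Port 0 lets the OS pick a free port.
    Returns the server thread and the port that was actually bound.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def handle():
        # Accept a single client, then close both sockets when done.
        conn, _ = server.accept()
        with conn, server:
            for start in range(0, len(records), chunk_size):
                chunk = records[start:start + chunk_size]
                # One JSON document per line, chunk_size lines per send.
                payload = "".join(json.dumps(r) + "\n" for r in chunk)
                conn.sendall(payload.encode("utf-8"))

    thread = threading.Thread(target=handle, daemon=True)
    thread.start()
    return thread, bound_port


def read_records(host, port):
    """Connect to the server and read JSON lines until it closes the socket."""
    with socket.create_connection((host, port)) as client:
        with client.makefile("r", encoding="utf-8") as stream:
            return [json.loads(line) for line in stream if line.strip()]


if __name__ == "__main__":
    # Made-up sample records; the real dataset's fields may differ.
    sample = [
        {"review_id": "r1", "text": "Great food"},
        {"review_id": "r2", "text": "Slow service"},
        {"review_id": "r3", "text": "Would come back"},
    ]
    thread, port = serve_records(sample)
    for record in read_records("127.0.0.1", port):
        print(record)
    thread.join()
```

In the full pipeline the reader side would be the Spark job rather than `read_records`; the plain-socket client here only makes the sketch runnable on its own.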
## Technologies
- Python
- TCP/IP
- Confluent Kafka
- Apache Spark
- Docker
- Elasticsearch
## Getting Started
1. Clone the repository:
```bash
git clone https://github.com/FroCode/Real_Streaming_Kafka.git
```
2. Navigate to the project directory:
```bash
cd Real_Streaming_Kafka
```
3. Run Docker Compose to spin up the Spark cluster:
```bash
docker-compose up
```