Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/skp-1997/bigvideoanalytics
Building a video analytics framework for large-scale applications using Big Data.
bigdata docker docker-compose kafka python spark videoanalytics
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/skp-1997/bigvideoanalytics
- Owner: skp-1997
- Created: 2024-08-17T07:13:06.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-08-17T08:30:14.000Z (3 months ago)
- Last Synced: 2024-09-29T07:01:37.433Z (about 2 months ago)
- Topics: bigdata, docker, docker-compose, kafka, python, spark, videoanalytics
- Language: Python
- Homepage:
- Size: 739 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Architecture of the project
![BigDataVideo drawio](https://github.com/user-attachments/assets/52548c39-2f1e-4e20-82b2-13d24758cb2c)
The architecture consists of the following components:
- Producer: Reads frames from video files or live streams and publishes them to a Kafka server. Each frame is sent to a topic corresponding to the video file name.
- Kafka Server: Stores frames in their respective topics.
- Spark Consumer: Consumes frames from Kafka, applies a user-defined function (UDF), such as a face detector, and pushes processed frames to a second Kafka server.
- Final Kafka Consumer: Writes frames according to the topic name and saves the processed videos to the output folder.

# Installation
## Using Docker Compose
Spin up Kafka containers for two servers (listening on ports 9093 and 9095) and Zookeeper (listening on port 2181) using Docker Compose:
```
docker-compose up -d
```
## Manual Installation

1. Kafka:
- Run Kafka using kafka_start.sh.
- Ensure you create two different server.properties files in the conf directory and adjust the broker ID and listening port.
2. Spark:
- Download and install Spark from Apache Spark Downloads.
- Alternatively, use the provided Dockerfile for Spark installation.
3. Python Libraries:
- Create a Conda environment and install the required libraries from requirements.txt:
```
pip install -r requirements.txt
```
# Running the program

1. Start the Producer:
```
python confluentKafkaProducer
```
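For reference, the snippet below is a minimal, illustrative sketch of what a producer like this does, not the repository's actual confluentKafkaProducer code: it reads frames with OpenCV, JPEG-encodes them, and publishes each frame to a topic named after the video file via confluent-kafka. The file path, topic naming, and the broker address (port 9093 from the docker-compose setup above) are assumptions.
```
# Minimal illustrative producer sketch (not the repo's actual script).
# Assumes confluent-kafka and opencv-python are installed and a broker on localhost:9093.
import os
import cv2
from confluent_kafka import Producer

VIDEO_PATH = "videos/sample.mp4"           # hypothetical input video
BOOTSTRAP = "localhost:9093"               # first Kafka server from docker-compose

producer = Producer({"bootstrap.servers": BOOTSTRAP})
topic = os.path.splitext(os.path.basename(VIDEO_PATH))[0]  # topic named after the video file

cap = cv2.VideoCapture(VIDEO_PATH)
frame_id = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    ok, jpeg = cv2.imencode(".jpg", frame)  # serialize the frame as JPEG bytes
    if not ok:
        continue
    producer.produce(topic, key=str(frame_id), value=jpeg.tobytes())
    producer.poll(0)                        # serve delivery callbacks
    frame_id += 1

cap.release()
producer.flush()                            # wait until all frames are delivered
print(f"Published {frame_id} frames to topic '{topic}'")
```
JPEG encoding is used here because raw frames can easily exceed Kafka's default ~1 MB message size limit, while compressed frames normally stay well below it.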
2. Start the Spark Consumer:
- Source the bash profile:
```
source ~/.bash_profile
```
- Run Spark with the following command:
```
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 sparkConsumer.py
```
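As a rough guide, a Spark consumer along the lines of sparkConsumer.py could look like the sketch below: it reads frames from the first Kafka server, applies a face-detection UDF built on OpenCV's Haar cascade, and writes the processed frames to the second Kafka server. The topic names, ports, and checkpoint path are illustrative assumptions, not taken from the repository.
```
# Illustrative Spark consumer sketch; topic names, ports, and paths are assumptions.
import numpy as np
import cv2
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import BinaryType

spark = SparkSession.builder.appName("video-analytics").getOrCreate()

def detect_faces(jpeg_bytes):
    # Decode JPEG bytes, draw boxes around detected faces, re-encode to JPEG.
    frame = cv2.imdecode(np.frombuffer(jpeg_bytes, np.uint8), cv2.IMREAD_COLOR)
    if frame is None:
        return jpeg_bytes
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 4):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    ok, out = cv2.imencode(".jpg", frame)
    return out.tobytes() if ok else jpeg_bytes

detect_faces_udf = udf(detect_faces, BinaryType())

frames = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9093")   # first Kafka server
          .option("subscribe", "sample")                         # topic = video file name
          .load())

processed = frames.select(col("key"), detect_faces_udf(col("value")).alias("value"))

query = (processed.writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9095")    # second Kafka server
         .option("topic", "sample_processed")
         .option("checkpointLocation", "/tmp/spark-video-checkpoint")
         .start())
query.awaitTermination()
```
The cascade classifier is created inside the UDF so it is instantiated on the executors rather than serialized from the driver; the workers therefore need OpenCV and NumPy available, which is why the PYSPARK_PYTHON variables in the Useful Tips section matter.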
3. Start the Kafka Consumer:
```
python kafkaConsumer.py
```
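For orientation, here is a minimal sketch of what a final consumer like kafkaConsumer.py could do: subscribe to a processed topic on the second Kafka server, decode each JPEG frame, and append it to an output video with OpenCV. The topic name, output path, frame rate, and the 10-second idle timeout are assumptions for illustration.
```
# Illustrative final-consumer sketch; topic, output path, and FPS are assumptions.
import os
import cv2
import numpy as np
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9095",   # second Kafka server
    "group.id": "video-writer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sample_processed"])

os.makedirs("output", exist_ok=True)
writer = None
try:
    while True:
        msg = consumer.poll(10.0)             # stop after 10 s without new frames
        if msg is None:
            break
        if msg.error():
            print(msg.error())
            continue
        frame = cv2.imdecode(np.frombuffer(msg.value(), np.uint8), cv2.IMREAD_COLOR)
        if frame is None:
            continue
        if writer is None:                    # open the writer lazily with the first frame's size
            h, w = frame.shape[:2]
            writer = cv2.VideoWriter("output/sample_processed.mp4",
                                     cv2.VideoWriter_fourcc(*"mp4v"), 25.0, (w, h))
        writer.write(frame)
finally:
    if writer is not None:
        writer.release()
    consumer.close()
```
The VideoWriter is created only after the first frame arrives because the output resolution is not known until a frame has been decoded.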
# Useful Tips

- To ensure Spark can access Conda environment libraries, set these environment variables:
```
export PYSPARK_PYTHON=$(which python)
export PYSPARK_DRIVER_PYTHON=$(which python)
```
- To list running Kafka topics:
```
bin/kafka-topics.sh --list --bootstrap-server localhost:PORT
```
- To delete a Kafka topic:
```
kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic your_topic_name
```
- Some suggestions:
* Here, I have used only two brokers with a replication factor of 2; you can adjust this as per your requirements.
* I have used only one partition per topic; you can increase this for faster processing.
* You can use the Kafka Streams API instead of Spark for processing frames.
* You can work on tracking objects across frames; the basic code is already in the repo.
* I am using a .csv file to read camera metadata (a minimal reading sketch follows below); you can use a database instead for storing camera details.
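On the last point, the camera-metadata CSV can stay very simple. The sketch below shows one possible layout and how it could be read with Python's csv module; the file name and column names are assumptions, not the repository's actual schema.
```
# Illustrative reader for a camera-metadata CSV; file and column names are assumptions.
# Example cameras.csv:
#   camera_id,name,source
#   cam01,Entrance,videos/entrance.mp4
#   cam02,Lobby,rtsp://192.168.1.20/stream
import csv

def load_cameras(path="cameras.csv"):
    # Return each row as a dict keyed by the header names.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

for cam in load_cameras():
    print(cam["camera_id"], cam["source"])
```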