https://github.com/sebsop/realtime-parallel-kmeans-segmentation
Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.
- Host: GitHub
- URL: https://github.com/sebsop/realtime-parallel-kmeans-segmentation
- Owner: sebsop
- License: mit
- Created: 2025-10-02T09:50:05.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2026-04-16T08:50:34.000Z (2 months ago)
- Last Synced: 2026-06-23T03:34:05.541Z (about 10 hours ago)
- Topics: cpp, cuda, k-means-clustering, mpi, multithreading, opencv, rcc, real-time-stream-processing
- Language: C++
- Homepage:
- Size: 14.1 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Real-Time Parallel K-Means Image Segmentation
A high-performance computer vision system that performs real-time image segmentation using K-Means clustering with multiple parallel backends for optimal performance across different hardware configurations.

*Real-time parallel K-means clustering: original vs segmented frames with dynamic K-value control via slider*
---
## Key Features
- **Real-Time Performance**: Up to 55 FPS with the CUDA backend on live webcam feeds
- **Multiple Parallel Backends**: Sequential, Multi-threaded, MPI + OpenMP, and CUDA implementations
- **RCC Tree Optimization**: Recursive Cached Coreset tree for efficient streaming segmentation
- **5D Feature Space**: Combines color (BGR) and spatial (x, y) features for coherent segmentation
- **Dynamic Backend Switching**: Switch between backends at runtime with keyboard shortcuts
- **Performance Monitoring**: Live FPS tracking with min/max statistics
- **Interactive Controls**: Adjustable K-value slider and side-by-side visualization
---
## Technical Overview
This project implements an advanced K-Means clustering system optimized for real-time image segmentation with multiple parallelization strategies:
- **Core Algorithm**: K-Means clustering adapted for image segmentation
- **Feature Engineering**: 5D vectors combining color similarity and spatial proximity
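- **Coreset Sampling**: Reduces computational complexity from $O(n \cdot k \cdot t)$ to $O(s \cdot k \cdot t)$, where $n$ is the total number of pixels, $s$ is the coreset size ($s \ll n$), $k$ is the number of clusters, and $t$ is the number of iterations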
- **RCC Tree Structure**: Maintains $(1 \pm \epsilon)$-approximation with bounded memory
- **Hardware Optimization**: Leverages multi-core CPUs, distributed systems, and GPUs
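
The 5D feature construction described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `Feature5`, `buildFeatures`, and `squaredDistance` are hypothetical, and normalizing colors to [0, 1] and coordinates by the frame size is an assumption. The frame is a plain interleaved BGR byte buffer so the sketch compiles without OpenCV.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 5D feature: (B, G, R, x, y), each component scaled.
using Feature5 = std::array<float, 5>;

// Build one feature per pixel from an interleaved 8-bit BGR buffer.
// Assumption: colors are normalized to [0, 1] and coordinates to [0, 1),
// then multiplied by color_scale and spatial_scale respectively.
std::vector<Feature5> buildFeatures(const std::vector<std::uint8_t>& bgr,
                                    int width, int height,
                                    float color_scale, float spatial_scale) {
    assert(width > 0 && height > 0);
    assert(bgr.size() == static_cast<std::size_t>(width) * height * 3);
    std::vector<Feature5> features;
    features.reserve(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 3;
            const Feature5 f = {
                bgr[i + 0] / 255.0f * color_scale,
                bgr[i + 1] / 255.0f * color_scale,
                bgr[i + 2] / 255.0f * color_scale,
                static_cast<float>(x) / width * spatial_scale,
                static_cast<float>(y) / height * spatial_scale};
            features.push_back(f);
        }
    }
    return features;
}

// Squared Euclidean distance in the 5D feature space.
float squaredDistance(const Feature5& a, const Feature5& b) {
    float sum = 0.0f;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}
```

Raising `color_scale` relative to `spatial_scale` makes clusters follow color more closely, while raising `spatial_scale` favors compact, spatially contiguous regions.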
---
## Performance Benchmarks
### FPS Performance by Backend and K-value
| Backend | K=2 (Min FPS) | K=2 (Max FPS) | K=12 (Min FPS) | K=12 (Max FPS) | Performance Ratio |
|---------|---------------|---------------|----------------|----------------|-------------------|
| **Sequential** | 15 | 17 | 5 | 6 | 1.0× (baseline) |
| **Multi-threaded** | 14 | 44 | 10 | 22 | 2.4× average |
| **MPI** | 17 | 44 | 13 | 21 | 2.6× average |
| **CUDA** | 14 | 55 | 15 | 44 | 3.2× average |
### Performance Characteristics
```
Performance Improvement Factor (vs Sequential):
┌─────────────────────────────────────────────────┐
│ CUDA:         ████████████████ 3.2×             │
│ MPI:          █████████████ 2.6×                │
│ Multi-thread: ████████████ 2.4×                 │
│ Sequential:   █████ 1.0× (baseline)             │
└─────────────────────────────────────────────────┘
```
---
## Architecture Details
### Algorithmic Complexity
| Component | Complexity | Notes |
|-----------|------------|-------|
| **Sequential K-Means** | $O(n \cdot k \cdot t)$ | Baseline implementation |
| **Coreset K-Means** | $O(s \cdot k \cdot t)$ | Reduced complexity with $s \ll n$ |
| **RCC Tree Insertion** | $O(s \cdot \log N)$ | Streaming update per frame |
| **RCC Tree Merging** | $O(s)$ | Weighted merge operation |
| **GPU Assignment** | $O(n / \text{cores})$ | Massive parallelization |
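
As a worked example of the coreset saving, assume a 640×480 frame (an illustrative resolution, not one stated by the project) and the default coreset size $s = 2000$:

$$
n = 640 \times 480 = 307{,}200, \qquad
\frac{n \cdot k \cdot t}{s \cdot k \cdot t} = \frac{n}{s} = \frac{307{,}200}{2{,}000} = 153.6
$$

Under these assumptions the iterative clustering step touches roughly 154 times fewer points, independent of $k$ and $t$. Labeling every pixel of the output frame still requires one pass over all $n$ pixels, which is presumably the step the GPU assignment row refers to.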
### Backend-Specific Implementations
#### Multi-threaded (std::thread)
- **Strategy**: Row-based work distribution across CPU threads
- **Synchronization**: Lock-free design with barrier synchronization
- **Memory Safety**: Read-only sharing with exclusive write regions
- **Best For**: Multi-core desktop systems
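
A minimal sketch of row-based work distribution with `std::thread` is shown below. The helper name `parallelForRows` is hypothetical and the banding scheme (contiguous, equally sized row bands) is an assumption. Each thread writes only to its own band, so no locks are needed, and `join()` serves as the completion barrier.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Run process_rows(row_begin, row_end) on disjoint row bands, one band per thread.
// Bands do not overlap, so each thread has an exclusive write region.
void parallelForRows(int height, unsigned num_threads,
                     const std::function<void(int, int)>& process_rows) {
    assert(height >= 0);
    const int threads = static_cast<int>(std::max(1u, num_threads));
    const int rows_per_thread = (height + threads - 1) / threads;  // ceiling division
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        const int begin = t * rows_per_thread;
        if (begin >= height) break;  // more threads than rows
        const int end = std::min(height, begin + rows_per_thread);
        workers.emplace_back([&process_rows, begin, end] { process_rows(begin, end); });
    }
    // Joining every worker acts as the barrier before results are read.
    for (std::thread& worker : workers) worker.join();
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as `num_threads` and capture the shared, read-only cluster centers by const reference inside the callback.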
#### Distributed (MPI + OpenMP)
- **Strategy**: Master-Worker pattern combining process-level (MPI) distribution of rows with thread-level (OpenMP) parallelization within each process.
- **Communication**: Uses MPI_Bcast to distribute cluster centers, frame data, and control parameters (k, dimensions, stop flag) and MPI_Gatherv for efficient, variable-sized result aggregation.
- **Hybrid Approach**: MPI collective calls act as synchronization barriers, ensuring data consistency across the network.
- **Best For**: HPC clusters and large distributed systems
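
`MPI_Gatherv` takes per-rank receive counts and displacements, which lets ranks contribute different numbers of rows. The sketch below computes those two arrays for a row-wise split. It contains no MPI calls, so it compiles without an MPI installation. `GatherLayout` and `rowLayout` are hypothetical names, and giving the remainder rows to the lowest ranks is an assumed policy.

```cpp
#include <cassert>
#include <vector>

struct GatherLayout {
    std::vector<int> counts;  // elements contributed by each rank
    std::vector<int> displs;  // offset of each rank's block in the gathered buffer
};

// Split `height` rows across `num_ranks`; the first (height % num_ranks)
// ranks receive one extra row so the split is as even as possible.
GatherLayout rowLayout(int height, int width, int num_ranks) {
    assert(height >= 0 && width >= 0 && num_ranks > 0);
    GatherLayout layout;
    layout.counts.resize(num_ranks);
    layout.displs.resize(num_ranks);
    const int base = height / num_ranks;
    const int extra = height % num_ranks;
    int offset = 0;
    for (int r = 0; r < num_ranks; ++r) {
        const int rows = base + (r < extra ? 1 : 0);
        layout.counts[r] = rows * width;  // one label per pixel
        layout.displs[r] = offset;
        offset += layout.counts[r];
    }
    return layout;
}
```

On the root process, `layout.counts.data()` and `layout.displs.data()` would be passed as the `recvcounts` and `displs` arguments of `MPI_Gatherv`, and each rank would send `layout.counts[rank]` elements.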
#### GPU-accelerated (CUDA)
- **Strategy**: One thread per pixel for maximum throughput
- **Memory Management**: Efficient host-device transfers
- **Synchronization**: cudaDeviceSynchronize() for completion barriers
- **Best For**: High-throughput applications with GPU hardware
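
With one thread per pixel, the launch grid must cover the whole frame and threads that fall past the image edge must do nothing. The sketch below models that arithmetic in plain C++ so it compiles without `nvcc`. `Dim2`, `gridFor`, and `pixelIndex` are hypothetical names, and a 16×16 block size is only a common choice, not a value taken from the project.

```cpp
#include <cassert>

struct Dim2 {
    int x;
    int y;
};

// Ceiling division for positive integers.
constexpr int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Number of blocks per axis so that grid * block covers the whole frame.
Dim2 gridFor(int width, int height, Dim2 block) {
    assert(width > 0 && height > 0 && block.x > 0 && block.y > 0);
    return {ceilDiv(width, block.x), ceilDiv(height, block.y)};
}

// Linear pixel index handled by one thread, or -1 if the thread lies outside
// the image. Mirrors the usual kernel prologue:
//   x = blockIdx.x * blockDim.x + threadIdx.x   (and likewise for y)
int pixelIndex(Dim2 block_idx, Dim2 block_dim, Dim2 thread_idx,
               int width, int height) {
    const int x = block_idx.x * block_dim.x + thread_idx.x;
    const int y = block_idx.y * block_dim.y + thread_idx.y;
    if (x >= width || y >= height) return -1;  // bounds guard for partial blocks
    return y * width + x;
}
```

For a 640×480 frame and 16×16 blocks this gives a 40×30 grid. The bounds guard matters whenever the frame size is not a multiple of the block size.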
---
## Getting Started
### Prerequisites
- C++17-compatible compiler (GCC, MSVC, or Clang)
- CMake 3.18+
- OpenCV 4.0+
- CUDA Toolkit (for the GPU backend)
- MPI implementation, e.g. Open MPI or MS-MPI (for the distributed backend)
### Building the Project
```bash
# Clone the repository
git clone https://github.com/sebsop/realtime-parallel-kmeans-segmentation.git
cd realtime-parallel-kmeans-segmentation
# Create build directory
mkdir build && cd build
# Configure with CMake
cmake ..
# Build the project
cmake --build . --config Release
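# Change to the directory containing the built executable
# (out/build/x64-Debug is the Visual Studio CMake default; adjust for your generator)
cd out/build/x64-Debug
# Run the application (replace <N> with the number of MPI processes)
mpiexec -n <N> realtime_parallel_kmeans_segmentation.exe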
```
### Runtime Controls
| Key | Action |
|-----|--------|
| **'1'** | Switch to Sequential backend |
| **'2'** | Switch to Multi-threaded backend |
| **'3'** | Switch to MPI backend |
| **'4'** | Switch to CUDA backend |
| **ESC** | Exit application |
| **K Slider** | Adjust cluster count (2-12) |
---
## Configuration
### Algorithm Parameters
```cpp
// Default configuration values
const int k_min = 1; // Minimum clusters
const int k_max = 12; // Maximum clusters
const int sample_size = 2000; // Coreset size
const float color_scale = 1.0f; // Color feature scaling
const float spatial_scale = 0.5f; // Spatial feature scaling
```
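To illustrate how `sample_size` bounds the work per frame, here is a minimal sketch of weighted sampling. It uses uniform sampling with replacement as a stand-in, which is an assumption: the project's coreset construction may weight pixels by importance instead. `WeightedIndex` and `sampleCoreset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct WeightedIndex {
    std::size_t pixel;  // index into the frame's pixel array
    float weight;       // number of original pixels this sample stands for
};

// Draw sample_size pixel indices uniformly at random (with replacement).
// Each sample gets weight n / s, so the weights sum to the pixel count n.
std::vector<WeightedIndex> sampleCoreset(std::size_t num_pixels,
                                         std::size_t sample_size,
                                         std::mt19937& rng) {
    assert(num_pixels > 0 && sample_size > 0);
    std::uniform_int_distribution<std::size_t> pick(0, num_pixels - 1);
    const float weight =
        static_cast<float>(num_pixels) / static_cast<float>(sample_size);
    std::vector<WeightedIndex> coreset;
    coreset.reserve(sample_size);
    for (std::size_t i = 0; i < sample_size; ++i) {
        coreset.push_back(WeightedIndex{pick(rng), weight});
    }
    return coreset;
}
```

K-means then runs on the `sample_size` weighted samples rather than on every pixel, with each sample's weight used when averaging cluster centers.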
### RCC Tree Settings
```cpp
const int max_levels = 8; // Maximum tree height
const int default_sample = 2000; // Default coreset size
```
---
## Real-Time Performance Thresholds
| FPS Range | Classification | User Experience | Recommended Backends |
|-----------|----------------|-----------------|----------------------|
| **30+ FPS** | Excellent | Smooth real-time | CUDA, Multi-threaded, MPI Hybrid (K≤8) |
| **20-30 FPS** | Good | Acceptable real-time | Multi-threaded, MPI (K≤6) |
| **10-20 FPS** | Fair | Noticeable lag | All backends (K≤4) |
| **<10 FPS** | Poor | Choppy playback | Sequential only (high K) |
---
## Use Cases
### Video Processing
- Real-time video segmentation for streaming applications
- Live broadcast effects and background replacement
- Content creation and video editing workflows
### Computer Vision Research
- Baseline implementation for segmentation algorithms
- Performance benchmarking across different hardware
- Educational demonstrations of parallel computing concepts
### Medical Imaging
- Real-time analysis of medical imagery
- Interactive segmentation for diagnostic applications
- High-throughput batch processing of medical data
### Interactive Applications
- Real-time augmented reality applications
- Interactive art installations
- Gaming and entertainment systems
---
## Customization
### Adding New Backends
```cpp
// In clustering_backends.hpp
#include <opencv2/core.hpp>  // cv::Mat

enum Backend {
    BACKEND_SEQ = 0,
    BACKEND_CUDA = 1,
    BACKEND_THR = 2,
    BACKEND_MPI = 3,
    BACKEND_CUSTOM = 4  // Your custom backend
};

// Implement your backend function
cv::Mat segmentFrameWithKMeans_custom(
    const cv::Mat& frame, int k, int sample_size,
    float color_scale, float spatial_scale);
```
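A new backend also has to be reachable from the dispatcher. The sketch below shows one possible way to route a `Backend` value to a segmentation function. It is an illustration only, not the contents of `clustering_entry.cpp`: `BackendRegistry`, `SegmentFn`, and the `Frame` stand-in for `cv::Mat` are all hypothetical, introduced so the example compiles without OpenCV.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Stand-in for cv::Mat so the sketch compiles without OpenCV.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> data;
};

enum Backend {
    BACKEND_SEQ = 0,
    BACKEND_CUDA = 1,
    BACKEND_THR = 2,
    BACKEND_MPI = 3,
    BACKEND_CUSTOM = 4
};

// Same parameter list as segmentFrameWithKMeans_custom, with Frame for cv::Mat.
using SegmentFn = std::function<Frame(const Frame&, int, int, float, float)>;

class BackendRegistry {
public:
    // Register (or replace) the function that implements a backend.
    void add(Backend id, SegmentFn fn) { table_[id] = std::move(fn); }

    // Dispatch one frame to the selected backend.
    Frame run(Backend id, const Frame& frame, int k, int sample_size,
              float color_scale, float spatial_scale) const {
        assert(k > 0 && sample_size > 0);
        const auto it = table_.find(id);
        if (it == table_.end()) throw std::invalid_argument("backend not registered");
        return it->second(frame, k, sample_size, color_scale, spatial_scale);
    }

private:
    std::map<Backend, SegmentFn> table_;
};
```

With a table like this, switching backends at runtime reduces to changing the `Backend` value passed to `run`, and an unregistered backend fails loudly instead of silently falling back.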
### Tuning Performance
```cpp
// Adjust coreset sampling for speed vs quality trade-off
const int fast_sample_size = 1000; // Faster, lower quality
const int quality_sample_size = 5000; // Slower, higher quality
// Modify feature scaling for different segmentation characteristics
const float color_emphasis = 2.0f; // Emphasize color similarity
const float spatial_emphasis = 0.1f; // De-emphasize spatial proximity
```
---
## Project Structure
```
realtime-parallel-kmeans-segmentation/
├── include/                        # Header files
│   ├── clustering.hpp              # Main clustering interface
│   ├── clustering_backends.hpp     # Backend implementations
│   ├── coreset.hpp                 # Coreset data structures
│   ├── rcc.hpp                     # RCC tree implementation
│   ├── utils.hpp                   # Utility functions
│   └── video_io.hpp                # Video I/O interface
├── src/                            # Source files
│   ├── clustering/                 # Backend implementations
│   │   ├── clustering_cuda.cu      # CUDA GPU backend
│   │   ├── clustering_entry.cpp    # Backend dispatcher
│   │   ├── clustering_mpi.cpp      # MPI distributed backend
│   │   ├── clustering_seq.cpp      # Sequential CPU backend
│   │   └── clustering_thr.cpp      # Multi-threaded backend
│   ├── coreset.cpp                 # Coreset algorithms
│   ├── main.cpp                    # Application entry point
│   ├── rcc.cpp                     # RCC tree implementation
│   ├── utils.cpp                   # Utility functions
│   └── video_io.cpp                # Video I/O implementation
├── docs/                           # Documentation
│   ├── project__demo.gif           # Program demonstration GIF
│   ├── algorithms.md               # Algorithm descriptions
│   ├── parallelization.md          # Synchronization details
│   └── performance.md              # Performance analysis
├── tests/                          # Test files
│   ├── test_clustering.cpp         # Clustering tests
│   ├── test_coreset.cpp            # Coreset tests
│   ├── test_rcc_.cpp               # RCC tree tests
│   ├── test_utils.cpp              # Utility tests
│   └── test_video_io_.cpp          # Video I/O tests
├── CMakeLists.txt                  # Build configuration
├── LICENSE                         # MIT License
└── README.md                       # This file
```
---
## Technical Deep Dive
### Recursive Cached Coreset (RCC) Tree
The RCC tree enables efficient streaming K-means by:
1. **Leaf Insertion**: New frame coresets inserted with carry propagation
2. **Node Merging**: Weighted coreset combination with bounded size
3. **Root Computation**: Dynamic merging of all levels for comprehensive representation
4. **Memory Bounds**: Tree height limited to prevent unbounded growth
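
The four steps above can be sketched as a binary-counter-style tree. This is an illustration under stated assumptions, not the project's `rcc.cpp`: `WeightedPoint`, `mergeCoresets`, and `RccTree` are hypothetical names, and the size reduction (collapsing adjacent pairs into their weighted centroid) is a simple stand-in that preserves total weight but carries no $(1 \pm \epsilon)$ guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct WeightedPoint {
    std::array<float, 5> x;  // 5D feature
    float weight;            // assumed strictly positive
};
using Coreset = std::vector<WeightedPoint>;

// Concatenate two coresets, then shrink to at most max_size points.
// Stand-in reduction: collapse adjacent pairs into their weighted centroid.
Coreset mergeCoresets(const Coreset& a, const Coreset& b, std::size_t max_size) {
    assert(max_size > 0);
    Coreset merged(a);
    merged.insert(merged.end(), b.begin(), b.end());
    while (merged.size() > max_size) {
        Coreset reduced;
        for (std::size_t i = 0; i + 1 < merged.size(); i += 2) {
            const WeightedPoint& p = merged[i];
            const WeightedPoint& q = merged[i + 1];
            WeightedPoint c{};
            c.weight = p.weight + q.weight;
            for (std::size_t d = 0; d < c.x.size(); ++d)
                c.x[d] = (p.x[d] * p.weight + q.x[d] * q.weight) / c.weight;
            reduced.push_back(c);
        }
        if (merged.size() % 2 == 1) reduced.push_back(merged.back());
        merged = std::move(reduced);
    }
    return merged;
}

class RccTree {
public:
    RccTree(std::size_t max_levels, std::size_t max_size)
        : levels_(max_levels), max_size_(max_size) {
        assert(max_levels > 0 && max_size > 0);
    }

    // Leaf insertion with carry propagation, like adding 1 to a binary counter.
    void insert(Coreset leaf) {
        Coreset carry = std::move(leaf);
        for (std::size_t i = 0; i < levels_.size(); ++i) {
            if (!levels_[i]) { levels_[i] = std::move(carry); return; }
            carry = mergeCoresets(*levels_[i], carry, max_size_);
            // Top level: merge in place so the tree height stays bounded.
            if (i + 1 == levels_.size()) { levels_[i] = std::move(carry); return; }
            levels_[i].reset();
        }
    }

    // Root computation: merge every occupied level into one bounded coreset.
    Coreset root() const {
        Coreset result;
        for (const std::optional<Coreset>& level : levels_)
            if (level) result = mergeCoresets(result, *level, max_size_);
        return result;
    }

private:
    std::vector<std::optional<Coreset>> levels_;
    std::size_t max_size_;
};
```

Memory stays bounded by `max_levels * max_size` points regardless of how many frames are inserted, which corresponds to the `max_levels` and `default_sample` settings listed under Configuration.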
### Synchronization Strategies
- **Multi-threaded**: Lock-free design with const references and exclusive write regions
- **MPI**: Collective operations (MPI_Bcast, MPI_Gatherv) with hybrid OpenMP parallelization
- **CUDA**: Host-device synchronization with cudaDeviceSynchronize() barriers
---
## Known Limitations
1. **Memory Requirements**: CUDA backend requires sufficient GPU memory for large images
2. **Network Dependency**: MPI performance varies with network latency and bandwidth
3. **K-value Scaling**: All backends show performance degradation with very high cluster counts
4. **Hardware Specific**: Optimal performance depends on specific hardware configuration
---
## Possible Future Enhancements
- [ ] **Adaptive Coreset Sizing**: Dynamic adjustment based on image complexity
- [ ] **Additional Color Spaces**: Support for HSV, LAB, and other color representations
- [ ] **Temporal Coherence**: Frame-to-frame consistency improvements
- [ ] **Mobile Optimization**: ARM NEON and mobile GPU backend support
- [ ] **Cloud Integration**: Distributed processing across cloud instances
---
## Acknowledgments
- **[OpenCV Team](https://opencv.org/)**: for a comprehensive computer vision library and excellent documentation
- **[NVIDIA CUDA](https://developer.nvidia.com/cuda-toolkit)**: for the GPU computing platform and development tools
- **[Open MPI Project](https://www.open-mpi.org/)**: for a high-performance implementation of the Message Passing Interface
- **[CMake Community](https://cmake.org/)**: for the cross-platform build system
- **Research Community**: for foundational work on coreset algorithms and RCC trees
### Key References
- Feldman, D., Schmidt, M., & Sohler, C. (2013). *Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering*
- Bachem, O., Lucic, M., & Krause, A. (2017). *Practical coreset constructions for machine learning*
---
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
---
## Contact
Questions, feedback, or ideas? Reach out anytime at [sebastian.soptelea@proton.me](mailto:sebastian.soptelea@proton.me).