{"id":51062401,"url":"https://github.com/sebsop/realtime-parallel-kmeans-segmentation","last_synced_at":"2026-06-23T03:34:07.229Z","repository":{"id":317714705,"uuid":"1068381656","full_name":"sebsop/realtime-parallel-kmeans-segmentation","owner":"sebsop","description":"Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.","archived":false,"fork":false,"pushed_at":"2026-04-16T08:50:34.000Z","size":14802,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-23T03:34:05.541Z","etag":null,"topics":["cpp","cuda","k-means-clustering","mpi","multithreading","opencv","rcc","real-time-stream-processing"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sebsop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-02T09:50:05.000Z","updated_at":"2026-04-16T08:50:38.000Z","dependencies_parsed_at":"2025-10-02T16:32:54.194Z","dependency_job_id":null,"html_url":"https://github.com/sebsop/realtime-parallel-kmeans-segmentation","commit_stats":null,"previous_names":["dosqas/realtime-parallel-kmeans-segmentation","sebsop/realtime-parallel-kmeans-segmentation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sebsop/realtime-parallel-kmeans-segmentation"
,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sebsop%2Frealtime-parallel-kmeans-segmentation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sebsop%2Frealtime-parallel-kmeans-segmentation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sebsop%2Frealtime-parallel-kmeans-segmentation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sebsop%2Frealtime-parallel-kmeans-segmentation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sebsop","download_url":"https://codeload.github.com/sebsop/realtime-parallel-kmeans-segmentation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sebsop%2Frealtime-parallel-kmeans-segmentation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34674702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cuda","k-means-clustering","mpi","multithreading","opencv","rcc","real-time-stream-processing"],"created_at":"2026-06-23T03:34:06.399Z","updated_at":"2026-06-23T03:34:07.222Z","avatar_url":"https://github.com/sebsop.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎨 
Real-Time Parallel K-Means Image Segmentation\n\nA high-performance computer vision system that performs real-time image segmentation using K-Means clustering with multiple parallel backends for optimal performance across different hardware configurations.\n\n![Project demo](./demo/project_demo.gif)\n*Real-time parallel K-means clustering: original vs segmented frames with dynamic K-value control via slider*\n\n---\n\n## ✨ Key Features\n\n- **🚀 Real-Time Performance**: Up to 55+ FPS with CUDA backend on live webcam feeds\n- **🔧 Multiple Parallel Backends**: Sequential, Multi-threaded, MPI + OpenMP, and CUDA implementations\n- **🌳 RCC Tree Optimization**: Recursive Cached Coreset tree for efficient streaming segmentation\n- **🎯 5D Feature Space**: Combines color (BGR) and spatial (x,y) features for coherent segmentation\n- **⚡ Dynamic Backend Switching**: Switch between backends in real-time with keyboard shortcuts\n- **📊 Performance Monitoring**: Live FPS tracking with min/max statistics\n- **🖼️ Interactive Controls**: Adjustable K-value slider and side-by-side visualization\n\n---\n\n## 🧠 Technical Overview\n\nThis project implements an advanced K-Means clustering system optimized for real-time image segmentation with multiple parallelization strategies:\n\n- **Core Algorithm**: K-Means clustering adapted for image segmentation\n- **Feature Engineering**: 5D vectors combining color similarity and spatial proximity\n- **Coreset Sampling**: Reduces computational complexity from O(n·k·t) to O(s·k·t), where n = total pixels, s = coreset size (s ≪ n)\n- **RCC Tree Structure**: Maintains $(1 \\pm \\epsilon)$-approximation with bounded memory\n- **Hardware Optimization**: Leverages multi-core CPUs, distributed systems, and GPUs\n\n---\n\n## 📊 Performance Benchmarks\n\n### FPS Performance by Backend and K-value\n\n| Backend | K=2 (Min FPS) | K=2 (Max FPS) | K=12 (Min FPS) | K=12 (Max FPS) | Performance Ratio 
|\n|---------|---------------|---------------|----------------|----------------|-------------------|\n| **Sequential** | 15 | 17 | 5 | 6 | 1.0× (baseline) |\n| **Multi-threaded** | 14 | 44 | 10 | 22 | 2.4× average |\n| **MPI** | 17 | 44 | 13 | 21 | 2.6× average |\n| **CUDA** | 14 | 55 | 15 | 44 | 3.2× average |\n\n### Performance Characteristics\n\n```\nPerformance Improvement Factor (vs Sequential):\n┌─────────────────────────────────────────────────┐\n│ CUDA:          ████████████████ 3.2×            │\n│ MPI:           ██████████████ 2.6×              │\n│ Multi-thread:  ████████████ 2.4×                │\n│ Sequential:    ████ 1.0× (baseline)             │\n└─────────────────────────────────────────────────┘\n```\n\n---\n\n## 🏗️ Architecture Details\n\n### Algorithmic Complexity\n\n| Component | Complexity | Notes |\n|-----------|------------|-------|\n| **Sequential K-Means** | $O(n \\cdot k \\cdot t)$ | Baseline implementation |\n| **Coreset K-Means** | $O(s \\cdot k \\cdot t)$ | Reduced complexity with $s \\ll n$ |\n| **RCC Tree Insertion** | $O(s \\cdot \\log N)$ | Streaming update per frame |\n| **RCC Tree Merging** | $O(s)$ | Weighted merge operation |\n| **GPU Assignment** | $O(n / \\text{cores})$ | Massive parallelization |\n\n### Backend-Specific Implementations\n\n#### 🔄 Multi-threaded (std::thread)\n- **Strategy**: Row-based work distribution across CPU threads\n- **Synchronization**: Lock-free design with barrier synchronization\n- **Memory Safety**: Read-only sharing with exclusive write regions\n- **Best For**: Multi-core desktop systems\n\n#### 🌐 Distributed (MPI + OpenMP)\n- **Strategy**: Master-Worker pattern combining process-level (MPI) distribution of rows with thread-level (OpenMP) parallelization within each process.\n- **Communication**: Uses MPI_Bcast to distribute cluster centers, frame data, and control parameters (k, dimensions, stop flag) and MPI_Gatherv for efficient, variable-sized result aggregation.\n- **Hybrid Approach**: MPI 
collective calls act as synchronization barriers, ensuring data consistency across the network.\n- **Best For**: HPC clusters and large distributed systems\n\n#### 🎮 GPU-accelerated (CUDA)\n- **Strategy**: One thread per pixel for maximum throughput\n- **Memory Management**: Efficient host-device transfers\n- **Synchronization**: cudaDeviceSynchronize() for completion barriers\n- **Best For**: High-throughput applications with GPU hardware\n\n---\n\n## 🚀 Getting Started\n\n### Prerequisites\n\n```bash\n# System Requirements\n- C++17 compatible compiler (GCC/MSVC/Clang)\n- CMake 3.18+\n- OpenCV 4.0+\n- CUDA Toolkit (for GPU backend)\n- MPI implementation (e.g., OpenMPI/MS-MPI for distributed backend)\n```\n\n### Building the Project\n\n```bash\n# Clone the repository\ngit clone https://github.com/sebsop/realtime-parallel-kmeans-segmentation.git\ncd realtime-parallel-kmeans-segmentation\n\n# Create build directory\nmkdir build \u0026\u0026 cd build\n\n# Configure with CMake\ncmake ..\n\n# Build the project\ncmake --build . 
--config Release\n\n# Change to the output directory\ncd out/build/x64-Debug\n\n# Run the application\nmpiexec -n \u003cnumber_of_processes\u003e realtime_parallel_kmeans_segmentation.exe\n```\n\n### Runtime Controls\n\n| Key | Action |\n|-----|--------|\n| **'1'** | Switch to Sequential backend |\n| **'2'** | Switch to Multi-threaded backend |\n| **'3'** | Switch to MPI backend |\n| **'4'** | Switch to CUDA backend |\n| **ESC** | Exit application |\n| **K Slider** | Adjust cluster count (2-12) |\n\n---\n\n## 🔧 Configuration\n\n### Algorithm Parameters\n\n```cpp\n// Default configuration values\nconst int k_min = 1;           // Minimum clusters\nconst int k_max = 12;          // Maximum clusters\nconst int sample_size = 2000;  // Coreset size\nconst float color_scale = 1.0f;   // Color feature scaling\nconst float spatial_scale = 0.5f; // Spatial feature scaling\n```\n\n### RCC Tree Settings\n\n```cpp\nconst int max_levels = 8;      // Maximum tree height\nconst int default_sample = 2000; // Default coreset size\n```\n\n---\n\n## 📈 Real-Time Performance Thresholds\n\n| FPS Range | Classification | User Experience | Recommended Backends |\n|-----------|----------------|-----------------|----------------------|\n| **30+ FPS** | Excellent | Smooth real-time | CUDA, Multi-threaded, MPI Hybrid (K≤8) |\n| **20-30 FPS** | Good | Acceptable real-time | Multi-threaded, MPI (K≤6) |\n| **10-20 FPS** | Fair | Noticeable lag | All backends (K≤4) |\n| **\u003c10 FPS** | Poor | Choppy playback | Sequential only (high K) |\n\n---\n\n## 🎯 Use Cases\n\n### 🎬 Video Processing\n- Real-time video segmentation for streaming applications\n- Live broadcast effects and background replacement\n- Content creation and video editing workflows\n\n### 🤖 Computer Vision Research\n- Baseline implementation for segmentation algorithms\n- Performance benchmarking across different hardware\n- Educational demonstrations of parallel computing concepts\n\n### 🏥 Medical Imaging\n- Real-time analysis 
of medical imagery\n- Interactive segmentation for diagnostic applications\n- High-throughput batch processing of medical data\n\n### 🎮 Interactive Applications\n- Real-time augmented reality applications\n- Interactive art installations\n- Gaming and entertainment systems\n\n---\n\n## 🛠️ Customization\n\n### Adding New Backends\n\n```cpp\n// In clustering_backends.hpp\nenum Backend { \n    BACKEND_SEQ = 0, \n    BACKEND_CUDA = 1, \n    BACKEND_THR = 2, \n    BACKEND_MPI = 3,\n    BACKEND_CUSTOM = 4  // Your custom backend\n};\n\n// Implement your backend function\ncv::Mat segmentFrameWithKMeans_custom(\n    const cv::Mat\u0026 frame, int k, int sample_size,\n    float color_scale, float spatial_scale);\n```\n\n### Tuning Performance\n\n```cpp\n// Adjust coreset sampling for speed vs quality trade-off\nconst int fast_sample_size = 1000;    // Faster, lower quality\nconst int quality_sample_size = 5000; // Slower, higher quality\n\n// Modify feature scaling for different segmentation characteristics\nconst float color_emphasis = 2.0f;    // Emphasize color similarity\nconst float spatial_emphasis = 0.1f;  // De-emphasize spatial proximity\n```\n\n---\n\n## 📂 Project Structure\n\n```\nrealtime-parallel-kmeans-segmentation/\n├── 📁 include/                      # Header files\n│   ├── clustering.hpp               # Main clustering interface\n│   ├── clustering_backends.hpp      # Backend implementations\n│   ├── coreset.hpp                  # Coreset data structures\n│   ├── rcc.hpp                      # RCC tree implementation\n│   ├── utils.hpp                    # Utility functions\n│   └── video_io.hpp                 # Video I/O interface\n├── 📁 src/                          # Source files\n│   ├── 📁 clustering/               # Backend implementations\n│   │   ├── clustering_cuda.cu       # CUDA GPU backend\n│   │   ├── clustering_entry.cpp     # Backend dispatcher\n│   │   ├── clustering_mpi.cpp       # MPI distributed backend\n│   │   ├── clustering_seq.cpp     
  # Sequential CPU backend\n│   │   └── clustering_thr.cpp       # Multi-threaded backend\n│   ├── coreset.cpp                  # Coreset algorithms\n│   ├── main.cpp                     # Application entry point\n│   ├── rcc.cpp                      # RCC tree implementation\n│   ├── utils.cpp                    # Utility functions\n│   └── video_io.cpp                 # Video I/O implementation\n├── 📁 demo/                         # Demo assets\n│   └── project_demo.gif             # Program demonstration GIF\n├── 📁 docs/                         # Documentation\n│   ├── algorithms.md                # Algorithm descriptions\n│   ├── parallelization.md           # Synchronization details\n│   └── performance.md               # Performance analysis\n├── 📁 tests/                        # Test files\n│   ├── test_clustering.cpp          # Clustering tests\n│   ├── test_coreset.cpp             # Coreset tests\n│   ├── test_rcc_.cpp                # RCC tree tests\n│   ├── test_utils.cpp               # Utility tests\n│   └── test_video_io_.cpp           # Video I/O tests\n├── CMakeLists.txt                   # Build configuration\n├── LICENSE                          # MIT License\n└── README.md                        # This file\n```\n\n---\n\n## 🔬 Technical Deep Dive\n\n### Recursive Cached Coreset (RCC) Tree\n\nThe RCC tree enables efficient streaming K-means by:\n\n1. **Leaf Insertion**: New frame coresets inserted with carry propagation\n2. **Node Merging**: Weighted coreset combination with bounded size\n3. **Root Computation**: Dynamic merging of all levels for comprehensive representation\n4. 
**Memory Bounds**: Tree height limited to prevent unbounded growth\n\n### Synchronization Strategies\n\n- **Multi-threaded**: Lock-free design with const references and exclusive write regions\n- **MPI**: Collective operations (MPI_Bcast, MPI_Gatherv) with hybrid OpenMP parallelization\n- **CUDA**: Host-device synchronization with cudaDeviceSynchronize() barriers\n\n---\n\n## 🧪 Known Limitations\n\n1. **Memory Requirements**: CUDA backend requires sufficient GPU memory for large images\n2. **Network Dependency**: MPI performance varies with network latency and bandwidth\n3. **K-value Scaling**: All backends show performance degradation with very high cluster counts\n4. **Hardware Specific**: Optimal performance depends on specific hardware configuration\n\n---\n\n## 🔮 Possible Future Enhancements\n\n- [ ] **Adaptive Coreset Sizing**: Dynamic adjustment based on image complexity\n- [ ] **Additional Color Spaces**: Support for HSV, LAB, and other color representations\n- [ ] **Temporal Coherence**: Frame-to-frame consistency improvements\n- [ ] **Mobile Optimization**: ARM NEON and mobile GPU backend support\n- [ ] **Cloud Integration**: Distributed processing across cloud instances\n\n---\n\n## 🙏 Acknowledgments\n\n- **[OpenCV Team](https://opencv.org/)** – For comprehensive computer vision library and excellent documentation\n- **[NVIDIA CUDA](https://developer.nvidia.com/cuda-toolkit)** – For GPU computing platform and development tools\n- **[Open MPI Project](https://www.open-mpi.org/)** – For high-performance message passing interface\n- **[CMake Community](https://cmake.org/)** – For cross-platform build system\n- **Research Community** – For foundational work on coreset algorithms and RCC trees\n\n### Key References\n\n- Feldman, D., Schmidt, M., \u0026 Sohler, C. (2013). *Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering*\n- Bachem, O., Lucic, M., \u0026 Krause, A. (2017). 
*Practical coreset constructions for machine learning*\n\n---\n\n## 📄 License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n---\n\n## 💡 Contact\n\nQuestions, feedback, or ideas? Reach out anytime at [sebastian.soptelea@proton.me](mailto:sebastian.soptelea@proton.me).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsebsop%2Frealtime-parallel-kmeans-segmentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsebsop%2Frealtime-parallel-kmeans-segmentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsebsop%2Frealtime-parallel-kmeans-segmentation/lists"}