An open API service indexing awesome lists of open source software.

https://github.com/shaku-med/real-time-object-detect

Computer vision pipeline
https://github.com/shaku-med/real-time-object-detect

license mit-license opencv production-ready python3 status yolov4 yolov8

Last synced: about 2 months ago
JSON representation

Computer vision pipeline

Awesome Lists containing this project

README

          

# Advanced YOLO Object Detection System

[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
[![OpenCV](https://img.shields.io/badge/OpenCV-4.5+-green.svg?style=for-the-badge&logo=opencv&logoColor=white)](https://opencv.org/)
[![YOLO](https://img.shields.io/badge/YOLO-v4-orange.svg?style=for-the-badge&logo=yolo&logoColor=white)](https://github.com/AlexeyAB/darknet)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](LICENSE)
[![Status](https://img.shields.io/badge/Status-Production%20Ready-brightgreen.svg?style=for-the-badge)]()

> **Because life's too short for false positives – a YOLO detector that actually knows what it's looking at**

## What's This All About?

Ever get tired of object detection systems that think your coffee mug is a person? Or that insist your houseplant is definitely a car? Yeah, me too. This system is my attempt to build something that's not just fast, but actually *smart* about what it detects.

I've thrown in some pretty cool filtering techniques that help eliminate those "wait, that's obviously not a dog" moments. Think of it as YOLO with some common sense built in.

## The Three Flavors

I've got three different modes because, let's be honest, sometimes you just need something quick and dirty, and other times you want the full bells-and-whistles experience:

| Mode | What It Does | Speed | How Good It Is |
|------|-------------|-------|----------------|
| **Basic** | Your standard YOLO v4 – gets the job done | Lightning fast ⚡ | Pretty decent |
| **Advanced** | Adds some smart filtering magic | Still pretty quick | Much better |
| **Enhanced** | The whole shebang – all the fancy stuff | Takes its time | Chef's kiss 👌 |

## The Cool Stuff Under the Hood

Here's what makes this thing tick (warning: some of this might sound like I'm showing off, but I promise it's all useful):

- **Temporal Filtering**: Keeps track of what it's seen before – no more flickering detections
- **Motion Detection**: Only bothers looking at stuff that's actually moving (revolutionary, I know)
- **Ensemble Detection**: Runs multiple confidence checks because two heads are better than one
- **Multi-Scale Detection**: Looks at things from different angles – like putting on your reading glasses
- **Adaptive Thresholds**: Gets smarter over time (unlike me with my morning coffee)
- **Frame Quality Analysis**: Won't waste time on blurry garbage frames
- **Advanced NMS**: Fancy way of saying "don't detect the same thing twice"
- **Confidence Calibration**: Statistical mumbo-jumbo that actually works

## Getting Started (The Easy Way)

### What You'll Need

- Python 3.8+ (if you're still on Python 2, we need to talk)
- A webcam (or any video source that doesn't hate you)
- At least 4GB of RAM (8GB if you want the fancy enhanced mode)
- A computer that was made after 2010

### Installation

**Step 1: Grab the code**
```bash
git clone https://github.com/Shaku-Med/Real-Time-Object-Detect.git
cd advanced-yolo-detection
```

**Step 2: Install the dependencies**
```bash
pip install -r requirements.txt
```
*Grab a coffee while this runs. Trust me.*

**Step 3: Just run it**
```bash
python main.py
```
*It'll download what it needs automatically. I'm not a monster.*

## How to Use This Thing

### The "I Just Want It to Work" Approach
```bash
python main.py
```
This runs the enhanced mode with all the good stuff turned on. It's like the "I'm feeling lucky" button but for object detection.

### The "Let Me Choose My Own Adventure" Approach
```bash
python launcher.py
```
This gives you a nice menu where you can pick your poison.

### The "I Want to Code It Myself" Approach

**Basic Mode (for when you're in a hurry):**
```python
from detection_pipeline import LiveDetectionPipeline

pipeline = LiveDetectionPipeline(
weights_path="yolov4.weights",
config_path="yolov4.cfg",
classes_path="coco.names"
)
pipeline.run(camera_index=0)
```

**Advanced Mode (when you want something better):**
```python
from advanced_detection import AdvancedDetectionPipeline

pipeline = AdvancedDetectionPipeline(
weights_path="yolov4.weights",
config_path="yolov4.cfg",
classes_path="coco.names"
)
pipeline.run_advanced(camera_index=0)
```

**Enhanced Mode (when you want the full experience):**
```python
from enhanced_detection import EnhancedDetectionPipeline

pipeline = EnhancedDetectionPipeline(
weights_path="yolov4.weights",
config_path="yolov4.cfg",
classes_path="coco.names"
)
pipeline.run_enhanced(camera_index=0)
```

### Keyboard Shortcuts (Because We're Not Animals)
- **ESC** or **Q**: Peace out
- **R**: Start over (like Ctrl+Z for your detection pipeline)
- **S**: Save the current frame (for posterity)
- **Space**: Pause/Resume (for dramatic effect)

## Tweaking the Settings

Want to mess with the settings? Check out `config.py` – it's where all the magic numbers live:

```python
class Config:
# How confident should we be before yelling "I found something!"
CONFIDENCE_THRESHOLD = 0.6
NMS_THRESHOLD = 0.3

# Camera stuff
CAMERA_INDEX = 0
FRAME_WIDTH = 640
FRAME_HEIGHT = 480
FRAME_FPS = 30

# The fancy filtering stuff
TEMPORAL_FILTER_HISTORY = 15
TEMPORAL_STABILITY_THRESHOLD = 0.4
MOTION_THRESHOLD = 0.005
ENSEMBLE_THRESHOLDS = [0.3, 0.5, 0.7]
```

## How Everything Fits Together

The basic structure is pretty straightforward:

```
Advanced YOLO Detection System
├── main.py # The main event
├── launcher.py # For when you want options
├── config.py # All the knobs and dials
├── utils.py # Random useful stuff
├── detection_pipeline.py # Basic YOLO magic
├── advanced_detection.py # Smarter YOLO magic
├── enhanced_detection.py # The full monty
├── requirements.txt # What you need to install
└── README.md # You are here
```

### The Detection Pipeline (Or: How the Sausage Gets Made)

```
Your Camera Feed

Multi-Scale Detection (looking at things from different angles)

Temporal Filtering (remembering what we saw before)

Motion Analysis (ignoring boring static stuff)

Ensemble Detection (getting multiple opinions)

Adaptive Thresholding (getting smarter over time)

Quality Analysis (not wasting time on garbage frames)

Advanced NMS (avoiding double-counting)

Confidence Calibration (final sanity check)

Your Beautifully Detected Objects
```

## Performance (The Numbers Game)

### How Much Better Is It?

| What We're Measuring | Basic | Advanced | Enhanced |
|---------------------|-------|----------|----------|
| **False Positives** | ~15% (meh) | ~8% (better) | ~3% (chef's kiss) |
| **How Stable** | Okay | Pretty good | Rock solid |
| **Speed** | 30 FPS | 25 FPS | 20 FPS |
| **Memory Usage** | Minimal | Reasonable | Hungry |

### What Your Computer Needs

| Component | Bare Minimum | What I'd Recommend |
|-----------|--------------|-------------------|
| **CPU** | Intel i5 / AMD Ryzen 5 | Intel i7 / AMD Ryzen 7 |
| **RAM** | 4GB (if you like living dangerously) | 8GB+ (for a good time) |
| **GPU** | Whatever you've got | NVIDIA GTX 1060+ |
| **Storage** | 2GB | 5GB+ (for all the models) |

## The Technical Stuff (For the Curious)

### Main Classes

**EnhancedDetectionPipeline** - The star of the show:

```python
class EnhancedDetectionPipeline:
def __init__(self, weights_path: str, config_path: str,
classes_path: str, confidence_threshold: float = 0.6,
nms_threshold: float = 0.3)

def run_enhanced(self, camera_index: int = 0,
window_name: str = "Enhanced Detection")

def detect_objects_enhanced(self, frame: np.ndarray) -> List[Detection]
```

**Detection** - What you get back:

```python
@dataclass
class Detection:
bbox: List[int] # Where the thing is [x, y, width, height]
confidence: float # How sure we are (0-1)
class_id: int # What number the thing is
class_name: str # What we call the thing
timestamp: float # When we found it
```

### The Supporting Cast

The system has a bunch of helper classes that do the heavy lifting:

- **MultiScaleDetector**: Looks at things from different zoom levels
- **AdaptiveThresholdManager**: Adjusts standards based on what it's seeing
- **FrameQualityAnalyzer**: Decides if a frame is worth processing
- **TemporalFilter**: Remembers what happened before
- **MotionDetector**: Spots the moving stuff
- **EnsembleDetector**: Gets multiple opinions before deciding

## Want to Contribute?

I'd love some help making this even better! Here's how to get started:

```bash
git clone https://github.com/Shaku-Med/Real-Time-Object-Detect.git
cd advanced-yolo-detection
pip install -r requirements.txt
python -m pytest tests/
```

Just keep it clean, use type hints (your future self will thank you), and write some tests. I'm not picky about much else.

## License

MIT License – do whatever you want with it, just don't blame me if your robot uprising uses this code.

## Shoutouts

- The YOLO v4 folks for making object detection not terrible
- The OpenCV team for doing the heavy lifting on computer vision
- The COCO dataset people for giving us something to detect
- My coffee maker for keeping me functional during development

---

*P.S. - If you find any bugs, please let me know. I promise I'll fix them eventually (probably after I finish my current Netflix series).*