https://github.com/machinefi/visioninfer

Lightweight Visual Language Model (VLM) Inference Tool optimized for Jetson Edge Devices and x86 platforms. Supports real-time inference for USB/RTSP cameras, VOD videos, and live streams with motion detection, frame deduplication, and efficient resource management.
Host: GitHub
URL: https://github.com/machinefi/visioninfer
Owner: machinefi
License: mit
Created: 2026-03-14T15:37:51.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-27T05:25:28.000Z (3 months ago)
Last Synced: 2026-03-27T17:43:31.897Z (3 months ago)
Language: Python
Size: 90.8 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# VisionInfer

Lightweight Visual Language Model (VLM) Inference Tool optimized for **Jetson Edge Devices** and x86 platforms. Supports real-time inference for USB/RTSP cameras, VOD videos, and live streams with motion detection, frame deduplication, and efficient resource management.

## Features

- 🎥 Multi-source support: USB cameras, RTSP streams, VOD files, live network streams

- 🚀 Motion-gated inference (only run inference when motion detected)

- 🎯 Frame deduplication (skip similar frames via L2 feature comparison)

- 📊 Real-time performance monitoring (encoding/inference time, frame metrics)

- 🔧 Jetson-optimized: Tailored for ARM64 architecture and limited edge resources

- 🎛️ Configurable parameters: Compression quality, inference interval, motion threshold

- 🪵 Debug mode for troubleshooting (`--debug` flag)
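
To illustrate the frame deduplication feature, here is a minimal sketch of skipping frames by L2 (Euclidean) distance between feature vectors. It is not VisionInfer's implementation: the `FrameDeduplicator` class, the `threshold` value, and the toy two-element feature vectors are assumptions chosen for illustration, and the sketch uses only the standard library.

```python
import math
from typing import Optional, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FrameDeduplicator:
    """Skips a frame when its features are close to the last kept frame."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical tuning value
        self._last_kept: Optional[Sequence[float]] = None

    def should_infer(self, features: Sequence[float]) -> bool:
        if (self._last_kept is not None
                and l2_distance(features, self._last_kept) < self.threshold):
            return False  # near-duplicate of the last kept frame: skip inference
        self._last_kept = features
        return True


dedup = FrameDeduplicator(threshold=1.0)
decisions = [dedup.should_infer(f) for f in ([0.0, 0.0], [0.1, 0.1], [3.0, 4.0])]
print(decisions)  # [True, False, True]
```

Comparing against the last *kept* frame rather than the previous frame prevents slow drift from being skipped indefinitely.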

## Requirements

### General Requirements

- Python 3.8+

- OpenCV (cv2)

- NumPy

- psutil

- Ollama (v0.1.40+) [Optional]

- YOLO [Optional]

- FFmpeg (for frame extraction from streams/files)

### Jetson-Specific Requirements

- Jetson Nano/Xavier NX/Orin (JetPack 6.0+)

- Minimum 8GB RAM 

## Installation

### Install Dependencies Script Usage

The `install_deps.sh` script installs the core dependencies and, optionally, the Ollama backend. It runs under both `sh` (dash) and `bash` on Ubuntu/Jetson systems.

#### Basic Usage

| Scenario | Command |
|----------|---------|
| Install only core dependencies (ffmpeg, python3-pip, pipx) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh` |
| Install core dependencies + Ollama backend | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh -s -- --backend ollama` |
| Show script help (check parameters) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh -s -- --help` |

#### Compatibility Note

- For better compatibility (especially on Jetson), replace `sh` with `bash` (recommended):

  ```bash
  # Install core dependencies + Ollama (bash execution)
  curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | bash -s -- --backend ollama
  ```

### Install VisionInfer

#### For Jetson (Pre-installed System OpenCV with CUDA)

To avoid breaking system dependencies (e.g., JetPack's pre-built OpenCV), pass `--system-site-packages` so the installation reuses the system's OpenCV:

- If you **do not** plan to use YOLO models, install with `pipx`:

```bash

pipx install --system-site-packages vinfer

```

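- If you plan to use YOLO models, we strongly recommend installing with `pip` instead:

```bash
# Install into the system Python so JetPack's OpenCV stays visible
pip install vinfer

# Install ultralytics without its dependencies so pip does not pull in another OpenCV build
pip install ultralytics --no-deps

# Install the remaining ultralytics dependencies explicitly
pip install matplotlib pillow polars psutil pyyaml requests scipy ultralytics-thop
```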

#### For Other Systems (No Special OpenCV)

Install with full dependencies (includes OpenCV) if your system doesn't have a pre-configured OpenCV:

```bash

pipx install "vinfer[full]"

```

### Jetson Resource Configuration 

#### Increase Swap Space [Optional]

```bash

# Create 4GB swap file

sudo fallocate -l 4G /swapfile

sudo chmod 600 /swapfile

sudo mkswap /swapfile

sudo swapon /swapfile

# Make swap permanent (survive reboot) [Optional]

echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

```
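
To confirm that the swap file is active, the total swap can be read from `/proc/meminfo`. The following is a small sketch under stated assumptions: `swap_total_gib` is a hypothetical helper, and the sample text stands in for the real file so the example runs anywhere. On a Jetson you would pass the contents of `/proc/meminfo` instead.

```python
def swap_total_gib(meminfo_text: str) -> float:
    """Return the SwapTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports sizes in KiB
            return kib / (1024 ** 2)
    return 0.0


# Illustrative sample; on a real system use: open("/proc/meminfo").read()
sample_meminfo = "MemTotal:        7990000 kB\nSwapTotal:       4194300 kB\n"
swap_gib = swap_total_gib(sample_meminfo)
print(f"{swap_gib:.1f} GiB swap")  # 4.0 GiB swap
```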

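#### Enable Maximum Performance Mode (Jetson Orin/Nano)

Jetson modules share one memory pool between CPU and GPU, so there is no separate GPU memory setting. The commands below select the highest-performance power mode instead, and they are the same on Orin and Nano.

```bash
# Select power mode 0 (the maximum-performance profile on most Jetson modules)
sudo nvpmodel -m 0

# Lock clocks at the maximum allowed by the current power mode
sudo jetson_clocks
```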

#### Pull Optimized Model (Jetson)

```bash

# Recommended lightweight model for Jetson

ollama pull qwen3.5:2b

```

## Quick Start

### USB Camera Inference

```bash

# Basic USB camera (device ID 0) with debug logs

vinfer cam --usb-dev 0 --debug

# USB camera with motion detection (infer only on motion)

vinfer cam --usb-dev 0 --motion-gate --motion-threshold 500

# USB camera with frame deduplication (skip similar frames)

vinfer cam --usb-dev 0 --dedup --interval 2.0

# Basic USB camera (device ID 0) with YOLO

vinfer cam --usb-dev 0 --model "yolo"

# Basic USB camera (device ID 0) with YOLO26 in Detection task 

vinfer cam --usb-dev 0 --model "yolo" --yolo-version 26 --yolo-task "detection"

```

### RTSP Camera Inference

```bash

# Basic RTSP stream with explicit credentials and debug logs
vinfer cam --rtsp-host 192.168.1.10 --rtsp-user admin --rtsp-pass password --debug

# RTSP with custom compression (320x240) and JPG quality (80)
vinfer cam --rtsp-host 192.168.1.10 --compress-size 320x240 --jpg-quality 80

# Simple RTSP stream (default credentials) with YOLO
vinfer cam -H 192.168.1.10 -m "yolo"

# Simple RTSP stream (default credentials) with YOLO11 in Pose task
vinfer cam -H 192.168.1.10 -m "yolo" -yv 11 -yt "pose"

```
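
The RTSP options above map naturally onto an RTSP URL. How VisionInfer assembles that URL is not documented here, so the following is only a sketch: `build_rtsp_url` is a hypothetical helper, port 554 is the standard RTSP port, and the stream path is camera-specific and left empty. The point it illustrates is that credentials containing characters such as `@` or `/` need percent-encoding.

```python
from urllib.parse import quote


def build_rtsp_url(host: str, user: str = "admin", password: str = "",
                   port: int = 554, path: str = "") -> str:
    """Assemble an RTSP URL, percent-encoding the credentials."""
    credentials = quote(user, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    return f"rtsp://{credentials}@{host}:{port}/{path.lstrip('/')}"


url = build_rtsp_url("192.168.1.10", "admin", "p@ss/word")
print(url)  # rtsp://admin:p%40ss%2Fword@192.168.1.10:554/
```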

### VOD (Video File) Analysis

```bash

# Local video file (analyze every 30 frames)

vinfer analyze --type vod --file /path/to/video.mp4 --start 0 --step 30

# Network VOD URL (e.g., MP4 stream)

vinfer analyze --type vod --url https://example.com/video.mp4 --debug

```
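
As a worked example of `--start` and `--step`, the sketch below lists which frame indices would be analyzed, assuming `--step N` means "every Nth frame, counted from the 0-based `--start` frame". The helper `frames_to_analyze` is hypothetical and only models the arithmetic.

```python
from typing import List


def frames_to_analyze(total_frames: int, start: int = 0, step: int = 1) -> List[int]:
    """Indices of the frames sampled from a video with `total_frames` frames."""
    if start < 0 or step < 1:
        raise ValueError("start must be >= 0 and step must be >= 1")
    return list(range(start, total_frames, step))


# A 10-second clip at 30 fps has 300 frames; --step 30 samples one per second.
indices = frames_to_analyze(total_frames=300, start=0, step=30)
print(indices)       # [0, 30, 60, 90, 120, 150, 180, 210, 240, 270]
print(len(indices))  # 10
```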

### Live Stream Analysis

```bash

# HLS live stream (e.g., .m3u8)

vinfer analyze --type live --url https://example.com/stream.m3u8 --interval 1.0

```

## Command Reference

### Core Subcommands

| Subcommand | Description |
|------------|-------------|
| `cam`      | Real-time camera inference (USB/RTSP) |
| `analyze`  | Offline video/live stream analysis |

### Common Arguments

| Argument | Short | Description | Default |
|----------|-------|-------------|---------|
| `--model` | `-m` | Ollama model name or YOLO | `qwen3.5:2b` |
| `--compress-size` | `-s` | Frame compression resolution (WxH) | `480x360` |
| `--jpg-quality` | `-q` | JPG compression quality (0-100) | `70` |
| `--motion-gate` | `-g` | Enable motion detection (infer only on motion) | `False` |
| `--motion-threshold` | `-T` | Minimum motion area (pixels) | `500` |
| `--dedup` | `-D` | Enable frame deduplication (disabled if motion-gate is on) | `False` |
| `--interval` | `-i` | Inference interval (seconds/frame) | `1.0` |
| `--debug` | `-d` | Enable verbose debug logging | `False` |
| `--Prompt` | `-r` | User-defined prompts | |
| `--accelerate` | `-a` | Accelerate reasoning speed | `False` |
| `--version` | `-v` | Show vinfer version | |
| `--yolo-version` | `-yv` | YOLO version (`8`, `11`, or `26`) | `8` |
| `--yolo-task` | `-yt` | YOLO task (`detection`, `segment`, `classify`, `pose`, or `obb`) | `detection` |
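
To show how several of these options fit together, here is an `argparse` sketch that mirrors a subset of the table. It is not VisionInfer's actual parser: the program name `vinfer-sketch` and the `parse_size` helper are assumptions, while the flag names and defaults are copied from the table. The last lines apply the documented rule that deduplication is disabled when the motion gate is on.

```python
import argparse
from typing import Tuple


def parse_size(text: str) -> Tuple[int, int]:
    """Parse a 'WxH' string such as '480x360' into (width, height)."""
    width, sep, height = text.lower().partition("x")
    if not sep or not width.isdigit() or not height.isdigit():
        raise argparse.ArgumentTypeError(f"expected WxH, got {text!r}")
    return int(width), int(height)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vinfer-sketch")  # hypothetical name
    parser.add_argument("-m", "--model", default="qwen3.5:2b")
    parser.add_argument("-s", "--compress-size", type=parse_size, default=(480, 360))
    parser.add_argument("-q", "--jpg-quality", type=int, default=70)
    parser.add_argument("-g", "--motion-gate", action="store_true")
    parser.add_argument("-T", "--motion-threshold", type=int, default=500)
    parser.add_argument("-D", "--dedup", action="store_true")
    parser.add_argument("-i", "--interval", type=float, default=1.0)
    return parser


args = build_parser().parse_args(["-s", "320x240", "-g", "-D", "--interval", "2.0"])
# Per the table, deduplication is disabled whenever the motion gate is on.
dedup_active = args.dedup and not args.motion_gate
print(args.compress_size, args.interval, dedup_active)  # (320, 240) 2.0 False
```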

### Cam Subcommand Arguments

| Argument | Short | Description | Default |
|----------|-------|-------------|---------|
| `--rtsp-host` | `-H` | RTSP server IP/domain (enables RTSP mode) | |
| `--rtsp-user` | `-U` | RTSP authentication username | `admin` |
| `--rtsp-pass` | `-P` | RTSP authentication password | `""` |
| `--usb-dev` | `-u` | USB camera device ID (0 = /dev/video0) | `0` |
| `--show-preview` | `-p` | Start live preview window | `False` |

### Analyze Subcommand Arguments

| Argument | Short | Description | Default |
|----------|-------|-------------|---------|
| `--type` | `-t` | Analysis type (`vod`/`live`) | **Required** |
| `--file` | `-f` | Local VOD file path | |
| `--url` | `-u` | Network VOD/live stream URL | |
| `--start` | `-st` | Start frame number (0-based) | `0` |
| `--step` | `-sp` | Inference frame interval | `1` |

## Troubleshooting

### Common Issues & Solutions

#### Cannot uninstall sympy

- **Symptom**: pip cannot uninstall the system-installed SymPy 1.9

- **Solution**: Remove the apt package:

  ```bash
  sudo apt remove python3-sympy -y
  ```

  

#### numpy version conflict

- **Symptom**: pip reports a NumPy version conflict

- **Solution**: Install the pinned version:

  ```bash
  sudo pip3 install numpy==1.23.5
  ```

#### EOF Error During Frame Extraction

- **Symptom**: `EOFError`/`IOError` when reading frames from RTSP/live streams

- **Solutions**:

  - Increase the RTSP timeout: pass `-stimeout 20000000` (20 seconds, expressed in microseconds) to FFmpeg; the code already does this

  - Check network stability (RTSP streams require low latency)

  - Use TCP for RTSP: FFmpeg's `-rtsp_transport tcp` option (enabled by default in the code)
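
For reference, this is roughly what an FFmpeg invocation that grabs one JPEG frame over TCP looks like. It is a sketch, not the command VisionInfer builds: `ffmpeg_grab_frame_cmd` and `grab_frame` are hypothetical helpers, and the 20-second subprocess timeout simply mirrors the timeout value mentioned above. Only widely available FFmpeg options are used.

```python
import subprocess
from typing import List


def ffmpeg_grab_frame_cmd(rtsp_url: str, jpg_qscale: int = 5) -> List[str]:
    """Build an FFmpeg command that writes one JPEG frame to stdout."""
    return [
        "ffmpeg",
        "-hide_banner", "-loglevel", "error",
        "-rtsp_transport", "tcp",  # input option, so it must precede -i
        "-i", rtsp_url,
        "-frames:v", "1",          # stop after a single video frame
        "-f", "image2pipe",
        "-c:v", "mjpeg",
        "-q:v", str(jpg_qscale),   # MJPEG quality scale: lower is better
        "pipe:1",
    ]


def grab_frame(rtsp_url: str, timeout_s: float = 20.0) -> bytes:
    """Run FFmpeg and return the JPEG bytes; raises on failure or timeout."""
    result = subprocess.run(ffmpeg_grab_frame_cmd(rtsp_url),
                            capture_output=True, timeout=timeout_s, check=True)
    return result.stdout


cmd = ffmpeg_grab_frame_cmd("rtsp://192.168.1.10:554/")
print(" ".join(cmd))
```

Running the printed command by hand is a quick way to tell a camera or network problem apart from a problem in the inference pipeline.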

#### Zombie Processes (FFmpeg/Ollama)

- **Symptom**: Orphaned FFmpeg/Ollama processes consuming resources

- **Solutions**:

  - The code includes `kill_all_ffmpeg()` and `stop_ollama_serve()` for cleanup

  - Manually kill zombie processes:

    ```bash

    # Kill all FFmpeg processes

    sudo pkill -f ffmpeg

    

    # Restart Ollama service

    sudo systemctl restart ollama

    ```
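
Orphaned children usually result from a parent that never terminates and waits for its subprocesses. The sketch below shows the general pattern (terminate, wait, then kill as a last resort). It is not the body of `kill_all_ffmpeg()`: `stop_process` is a hypothetical helper, and a sleeping Python process stands in for FFmpeg.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Terminate a child process and wait for it so it is reaped."""
    if proc.poll() is None:        # still running
        proc.terminate()           # ask politely first (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()            # escalate (SIGKILL on POSIX)
            proc.wait()
    return proc.returncode


# Stand-in for a long-running FFmpeg child process.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
print(child.poll() is not None)  # True: the child has exited and been reaped
```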

#### Resource Exhaustion (Jetson)

- **Symptom**: `Out of memory` errors or slow inference

- **Solutions**:

  - Use a smaller model (`qwen3.5:2b` instead of a 7B model)

  - Increase swap space (see Installation > Jetson Resource Configuration)

  - Reduce frame resolution (`--compress-size 320x240`)

  - Increase inference interval (`--interval 2.0` or higher)
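
A quick calculation shows how much the last two settings reduce the workload. Pixel count is used here only as a rough proxy for encoding and inference cost; these are not measured benchmarks, and the `pixels` helper is hypothetical.

```python
def pixels(size: str) -> int:
    """Number of pixels in a 'WxH' frame size string."""
    width, height = (int(part) for part in size.lower().split("x"))
    return width * height


default_px = pixels("480x360")          # 172800 pixels per frame (default)
reduced_px = pixels("320x240")          # 76800 pixels per frame
pixel_ratio = reduced_px / default_px   # fraction of the default pixel count

default_rate = 60 / 1.0                 # inferences per minute at --interval 1.0
relaxed_rate = 60 / 2.0                 # inferences per minute at --interval 2.0

# Pixels processed per minute, relative to the default settings.
combined = pixel_ratio * (relaxed_rate / default_rate)
print(f"{pixel_ratio:.2f} {combined:.2f}")  # 0.44 0.22
```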

#### Frame Extraction Failure

- **Symptom**: `Frame extraction failed, unable to perform inference`

- **Solutions**:

  - Verify RTSP URL/USB device accessibility

  - Check FFmpeg installation (`ffmpeg -version`)

  - For RTSP: Ensure camera is online and credentials are correct

#### Continuous Inference Errors

- **Symptom**: `Continuous inference exception: [error message]`

- **Solutions**:

  - Enable debug mode (`--debug`) to see detailed error logs

  - Check Ollama service status (`sudo systemctl status ollama`)

  - Verify model is pulled (`ollama list` to check installed models)
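
The model check can also be scripted against Ollama's local REST API, whose `/api/tags` endpoint lists installed models. The sketch below assumes Ollama's default address (`localhost:11434`). The helper names are hypothetical, and the sample payload is illustrative data so the example runs without a server.

```python
import json
import urllib.request
from typing import List

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local endpoint


def installed_models(tags_payload: str) -> List[str]:
    """Extract model names from an /api/tags JSON response body."""
    return [entry["name"] for entry in json.loads(tags_payload).get("models", [])]


def fetch_installed_models(url: str = OLLAMA_TAGS_URL, timeout_s: float = 3.0) -> List[str]:
    """Query a running Ollama server; raises URLError if it is unreachable."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return installed_models(response.read().decode("utf-8"))


# Illustrative payload in the shape returned by /api/tags.
sample = '{"models": [{"name": "qwen3.5:2b"}, {"name": "llava:7b"}]}'
names = installed_models(sample)
print("qwen3.5:2b" in names)  # True
```

If `fetch_installed_models()` raises a connection error, the Ollama service is most likely not running.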

## Known Limitations

### Jetson-Specific Limitations

- **Model Size**: Avoid 7B+ models (e.g., qwen3.5:7b) on Jetson Nano/Xavier NX; use `qwen3.5:2b` for stable performance

- **Inference Speed**: 2B models run at ~1-2 FPS on Jetson Orin, ~0.5 FPS on Jetson Nano

- **Preview Window**: May be slow on Jetson Nano (the preview is off by default; omit `--show-preview` if it slows things down)

### General Limitations

- **RTSP Latency**: RTSP streams may have 1-3s latency (normal for TCP transport)

- **Frame Deduplication**: May skip valid frames in low-motion scenarios (adjust `DEDUP_THRESHOLD` if needed)

- **Motion Detection**: Sensitive to lighting changes (tune `--motion-threshold` for your environment)
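
The lighting sensitivity is easier to reason about with a concrete model of motion gating. The sketch below counts pixels whose intensity changed by more than a per-pixel delta and compares that count with the motion threshold. It is a simplified stand-in in pure Python, not VisionInfer's detector (which may well use OpenCV). `pixel_delta` is a hypothetical parameter, and `motion_threshold=500` matches the documented default for `--motion-threshold`.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of 0-255 intensities


def changed_pixel_count(prev: Frame, curr: Frame, pixel_delta: int = 25) -> int:
    """Count pixels whose intensity changed by more than `pixel_delta`."""
    return sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if abs(p - c) > pixel_delta
    )


def motion_detected(prev: Frame, curr: Frame,
                    motion_threshold: int = 500, pixel_delta: int = 25) -> bool:
    """True when the changed area (in pixels) reaches the motion threshold."""
    return changed_pixel_count(prev, curr, pixel_delta) >= motion_threshold


background = [[10] * 100 for _ in range(100)]

# A 30x30 bright object enters the scene: 900 changed pixels.
with_object = [row[:] for row in background]
for y in range(30):
    for x in range(30):
        with_object[y][x] = 200

# A uniform +5 brightness shift touches every pixel but stays under pixel_delta.
brighter = [[15] * 100 for _ in range(100)]

print(motion_detected(background, with_object))  # True
print(motion_detected(background, brighter))     # False
```

A larger lighting change (above the per-pixel delta) would flag every pixel at once, which is why the threshold needs tuning per environment.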

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- [Ultralytics YOLO](https://www.ultralytics.com/) for the end-to-end computer vision platform

- [Ollama](https://ollama.com/) for lightweight LLM inference

- [OpenCV](https://opencv.org/) for computer vision processing

- [NVIDIA Jetson](https://developer.nvidia.com/jetson) for edge AI platform support