{"id":47808853,"url":"https://github.com/machinefi/visioninfer","last_synced_at":"2026-04-03T18:01:09.884Z","repository":{"id":344433884,"uuid":"1181762653","full_name":"machinefi/VisionInfer","owner":"machinefi","description":"Lightweight Visual Language Model (VLM) Inference Tool optimized for Jetson Edge Devices and x86 platforms. Supports real-time inference for USB/RTSP cameras, VOD videos, and live streams with motion detection, frame deduplication, and efficient resource management.","archived":false,"fork":false,"pushed_at":"2026-03-27T05:25:28.000Z","size":93,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-27T17:43:31.897Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/machinefi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-14T15:37:51.000Z","updated_at":"2026-03-27T05:25:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/machinefi/VisionInfer","commit_stats":null,"previous_names":["iloveyou-github/visioninfer","machinefi/visioninfer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/machinefi/VisionInfer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinefi%2FVisionInfer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/machinefi%2FVisionInfer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinefi%2FVisionInfer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinefi%2FVisionInfer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/machinefi","download_url":"https://codeload.github.com/machinefi/VisionInfer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinefi%2FVisionInfer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31368156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-03T18:00:30.097Z","updated_at":"2026-04-03T18:01:09.831Z","avatar_url":"https://github.com/machinefi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VisionInfer\nLightweight Visual Language Model (VLM) Inference Tool optimized for **Jetson Edge Devices** and x86 platforms. 
Supports real-time inference for USB/RTSP cameras, VOD videos, and live streams with motion detection, frame deduplication, and efficient resource management.\n\n## Features\n- 🎥 Multi-source support: USB cameras, RTSP streams, VOD files, live network streams\n- 🚀 Motion-gated inference (only run inference when motion detected)\n- 🎯 Frame deduplication (skip similar frames via L2 feature comparison)\n- 📊 Real-time performance monitoring (encoding/inference time, frame metrics)\n- 🔧 Jetson-optimized: Tailored for ARM64 architecture and limited edge resources\n- 🎛️ Configurable parameters: Compression quality, inference interval, motion threshold\n- 🪵 Debug mode for troubleshooting (--debug flag)\n\n\n\n## Requirements\n\n### General Requirements\n- Python 3.8+\n- OpenCV (cv2)\n- NumPy\n- psutil\n- Ollama (v0.1.40+) [Optional]\n- YOLO [Optional]\n- FFmpeg (for frame extraction from streams/files)\n\n### Jetson-Specific Requirements\n- Jetson Nano/Xavier NX/Orin (JetPack 6.0+)\n- Minimum 8GB RAM \n\n\n\n## Installation\n\n### Install Dependencies Script Usage\nOur `install_deps.sh` script supports flexible dependency installation with optional Ollama backend, and is compatible with both `sh` (dash) and `bash` on Ubuntu/Jetson systems.\n\n#### Basic Usage\n| Scenario                          | Command                                                                 |\n|-----------------------------------|--------------------------------------------------------------------------|\n| Install only core dependencies (ffmpeg, python3-pip, pipx) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \\| sh` |\n| Install core dependencies + Ollama backend | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \\| sh -s -- --backend ollama` |\n| Show script help (check parameters) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \\| 
sh -s -- --help` |\n\n#### Compatibility Note\n- For better compatibility (especially on Jetson), you can replace `sh` with `bash` (recommended):\n  ```bash\n  # Install core dependencies + Ollama (bash execution)\n  curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | bash -s -- --backend ollama\n  ```\n\n\n\n### Install VisionInfer\n\n#### For Jetson (Pre-installed System OpenCV with CUDA)\nTo avoid breaking system dependencies (e.g., JetPack's pre-built OpenCV), use `--system-site-packages` to reuse the system's OpenCV:\n\n- If you **do not** plan to use YOLO models, we recommend installing with the following command:\n\n```bash\npipx install --system-site-packages vinfer\n```\n- If you plan to use YOLO models, we strongly recommend installing with the following commands:\n\n```bash\npip install --system-site-packages vinfer\npip install ultralytics --no-deps\npip install matplotlib pillow polars psutil pyyaml requests scipy ultralytics-thop\n```\n\n\n\n#### For Other Systems (No Special OpenCV)\n\nInstall with full dependencies (includes OpenCV) if your system doesn't have a pre-configured OpenCV:\n```bash\npipx install 'vinfer[full]'\n```\n\n\n\n### Jetson Resource Configuration\n\n#### Increase Swap Space [Optional]\n```bash\n# Create 4GB swap file\nsudo fallocate -l 4G /swapfile\nsudo chmod 600 /swapfile\nsudo mkswap /swapfile\nsudo swapon /swapfile\n\n# Make swap permanent (survive reboot) [Optional]\necho '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab\n```\n\n#### Configure Power Mode (Jetson Orin/Nano)\n```bash\n# For Jetson Orin (select power mode 0, then lock clocks at maximum)\nsudo nvpmodel -m 0\nsudo jetson_clocks\n\n# For Jetson Nano (set max performance mode)\nsudo nvpmodel -m 0\nsudo jetson_clocks\n```\n\n#### Pull Optimized Model (Jetson)\n```bash\n# Recommended lightweight model for Jetson\nollama pull qwen3.5:2b\n```\n\n\n\n## Quick Start\n\n### USB Camera Inference\n```bash\n# Basic USB camera (device ID 
0) with debug logs\nvinfer cam --usb-dev 0 --debug\n\n# USB camera with motion detection (infer only on motion)\nvinfer cam --usb-dev 0 --motion-gate --motion-threshold 500\n\n# USB camera with frame deduplication (skip similar frames)\nvinfer cam --usb-dev 0 --dedup --interval 2.0\n\n# Basic USB camera (device ID 0) with YOLO\nvinfer cam --usb-dev 0 --model \"yolo\"\n\n# Basic USB camera (device ID 0) with YOLO26 in Detection task \nvinfer cam --usb-dev 0 --model \"yolo\" --yolo-version 26 --yolo-task \"detection\"\n```\n\n### RTSP Camera Inference\n```bash\n# Basic RTSP stream (default credentials)\nvinfer cam --rtsp-host 192.168.1.10 --rtsp-user admin --rtsp-pass password --debug\n\n# RTSP with custom compression (320x240) and JPG quality (80)\nvinfer --rtsp-host 192.168.1.10 --compress-size 320x240 --jpg-quality 80\n\n# Simple RTSP stream (default credentials) with YOLO\nvinfer -H 192.168.1.10 -m \"yolo\" \n\n# Simple RTSP stream (default credentials) with YOLO11 in Pose task\nvinfer -H 192.168.1.10 -m \"yolo\" -yv 11 -yt \"pose\"\n```\n\n### VOD (Video File) Analysis\n```bash\n# Local video file (analyze every 30 frames)\nvinfer analyze --type vod --file /path/to/video.mp4 --start 0 --step 30\n\n# Network VOD URL (e.g., MP4 stream)\nvinfer analyze --type vod --url https://example.com/video.mp4 --debug\n```\n\n### Live Stream Analysis\n```bash\n# HLS live stream (e.g., .m3u8)\nvinfer analyze --type live --url https://example.com/stream.m3u8 --interval 1.0\n```\n\n\n\n## Command Reference\n\n### Core Subcommands\n| Subcommand | Description |\n|------------|-------------|\n| `cam`      | Real-time camera inference (USB/RTSP) |\n| `analyze`  | Offline video/live stream analysis |\n\n### Common Arguments\n| Argument | Short | Description | Default |\n|----------|-------|-------------|---------|\n| `--model` | `-m`  | Ollama model name or YOLO | `qwen3.5:2b` |\n| `--compress-size` | `-s` | Frame compression resolution (WxH) | `480x360` |\n| `--jpg-quality` | `-q` | 
JPG compression quality (0-100) | `70` |\n| `--motion-gate` | `-g` | Enable motion detection (infer only on motion) | `False` |\n| `--motion-threshold` | `-T` | Minimum motion area (pixels) | `500` |\n| `--dedup` | `-D` | Enable frame deduplication (disabled if motion-gate is on) | `False` |\n| `--interval` | `-i` | Inference interval (seconds/frame) | `1.0` |\n| `--debug` | `-d` | Enable verbose debug logging | `False` |\n| `--Prompt` | `-r` | User-defined prompts | |\n| `--accelerate` | `-a` | Accelerate reasoning speed | `False` |\n| `--version` | `-v` | Show vinfer version | |\n| `--yolo-version` | `-yv` | YOLO version (`8`, `11`, or `26`) | `8` |\n| `--yolo-task` | `-yt` | YOLO task (`detection`, `segment`, `classify`, `pose`, or `obb`) | `detection` |\n\n### Cam Subcommand Arguments\n| Argument | Short | Description | Default |\n|----------|-------|-------------|---------|\n| `--rtsp-host` | `-H` | RTSP server IP/domain (enables RTSP mode) | |\n| `--rtsp-user` | `-U` | RTSP authentication username | `admin` |\n| `--rtsp-pass` | `-P` | RTSP authentication password | `\"\"` |\n| `--usb-dev` | `-u` | USB camera device ID (0 = /dev/video0) | `0` |\n| `--show-preview` | `-p` | Start live preview window | `False` |\n\n### Analyze Subcommand Arguments\n| Argument | Short | Description | Default |\n|----------|-------|-------------|---------|\n| `--type` | `-t` | Analysis type (`vod`/`live`) | **Required** |\n| `--file` | `-f` | Local VOD file path | |\n| `--url` | `-u` | Network VOD/live stream URL | |\n| `--start` | `-st` | Start frame number (0-based) | `0` |\n| `--step` | `-sp` | Inference frame interval | `1` |\n\n## Troubleshooting\n### Common Issues \u0026 Solutions\n\n#### Cannot uninstall sympy\n\n- **Symptom**: Cannot uninstall Sympy 1.9\n\n- **Solution**:\n\n  ```\n  sudo apt remove python3-sympy -y\n  ```\n\n  \n\n#### numpy version conflict\n\n- **Symptom**: numpy version conflict\n\n- **Solution**: \n\n  - Install the specified version\n\n    ```\n    sudo pip3 install numpy==1.23.5\n    ```\n\n\n\n#### EOF Error 
During Frame Extraction\n- **Symptom**: `EOFError`/`IOError` when reading frames from RTSP/live streams\n- **Solutions**:\n  - Increase RTSP timeout: Add `-stimeout 20000000` to FFmpeg command (code already includes this)\n  - Check network stability (RTSP streams require low latency)\n  - Use TCP for RTSP: `--rtsp-transport tcp` (enabled by default in code)\n\n#### Zombie Processes (FFmpeg/Ollama)\n- **Symptom**: Orphaned FFmpeg/Ollama processes consuming resources\n- **Solutions**:\n  - The code includes `kill_all_ffmpeg()` and `stop_ollama_serve()` for cleanup\n  - Manually kill zombie processes:\n    ```bash\n    # Kill all FFmpeg processes\n    sudo pkill -f ffmpeg\n    \n    # Restart Ollama service\n    sudo systemctl restart ollama\n    ```\n\n#### Resource Exhaustion (Jetson)\n- **Symptom**: `Out of memory` errors or slow inference\n- **Solutions**:\n  - Use smaller models (qwen3.5:2b instead of 7b)\n  - Increase swap space (see Installation \u003e Jetson Configuration)\n  - Reduce frame resolution (`--compress-size 320x240`)\n  - Increase inference interval (`--interval 2.0` or higher)\n\n#### Frame Extraction Failure\n- **Symptom**: `Frame extraction failed, unable to perform inference`\n- **Solutions**:\n  - Verify RTSP URL/USB device accessibility\n  - Check FFmpeg installation (`ffmpeg -version`)\n  - For RTSP: Ensure camera is online and credentials are correct\n\n#### Continuous Inference Errors\n- **Symptom**: `Continuous inference exception: [error message]`\n- **Solutions**:\n  - Enable debug mode (`--debug`) to see detailed error logs\n  - Check Ollama service status (`sudo systemctl status ollama`)\n  - Verify model is pulled (`ollama list` to check installed models)\n\n\n\n\n\n## Known Limitations\n\n### Jetson-Specific Limitations\n- **Model Size**: Avoid 7B+ models (e.g., qwen3.5:7b) on Jetson Nano/Xavier NX—use `qwen3.5:2b` for stable performance\n- **Inference Speed**: 2B models run at ~1-2 FPS on Jetson Orin, ~0.5 FPS on Jetson Nano\n- 
**Preview Window**: May be slow on Jetson Nano (omit `--show-preview` if needed)\n\n### General Limitations\n- **RTSP Latency**: RTSP streams may have 1-3s latency (normal for TCP transport)\n- **Frame Deduplication**: May skip valid frames in low-motion scenarios (adjust `DEDUP_THRESHOLD` if needed)\n- **Motion Detection**: Sensitive to lighting changes (tune `--motion-threshold` for your environment)\n\n## License\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Acknowledgments\n- [YOLO](https://www.ultralytics.com/) for its end-to-end computer vision platform\n- [Ollama](https://ollama.com/) for lightweight LLM inference\n- [OpenCV](https://opencv.org/) for computer vision processing\n- [NVIDIA Jetson](https://developer.nvidia.com/jetson) for edge AI platform support\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinefi%2Fvisioninfer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmachinefi%2Fvisioninfer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinefi%2Fvisioninfer/lists"}