https://github.com/rooshikeshbhatt/item-inspector-ai
AI-based product condition detection using BLIP-2 + FastAPI + Phi-4 (Ollama)
ai blip2 computer-vision condition-scoring ecommerce-ai fastapi hugging image-analysis image-tagging multimodal-ai natural-language-generation ollama open-source phi4 product-inspection prompt-engineering pyotrch python visual-language-models zero-shot-learning
- Host: GitHub
- URL: https://github.com/rooshikeshbhatt/item-inspector-ai
- Owner: rooshikeshbhatt
- License: mit
- Created: 2025-06-07T22:46:32.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-08T01:52:14.000Z (about 1 year ago)
- Last Synced: 2025-08-11T17:52:09.988Z (10 months ago)
- Topics: ai, blip2, computer-vision, condition-scoring, ecommerce-ai, fastapi, hugging, image-analysis, image-tagging, multimodal-ai, natural-language-generation, ollama, open-source, phi4, product-inspection, prompt-engineering, pyotrch, python, visual-language-models, zero-shot-learning
- Language: Python
- Homepage:
- Size: 1.38 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🕵️‍♂️ Item-Inspector AI








**Visual AI for Product Condition Assessment & Human-like Reporting**
> Upload product images, let BLIP-2 understand the item, generate human-like condition reports with Phi-4, and enjoy the magic of zero-shot image-to-text reasoning. Also... there's a secret mini-game.
## Table of Contents
- [Features](#-features)
- [Banner](#-banner)
- [Project Structure](#project-structure)
- [Installation Guide](#installation-guide)
- [Hardware & GPU Setup](#-hardware--gpu-setup)
- [Bonus: Tic-Tac-Toe AI](#-bonus-tic-tac-toe-ai)
- [Technology Stack](#-technology-stack)
- [GitHub Topics](#-github-topics)
- [License](#license)
- [Contact](#-contact)
---
## ✨ Features
- **AI Product Recognition**: Detects the object type (watch, shoe, phone, etc.).
- **Material Identification**: Metal, leather, glass, suede? We got it.
- **Visual Condition Tags**: Custom per-item labels (like "scratched glass" or "torn strap").
- **Score Calculation**: Evaluates the product's damage level and assigns a score from 4 to 10.
- **Natural Language Report**: Uses the Phi-4 LLM to describe the condition in about 50 human-like words.
- **Frontend Upload UI**: Drag, drop, analyze.
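
How the 4 to 10 score is derived is not documented in this README. The sketch below shows one way condition tags could be turned into such a score. The tag names, the penalty weights, and the names `PENALTIES` and `condition_score` are illustrative assumptions, not taken from `app.py`.

```python
# Hypothetical penalty per condition tag; the real tags and weights in app.py may differ.
PENALTIES = {
    "scratched glass": 1.5,
    "torn strap": 2.0,
    "faded color": 1.0,
}

MIN_SCORE, MAX_SCORE = 4.0, 10.0


def condition_score(tags):
    """Start from a perfect score and subtract a penalty for each known tag."""
    # Duplicated tags are counted once; unknown tags carry no penalty.
    penalty = sum(PENALTIES.get(tag, 0.0) for tag in set(tags))
    # Clamp so the result always stays inside the documented 4-10 range.
    return max(MIN_SCORE, MAX_SCORE - penalty)


print(condition_score([]))                                 # 10.0
print(condition_score(["scratched glass", "torn strap"]))  # 6.5
```

Clamping at the lower bound keeps even heavily damaged items inside the advertised range.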
---
## 📸 Banner

---
## Project Structure
```text
Item-Inspector AI/
├── backend/
│   ├── app.py                 # FastAPI backend application
│   ├── requirements.txt
│   └── python_gpu_test.py     # Checks whether TensorFlow, PyTorch, and NumPy can use the GPU
├── frontend/
│   └── index.html             # Web UI for uploading images
├── sample_images/
│   └── example_watch.jpg      # Example test image
├── just_for_fun/
│   └── tic_tac_toe.py         # Tic-Tac-Toe AI game
└── README.md
```
---
## Installation Guide
### Prerequisites
- Python 3.10+ (Python 3.10.11 is recommended for GPU use on Windows)
- GitHub Desktop or Git CLI
- Ollama installed, with the Phi-4 model (`phi4:14b-q4_K_M`) downloaded
---
### 📥 1. Clone the Repo
```text
git clone https://github.com/rooshikeshbhatt/Item-Inspector-AI.git
cd Item-Inspector-AI/backend
```
---
### 📦 2. Create Virtual Environment
```text
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
```
---
### 📦 3. Install Dependencies
`pip install -r requirements.txt`
---
### 🧠 4. Start Ollama with Phi-4
```text
ollama run phi4:14b-q4_K_M
```
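
Once the model is running, the backend can request a report over Ollama's local REST API. The sketch below uses only the standard library and the `/api/generate` endpoint on Ollama's default port. The prompt wording and the names `build_payload` and `generate_report` are assumptions for illustration; the prompt used in `app.py` may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "phi4:14b-q4_K_M"


def build_payload(caption, tags, score):
    """Build the JSON body for a non-streaming generation request."""
    # Hypothetical prompt wording; the prompt used in app.py may differ.
    prompt = (
        "Write a product condition report of about 50 words.\n"
        f"Item: {caption}\n"
        f"Observed issues: {', '.join(tags) or 'none'}\n"
        f"Condition score: {score}/10"
    )
    return {"model": MODEL, "prompt": prompt, "stream": False}


def generate_report(caption, tags, score, timeout=120):
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(caption, tags, score)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With Ollama running, `generate_report("a silver wristwatch", ["scratched glass"], 7)` should return the report text as a single string, because `"stream": False` asks for one complete JSON response instead of a token stream.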
---
### 5. Launch FastAPI
```text
uvicorn app:app --reload
```
Go to: `http://127.0.0.1:8000/docs`
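
The endpoint names are not listed in this README, but FastAPI serves a machine-readable schema at `/openapi.json` by default, so the available routes can be discovered from a script. A minimal sketch, assuming the server runs on the address above; `list_routes` and `fetch_schema` are hypothetical helper names.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_routes(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dictionary."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-method keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url="http://127.0.0.1:8000"):
    """Download the schema that FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)
```

With the server running, `for method, path in list_routes(fetch_schema()): print(method, path)` prints one line per route.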
---
### 6. Use Web Interface (Optional)
Open `frontend/index.html` in your browser, then drag and drop product images.
---
## ⚡ Hardware & GPU Setup
If you're planning to run BLIP-2 on GPU for maximum performance, keep the following in mind:
### ✅ Hardware Requirements
- **NVIDIA GPU** with **8–12 GB of VRAM** or more
- Recommended: **RTX 3060 or higher**
- **CUDA-compatible drivers** installed
- Check GPU visibility with: `nvidia-smi`
- **Python**: Version **3.10+**
### ✅ Python Environment for GPU
- Install PyTorch with CUDA support:
```text
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
- Our code already includes:
```text
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch_dtype=torch.float16
```
This ensures your models run on GPU if available.
### ✅ BLIP-2 Optimization Settings
- Make sure BLIP-2 loads with:
```text
device_map="auto", torch_dtype=torch.float16
```
- Images are correctly converted to RGB before inference:
```text
img = Image.open(file.file).convert("RGB")
```
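
Put together, the settings above could look like the following sketch. It is not the code from `app.py`: the checkpoint name `Salesforce/blip2-opt-2.7b` and the helpers `choose_precision` and `caption_image` are assumptions, and the sketch moves the model with an explicit `.to(device)` instead of `device_map="auto"`. It requires `torch`, `transformers`, and `Pillow`.

```python
def choose_precision(cuda_available):
    """Pick a (device, dtype name) pair: half precision only when a GPU is present."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def caption_image(path, model_id="Salesforce/blip2-opt-2.7b"):
    """Load BLIP-2 and return a caption for one image file."""
    # Heavy imports stay inside the function so the helper above is cheap to import.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    device, dtype_name = choose_precision(torch.cuda.is_available())
    dtype = getattr(torch, dtype_name)

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id, torch_dtype=dtype)
    model.to(device)

    image = Image.open(path).convert("RGB")  # force 3-channel input
    inputs = processor(images=image, return_tensors="pt").to(device, dtype)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

Falling back to `float32` on the CPU is a deliberate choice in this sketch, since half precision mainly pays off on a GPU. In a real service the model would be loaded once at startup, not on every call.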
### 🧪 Verify GPU with Our Utility Script
Run the included [`python_gpu_test.py`](backend/python_gpu_test.py) file to confirm if TensorFlow, PyTorch, and NumPy are GPU-ready:
```text
cd backend
python python_gpu_test.py
```
This script will print the detected GPUs, framework versions, and whether each is using the GPU or CPU.
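
For a quick check without the bundled script, a rough equivalent can be sketched as follows. This is not the content of `python_gpu_test.py`; `gpu_report` is a hypothetical name, and the sketch reports a framework as missing when it is not installed.

```python
def gpu_report():
    """Return a dict describing GPU visibility for each framework, if installed."""
    report = {}

    try:
        import torch
        report["torch"] = {
            "version": torch.__version__,
            "gpu": torch.cuda.is_available(),
        }
    except ImportError:
        report["torch"] = None  # not installed

    try:
        import tensorflow as tf
        report["tensorflow"] = {
            "version": tf.__version__,
            "gpu": bool(tf.config.list_physical_devices("GPU")),
        }
    except ImportError:
        report["tensorflow"] = None  # not installed

    return report


for name, info in gpu_report().items():
    print(name, info if info is not None else "not installed")
```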
---
## 🤖 Bonus: Tic-Tac-Toe AI
When you need a break from debugging and BLIP-2 hallucinations:
```text
cd just_for_fun
python tic_tac_toe.py
```
* Supports easy, medium, and hard modes
* Uses the Minimax algorithm in hard mode to destroy your confidence
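
For readers unfamiliar with Minimax, here is a compact sketch of the idea on a 3x3 board. It is independent of `tic_tac_toe.py`; the board representation (a list of nine cells) and the function names are assumptions made for illustration.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if a line is complete, otherwise None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player, ai="O"):
    """Score a position from the AI's view: +1 win, -1 loss, 0 draw."""
    won = winner(board)
    if won is not None:
        return 1 if won == ai else -1
    if " " not in board:
        return 0
    other = "X" if player == "O" else "O"
    scores = []
    for i in range(9):
        if board[i] == " ":
            board[i] = player  # try the move
            scores.append(minimax(board, other, ai))
            board[i] = " "     # undo it
    # The AI maximizes its score; the opponent minimizes it.
    return max(scores) if player == ai else min(scores)


def best_move(board, ai="O"):
    """Return the index of the empty cell with the best minimax score for the AI."""
    other = "X" if ai == "O" else "O"
    best_index, best_score = None, -2
    for i in range(9):
        if board[i] == " ":
            board[i] = ai
            score = minimax(board, other, ai)
            board[i] = " "
            if score > best_score:
                best_index, best_score = i, score
    return best_index


# X threatens to complete the top row, so O must block at index 2.
board = ["X", "X", " ",
         " ", "O", " ",
         " ", " ", " "]
print(best_move(board))  # 2
```

Because the full game tree of Tic-Tac-Toe is small, an exhaustive search like this needs no depth limit or pruning.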
---
## 💡 Technology Stack
* BLIP-2 (Salesforce) - Vision Language
* Phi-4 (Ollama) - Language Generation
* FastAPI - Backend Framework
* HTML/JS - Minimal Frontend
* Hugging Face Transformers
* PyTorch
---
## 🏷️ GitHub Topics
ai, blip2, phi4, fastapi, transformers, computer-vision, image-classification,
product-inspection, natural-language-generation, multimodal-ai, semantic-analysis,
ecommerce-ai, repairtech, humanlike-ai, condition-scoring, pytorch, webapi,
backend, frontend, python
---
## License
MIT: use it, share it, modify it. Just don't forget to smile when it works.
---
## ✉️ Contact
**Rooshikesh Bhatt**
rooshikeshbhatt@gmail.com
---