https://github.com/iprajwaal/enhanced-vision-assistant
An AI-powered vision assistant for real-time navigation and awareness.
gemini-pro opencv vertex-ai vertex-ai-gemini-api vertexaisprint vision-api
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/iprajwaal/enhanced-vision-assistant
- Owner: iprajwaal
- License: apache-2.0
- Created: 2025-02-14T12:17:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-26T09:28:55.000Z (over 1 year ago)
- Last Synced: 2025-05-14T10:34:08.906Z (about 1 year ago)
- Topics: gemini-pro, opencv, vertex-ai, vertex-ai-gemini-api, vertexaisprint, vision-api
- Language: Jupyter Notebook
- Homepage:
- Size: 6.02 MB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🚀 Enhanced Vision Assistant
A computer vision application designed to assist **visually impaired individuals** with real-time navigation and situational awareness. The system uses **Vertex AI (Gemini Pro), the Google Cloud Vision API, and the Google Cloud Text-to-Speech API** to **identify objects, evaluate risks, and deliver spoken directions**, and it supports voice commands and natural-language scene descriptions.
## 🎥 Demo Video
https://github.com/user-attachments/assets/fe943124-3d3b-4ba1-ab87-324b21469421
## ✨ Features
- **Real-time object detection** and **depth estimation**
- **Intelligent scene analysis** and **risk assessment**
- **Priority-based audio guidance system**
- **Context-aware navigation assistance**
- **Dynamic hazard detection** and **avoidance**
- **Advanced motion tracking** and **trajectory analysis**
- **Voice-activated commands and responses**
- **Natural language scene description**
- **Spatial awareness and proximity alerts**
- **Debug visualization for development purposes**
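One way to think about the priority-based audio guidance listed above is as a priority queue, so that the most urgent alert is always spoken first. The following is a minimal sketch under that assumption, not the project's actual implementation: the `GuidanceQueue` class, the numeric priority convention, and the example messages are all hypothetical.

```python
import heapq
import itertools


class GuidanceQueue:
    """Hypothetical queue that hands back the most urgent message first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter so messages with equal priority keep insertion order.
        self._counter = itertools.count()

    def push(self, priority, message):
        # Assumed convention: a lower number means a more urgent message.
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def pop(self):
        # Return the most urgent message, or None when nothing is queued.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = GuidanceQueue()
queue.push(3, "Chair on your left")
queue.push(1, "Stairs directly ahead")
queue.push(2, "Person approaching")

first = queue.pop()  # the priority-1 message comes out first
```

In a real pipeline the popped message would be passed to the speech synthesis step; that part is omitted here.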
## 🛠️ Technologies Used
- **Computer Vision**: OpenCV, Google Cloud Vision API
- **AI/ML**: Google Vertex AI (**Gemini Pro**)
- **Speech Synthesis**: Google Cloud Text-to-Speech
- **Audio Processing**: Pygame
- **Additional Libraries**: NumPy, SciPy
## 📋 Requirements
- 🐍 **Python 3.7+**
- ☁️ **Google Cloud Platform account** with the following APIs enabled:
  - Cloud Vision API
  - Text-to-Speech API
  - Vertex AI API
- 📷 **Webcam** or compatible camera device
- 🎧 **Audio output device**
## 🚀 Installation
1️⃣ **Clone the repository:**
```bash
git clone https://github.com/iprajwaal/enhanced-vision-assistant.git
cd enhanced-vision-assistant
```
2️⃣ **Install required packages:**
```bash
pip install opencv-python pygame google-cloud-vision google-cloud-texttospeech vertexai numpy scipy
```
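After installing, it can be useful to confirm that each package is importable before running the assistant. The snippet below is a small sketch for that purpose; the helper name `is_installed` and the `REQUIRED_MODULES` list are illustrative and not part of the repository. The module names are the import names that correspond to the pip packages above.

```python
import importlib.util

# Import names for the packages installed in the previous step.
REQUIRED_MODULES = [
    "cv2",                        # opencv-python
    "pygame",
    "google.cloud.vision",        # google-cloud-vision
    "google.cloud.texttospeech",  # google-cloud-texttospeech
    "vertexai",
    "numpy",
    "scipy",
]


def is_installed(module_name):
    """Return True if the module can be located, without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google.cloud") is itself missing.
        return False


missing = [name for name in REQUIRED_MODULES if not is_installed(name)]
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```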
3️⃣ **Set up Google Cloud credentials:**
- **Create a service account** and download the **JSON key file**
- **Set the path** to your credentials in the `CREDENTIALS_PATH` variable
- **Configure your** Google Cloud **Project ID** in the `PROJECT_ID` variable
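Google Cloud client libraries can also pick up a service account key through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The sketch below shows one way to export it from Python after checking that the key file exists. The helper `export_credentials` is hypothetical, and whether the project reads the key this way or passes it explicitly is an assumption.

```python
import os
from pathlib import Path


def export_credentials(credentials_path):
    """Point Application Default Credentials at a service account key file."""
    key_file = Path(credentials_path).expanduser()
    if not key_file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {key_file}")
    # Standard variable read by Google Cloud client libraries.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_file)
    return key_file


# Example usage (replace the placeholder with your own key file path):
# export_credentials("path/to/your/credentials.json")
```

Failing early on a missing file tends to give a clearer error than the authentication failure that would otherwise surface on the first API call.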
## ⚙️ Configuration
Update the following variables in the `EnhancedVisionAssistant` class:
```python
self.PROJECT_ID = 'your-project-id'
self.REGION = 'your-region'
self.CREDENTIALS_PATH = 'path/to/your/credentials.json'
```
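As a rough illustration of how these three values are typically consumed, the sketch below passes them to the Google client libraries. It is an assumption about usage, not the code inside `EnhancedVisionAssistant`; the helpers `find_unset` and `build_clients` are hypothetical, and the Google imports are deferred so the check can run before the packages are installed.

```python
# Placeholder values copied from the configuration template.
PLACEHOLDERS = {"your-project-id", "your-region", "path/to/your/credentials.json"}


def find_unset(config):
    """Return the names of settings that are empty or still hold a placeholder."""
    return [name for name, value in config.items()
            if not value or value in PLACEHOLDERS]


def build_clients(project_id, region, credentials_path):
    """Create Vision and Text-to-Speech clients and initialize Vertex AI."""
    # Deferred imports: only needed when the clients are actually built.
    import vertexai
    from google.cloud import texttospeech, vision
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        credentials_path,
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    vertexai.init(project=project_id, location=region, credentials=credentials)
    vision_client = vision.ImageAnnotatorClient(credentials=credentials)
    tts_client = texttospeech.TextToSpeechClient(credentials=credentials)
    return vision_client, tts_client


config = {
    "PROJECT_ID": "your-project-id",
    "REGION": "your-region",
    "CREDENTIALS_PATH": "path/to/your/credentials.json",
}
unset = find_unset(config)
if unset:
    print("Please update:", ", ".join(unset))
```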
## 🎯 How It Works
1️⃣ **Initialize** the camera and audio systems
2️⃣ **Detect** objects in real time
3️⃣ **Analyze** the environment and provide **smart audio guidance**
4️⃣ **Display** a debug window showing detected objects and their priorities
**Press 'q' to quit the application.**
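The steps above can be sketched as a simple OpenCV capture loop. This is a simplified outline under stated assumptions rather than the project's code: `detect_objects` is a hypothetical stand-in for the detection and analysis steps, `should_quit` and `run` are illustrative names, and audio guidance is left out.

```python
def detect_objects(frame):
    """Hypothetical placeholder: return a list of (label, priority) pairs."""
    return []


def should_quit(key_code):
    # cv2.waitKey returns -1 when no key is pressed; mask to the low byte.
    return key_code & 0xFF == ord("q")


def run(camera_index=0):
    # Imported here so the helpers above work without OpenCV installed.
    import cv2

    capture = cv2.VideoCapture(camera_index)
    if not capture.isOpened():
        raise RuntimeError("Could not open the camera")
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # camera disconnected or stream ended
            # Draw each detection and its priority on the debug frame.
            for row, (label, priority) in enumerate(detect_objects(frame)):
                cv2.putText(frame, f"{label} (priority {priority})",
                            (10, 30 + 25 * row), cv2.FONT_HERSHEY_SIMPLEX,
                            0.7, (0, 255, 0), 2)
            cv2.imshow("Debug", frame)
            if should_quit(cv2.waitKey(1)):
                break
    finally:
        # Release the camera and close the debug window on any exit path.
        capture.release()
        cv2.destroyAllWindows()


# Call run() to start the loop with the default camera.
```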