https://github.com/haseeb-heaven/gemini-vision-pro

Google Gemini Vision Web application with Speech and Text

Host: GitHub
URL: https://github.com/haseeb-heaven/gemini-vision-pro
Owner: haseeb-heaven
License: mit
Created: 2023-12-16T23:24:46.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-01-23T04:48:43.000Z (over 1 year ago)
Last Synced: 2025-04-10T08:59:50.780Z (16 days ago)
Topics: bard, chatgpt, cv2-library, gemini, gemini-ai, gemini-api, google, google-ai, google-bard, google-cloud, google-gemini, google-studio, gpt, image-processing, opencv, python, python-cv2, python-image-processing
Language: Python
Homepage: https://gemini-studio.streamlit.app
Size: 63.5 KB
Stars: 45
Watchers: 2
Forks: 16
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README


🚀 **_Description:_** 🚀 Google Gemini Vision Pro 📸 is a tool that analyzes images, generates descriptions of them with the Gemini Pro Vision API, and reads those descriptions aloud 🗣️. It can also capture images from your webcam 🖥️.

## 🌟 **Introduction** 🌟

Google Gemini Vision Pro is a versatile application that combines image processing 🖼️, speech recognition 🎤, and text-to-speech capabilities 📢. With this application, you can capture images using your webcam 📷, convert spoken words to text 📝, generate image descriptions 📚, and even have the descriptions spoken back to you 📣.

## **Installation Guide**

### **_Step 1: Clone the repository_**

```bash
git clone https://github.com/haseeb-heaven/Gemini-Vision-Pro
cd Gemini-Vision-Pro
```

### **_Step 2: Install the dependencies_**

```bash
pip install -r requirements.txt
```

### **_Step 3: Run the application_**

```bash
streamlit run script.py
```

### **_Step 4: Obtain a Google AI Studio API key and set up the application_**

1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey).
2. Click the **_Create API Key_** button.
3. Copy the generated key and **_paste it into the application settings_**.
4. The application cannot call the Gemini API without this key, so **_keep it secret: do not share it or commit it to a public repository_**.
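To keep the key out of your source code, one option is to read it from an environment variable instead of hard-coding it. The sketch below assumes a hypothetical `GEMINI_API_KEY` variable (the application itself asks for the key in its settings panel) and masks the key before it is displayed or logged.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    GEMINI_API_KEY is a hypothetical variable name chosen for this sketch;
    the application itself asks for the key in its settings panel.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"No API key found. Set {env_var} or paste the key into the app settings."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so the key is safe to display."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    try:
        print("Using key", mask_key(load_api_key()))
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is friendlier than letting the first API call fail with an authentication error.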

### Gemini AI settings:
Gemini Settings

## **AI Sections**

The core AI sections of this project include:
- 📷 **Webcam capture** using WebRTC, OpenCV, and PIL
- 🗣️ **Speech-to-text conversion** using the Google Cloud Speech-to-Text API
- 🎙️ **Text-to-speech conversion** using gTTS (Google Text-to-Speech)
- 📸 **Image description** using the Gemini Pro Vision API
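For the image step, the sketch below shows roughly what a Gemini vision request body looks like when the public REST API is called directly: one text part (the prompt) and one base64-encoded inline image part. This is an assumption based on the `generateContent` REST format, not necessarily how this application calls Gemini (it may use a client SDK instead), and the `gemini-pro-vision` model name in `ENDPOINT` may since have been retired.

```python
import base64
import json

# Assumed REST endpoint (v1beta); check the current Gemini API docs, since
# older model names such as "gemini-pro-vision" may no longer be available.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-pro-vision:generateContent"
)


def build_vision_request(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent body with one text part and one inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    fake_jpeg = b"\xff\xd8\xff\xe0"  # first bytes of a JPEG header, not a real image
    body = build_vision_request("Describe this image.", fake_jpeg)
    print(ENDPOINT)
    print(json.dumps(body, indent=2))
```

The body would then be sent as an HTTP POST to `ENDPOINT` together with the API key.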

## **Features**

- 📷 Live webcam feed with real-time image capture
- 🗣️ Speech-to-text conversion for spoken words
- 🎙️ Text-to-speech for generating spoken descriptions
- 📸 Image processing using AI to provide detailed descriptions
- 📝 Logging using Python's logging module
- ⚙️ Error handling with Python's exception handling
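The logging and error-handling features can be combined in a small decorator, so that a failed API call is logged with its traceback while the UI still shows a readable message. This is a minimal sketch; `describe_image` and the `gemini_vision` logger name are hypothetical stand-ins, not code from this repository.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("gemini_vision")  # hypothetical logger name


def log_failures(fallback: str):
    """Log each call; on any exception, log the traceback and return `fallback`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.info("Calling %s", func.__name__)
            try:
                return func(*args, **kwargs)
            except Exception:
                # logger.exception logs at ERROR level and includes the traceback.
                logger.exception("%s failed", func.__name__)
                return fallback
        return wrapper
    return decorator


@log_failures(fallback="Sorry, the image could not be described.")
def describe_image(image_bytes: bytes) -> str:
    # Hypothetical stand-in for the real Gemini call.
    if not image_bytes:
        raise ValueError("empty image")
    return "A placeholder description."


if __name__ == "__main__":
    print(describe_image(b"\xff\xd8"))
    print(describe_image(b""))
```

Catching the broad `Exception` is a deliberate choice at the UI boundary; lower-level helpers would usually let specific exceptions propagate.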

## **WebUI - Application Showcase**

### YouTube demo:
[![Gemini Vision demo](https://i.ibb.co/0c0kzXY/gemini-youtube-img.png)](https://www.youtube.com/shorts/DkwDIDc0ufI)

### Webcam with live feed:

Webcam with live feed

### Gemini AI Vision demo with a cap as the object:

Gemini AI Vision Cap

### Gemini AI Vision demo with a hand:

Gemini AI Vision Hand

### Gemini AI Vision demo with a gesture:

Gemini AI Vision Gesture

## **Packages Used**

This project relies on various Python packages, including:
- Streamlit - A web app framework used to build the application
- Streamlit Webrtc - Used for capturing images from the webcam
- OpenCV - Utilized for webcam image capture
- PIL (Pillow) - Used for image processing and conversion
- gTTS (Google Text-to-Speech) - Converts text to speech
- SpeechRecognition - Converts speech to text
- google.cloud.speech - Part of Google Cloud services for speech-to-text conversion
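Several of these packages are imported under a different name than the one used with `pip`, so a quick check like the one below can confirm that `pip install -r requirements.txt` succeeded. The import names in `REQUIRED_MODULES` are assumptions based on the package list above and may not match this repository's `requirements.txt` exactly.

```python
import importlib.util

# Assumed import names for the packages listed above; pip names differ
# (e.g. opencv-python -> cv2, Pillow -> PIL, streamlit-webrtc -> streamlit_webrtc).
REQUIRED_MODULES = {
    "Streamlit": "streamlit",
    "Streamlit Webrtc": "streamlit_webrtc",
    "OpenCV": "cv2",
    "Pillow": "PIL",
    "gTTS": "gtts",
    "SpeechRecognition": "speech_recognition",
    "google-cloud-speech": "google.cloud.speech",
}


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ModuleNotFoundError, ValueError):
        # For dotted names, find_spec imports the parent packages first and
        # raises ModuleNotFoundError if one of them (e.g. "google.cloud") is missing.
        return False


def check_dependencies(modules: dict = REQUIRED_MODULES) -> dict:
    """Map each package label to whether its import name is available."""
    return {label: is_importable(name) for label, name in modules.items()}


if __name__ == "__main__":
    for label, ok in check_dependencies().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {label}")
```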

## 📚 **Links and References**
Follow these links for **Google Gemini Vision Pro** related content:
- [**Google AI Studio**](https://makersuite.google.com/app/apikey)
- [**Google Gemini Vision Pro**](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini)
- [**Google Gemini Deepmind**](https://deepmind.google/technologies/gemini/)

## **Versioning**

- **Version 1.0**: Initial release

## **Contributing**

We welcome contributions! Please follow our [**Contribution Guidelines**](CONTRIBUTING.md) to get started.

## **License**

This project is licensed under the **MIT License** - see the [**LICENSE**](LICENSE) file for details.

## **Author**

- **_HeavenHM_**
- **_Date:_** 17-12-2023
