https://github.com/haseeb-heaven/gemini-vision-pro
Google Gemini Vision Web application with Speech and Text
bard chatgpt cv2-library gemini gemini-ai gemini-api google google-ai google-bard google-cloud google-gemini google-studio gpt image-processing opencv python python-cv2 python-image-processing
- Host: GitHub
- URL: https://github.com/haseeb-heaven/gemini-vision-pro
- Owner: haseeb-heaven
- License: mit
- Created: 2023-12-16T23:24:46.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-01-23T04:48:43.000Z (10 months ago)
- Last Synced: 2024-06-16T12:58:56.102Z (5 months ago)
- Topics: bard, chatgpt, cv2-library, gemini, gemini-ai, gemini-api, google, google-ai, google-bard, google-cloud, google-gemini, google-studio, gpt, image-processing, opencv, python, python-cv2, python-image-processing
- Language: Python
- Homepage: https://gemini-studio.streamlit.app
- Size: 63.5 KB
- Stars: 42
- Watchers: 3
- Forks: 16
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- project-awesome - haseeb-heaven/gemini-vision-pro - Google Gemini Vision Web application with Speech and Text (Python)
README
🚀 **_Description:_** 🚀 This is the amazing Google Gemini Vision Pro 📸, a powerful tool that scans images, generates descriptions using the Gemini Pro Vision API, and provides speech feedback 🗣️. It can also capture images from the webcam 🖥️.
## 🌟 **Introduction** 🌟
Google Gemini Vision Pro is a versatile application that combines image processing 🖼️, speech recognition 🎤, and text-to-speech capabilities 📢. With this application, you can capture images using your webcam 📷, convert spoken words to text 📝, generate image descriptions 📚, and even have the descriptions spoken back to you 📣.
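The flow described above (capture an image, describe it, speak the description) can be sketched roughly as follows. This is an illustration only, not the project's actual code: `run_pipeline`, `PipelineResult`, and the three callables are hypothetical names, and the real application wires these steps through its Streamlit UI.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    description: str
    spoken: bool


def run_pipeline(
    capture_image: Callable[[], bytes],
    describe_image: Callable[[bytes, str], str],
    speak: Callable[[str], None],
    prompt: str = "Describe this image.",
) -> PipelineResult:
    """Capture one image, describe it, and speak the description."""
    image = capture_image()                      # e.g. a webcam frame
    description = describe_image(image, prompt).strip()
    if not description:
        # Nothing useful came back, so skip the speech step.
        return PipelineResult(description="", spoken=False)
    speak(description)                           # e.g. text-to-speech
    return PipelineResult(description=description, spoken=True)


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without a webcam or API key.
    spoken = []
    result = run_pipeline(
        capture_image=lambda: b"fake-image-bytes",
        describe_image=lambda img, prompt: f"{len(img)} bytes described",
        speak=spoken.append,
    )
    print(result, spoken)
```

Passing the three steps in as plain callables keeps the sketch independent of any particular webcam, model, or speech library.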
## **Installation Guide**
### **_Step 1: Clone the repository_**
```bash
git clone https://github.com/haseeb-heaven/Gemini-Vision-Pro
cd Gemini-Vision-Pro
```
### **_Step 2: Install the dependencies_**
```bash
pip install -r requirements.txt
```
### **_Step 3: Run the application_**
```bash
streamlit run script.py
```
### **_Step 4: Obtain the Google Gemini API key and set up the application_**
1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey).
2. Click the **_Create API Key_** button.
3. The generated key is your API key. **_Copy it and paste it into the application settings_**.
4. The application cannot work without this key. **_Keep it safe and do not share it with anyone_**.
### Gemini AI settings:
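If you prefer not to paste the key by hand each time, one option is to read it from an environment variable and show only a masked form of it. The sketch below is an assumption, not something the project provides: the variable name `GEMINI_API_KEY` and the helpers `load_api_key` and `mask_key` are hypothetical.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "GEMINI_API_KEY"  # hypothetical name, not defined by the project


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {ENV_VAR} or paste the key into the application settings."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a form of the key that is safer to show in logs or the UI."""
    if len(key) <= visible * 2:
        # Too short to reveal any part of it.
        return "*" * len(key)
    hidden = "*" * (len(key) - 2 * visible)
    return f"{key[:visible]}{hidden}{key[-visible:]}"
```

Masking the key before logging or displaying it helps with the advice above to keep it private.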
## **AI Sections**
The core AI sections of this project include:
- 📷 **Webcam detection** using WebRTC, OpenCV, and PIL
- 🗣️ **Speech-to-text conversion** using Google Cloud Speech-to-Text API
- 🎙️ **Text-to-speech conversion** using Google Cloud Text-to-Speech API
- 📸 **Image processing** using Gemini AI Pro Vision API
## **Features**
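As a rough sketch of how the image-description step might combine a Gemini call with Python's logging and exception handling: the code below assumes the `google-generativeai` and Pillow packages are installed and that the model is named `gemini-pro-vision`, which may differ from what the project uses or what Google currently offers. The names `build_prompt`, `describe_with_gemini`, and `describe_safely` are hypothetical.

```python
import logging

logger = logging.getLogger("gemini_vision_sketch")  # hypothetical logger name

DEFAULT_PROMPT = "Describe what you see in this image."
FALLBACK_MESSAGE = "Sorry, the image could not be described."


def build_prompt(user_text: str) -> str:
    """Use the transcribed speech as the prompt, or fall back to a default."""
    text = user_text.strip()
    return text if text else DEFAULT_PROMPT


def describe_with_gemini(image_path: str, prompt: str, api_key: str) -> str:
    """Send one image and a prompt to Gemini and return the text reply."""
    # Imported lazily so the helpers above work without these packages.
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-pro-vision")  # assumed model name
    response = model.generate_content([prompt, Image.open(image_path)])
    return response.text


def describe_safely(describe, *args) -> str:
    """Call a describe function, log any failure, and return a fallback."""
    try:
        description = describe(*args)
        logger.info("Generated description with %d characters", len(description))
        return description
    except Exception:
        # Logs the full traceback, then degrades gracefully for the user.
        logger.exception("Image description failed")
        return FALLBACK_MESSAGE
```

A call such as `describe_safely(describe_with_gemini, "photo.jpg", build_prompt(""), api_key)` would then return either the model's reply or the fallback message, with failures recorded in the log.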
- 📷 Webcam detection with real-time image capture
- 🗣️ Speech-to-text conversion for spoken words
- 🎙️ Text-to-speech for generating spoken descriptions
- 📸 Image processing using AI to provide detailed descriptions
- 📝 Logging using Python's logging module
- ⚙️ Error handling with Python's exception handling
## **WebUI - Application Showcase**
### YouTube demo:
[![Gemini Vision demo](https://i.ibb.co/0c0kzXY/gemini-youtube-img.png)](https://www.youtube.com/shorts/DkwDIDc0ufI)
### Webcam with live feed:
### Gemini AI Vision demo with a cap as the object:
### Gemini AI Vision demo with a hand:
### Gemini AI Vision demo with a gesture:
## **Packages Used**
This project relies on various Python packages, including:
- Streamlit - A web app framework used to build the application
- streamlit-webrtc - Used for capturing images from the webcam
- OpenCV - Utilized for webcam image capture
- PIL (Pillow) - Used for image processing and conversion
- gTTS (Google Text-to-Speech) - Converts text to speech
- SpeechRecognition - Converts speech to text
- google.cloud.speech - Part of Google Cloud services for speech-to-text conversion
## 📚 **Links and References**
Follow these links for **Google Gemini Vision Pro** related content:
- [**Google AI Studio**](https://makersuite.google.com/app/apikey)
- [**Google Gemini Vision Pro**](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini)
- [**Google Gemini Deepmind**](https://deepmind.google/technologies/gemini/)
## **Versioning**
- **Version**: 1.0 : Initial Release
## **Contributing**
We welcome contributions! Please follow our [**Contribution Guidelines**](CONTRIBUTING.md) to get started.
## **License**
This project is licensed under the **MIT License** - see the [**LICENSE**](LICENSE) file for details.
## **Author**
- **_HeavenHM_**
- **_Date:_** 17-12-2023