Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ZohaibAhmed/real-gemini
Google's Gemini implemented with GPT-4 Vision, Whisper and Resemble AI
https://github.com/ZohaibAhmed/real-gemini
Last synced: 2 months ago
JSON representation
Google's Gemini implemented with GPT-4 Vision, Whisper and Resemble AI
- Host: GitHub
- URL: https://github.com/ZohaibAhmed/real-gemini
- Owner: ZohaibAhmed
- License: mit
- Created: 2023-12-08T23:41:19.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-09T00:21:07.000Z (about 1 year ago)
- Last Synced: 2024-08-01T22:42:06.125Z (5 months ago)
- Language: Python
- Size: 8.79 KB
- Stars: 26
- Watchers: 2
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Google-Gemini-AI - Gemini implemented with GPT-4 Vision - 4 Vision, Whisper, and Resemble AI. (GitHub projects)
README
Real Gemini
Google's Gemini implemented with GPT-4 Vision, Whisper and Resemble AI
This project leverages the power of AI to answer questions based on visual inputs -- like Google's Gemini demo. It integrates GPT-4 Vision for image understanding, Whisper for voice recognition, and Resemble AI for voice synthesis, creating a comprehensive system capable of interpreting visual data and responding verbally.https://github.com/ZohaibAhmed/real-gemini/assets/660224/9ab3bd22-4c26-4947-9646-d2085b22725f
## Features
- **Visual Question Answering**: Uses GPT-4 Vision to interpret images from a camera feed and answer questions related to the visual content.
- **Voice Recognition**: Employs Whisper for accurate speech-to-text conversion, allowing users to ask questions verbally.
- **Voice Synthesis**: Utilizes Resemble AI for generating realistic voice responses, enhancing the interactive experience.## Prerequisites
- Python 3.x
- Camera hardware compatible with your system
- Microphone and speaker setup for voice input and output## Installation
1. **Clone the Repository**
```bash
git clone [email protected]:ZohaibAhmed/real-gemini.git
cd real-gemini
```2. **Install Dependencies**
Install the required Python packages:
```bash
pip install -r requirements.txt
```3. **Environment Setup**
- Create a `.env` file in the project root.
- Add your Resemble AI and OpenAI credentials to the `.env` file:## Usage
Run the application using the following command:
```bash
python run.py
```
Place the camera in view of the subject and use a microphone to ask questions. The system will process the visual and audio inputs to provide a spoken answer.## Contributions
Contributions to this project are welcome. Please create a pull request with your proposed changes.## Acknowledgements
Special thanks to OpenAI for GPT-4 and Whisper APIs, and to Resemble AI for their voice synthesis technology.