Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ayushpai/GPT-4o-Assistant

Last synced: about 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/ayushpai/GPT-4o-Assistant
Owner: ayushpai
License: mit
Created: 2024-05-17T13:22:54.000Z (4 months ago)
Default Branch: main
Last Pushed: 2024-06-06T19:46:20.000Z (4 months ago)
Last Synced: 2024-07-23T05:24:06.539Z (2 months ago)
Language: Python
Size: 57.5 MB
Stars: 49
Watchers: 2
Forks: 25
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Interactive Assistant Using GPT-4o

This repository contains the code for an advanced interactive assistant powered by OpenAI's newest model, GPT-4o. The assistant leverages multiple inputs, including screenshots and audio, to provide contextual and accurate responses to user queries. It integrates with a document database to ensure responses are based on relevant context from provided documents.

### Key Features

- **Screenshot Capture and Encoding**: Utilizes PyAutoGUI to capture screenshots and encode them in base64 format for input to the model.
- **Audio Detection and Transcription**: Detects and records audio using `sounddevice`, processes it with `Whisper`, and transcribes it to text.
- **Contextual Responses**: Employs document embeddings and similarity search with LangChain and SingleStoreDB to find relevant context for generating responses.
- **Text-to-Speech Output**: Converts the model's responses into speech using OpenAI's TTS capabilities.

### Technologies Used

- **OpenAI GPT-4o**: For generating responses.
- **OpenCV**: For image processing.
- **Whisper**: For audio transcription.
- **Sounddevice and Soundfile**: For audio handling.
- **Playsound**: For audio playback.
- **PyAutoGUI**: For screenshot capture.
- **LangChain**: For document processing and embedding.
- **SingleStoreDB**: For document storage and similarity search.

### How It Works

1. **Capture Screenshot**: The assistant captures a screenshot of the current screen.
2. **Record Audio**: It listens for user speech, records the audio, and transcribes it into text.
3. **Context Retrieval**: Searches for relevant context in a document database using similarity search.
4. **Generate Response**: Sends the transcribed text, captured screenshot, and relevant context to the GPT-4o model to generate a response.
5. **Text-to-Speech**: Converts the generated response to speech and plays it back to the user.

### Setup Instructions

1. Clone the repository.
2. Install the required dependencies:
```bash
pip install openai opencv-python-headless sounddevice numpy soundfile speechrecognition whisper playsound pyautogui langchain_community singlestoredb
```
3. Set your OpenAI API key in the environment variable `OPENAI_API_KEY`.
4. Set your SingleStoreDB URL in the environment variable `SINGLESTOREDB_URL`.
5. Place your documents (e.g., `pytorch_docs.txt`) in the same directory.
6. Run the main script:
```bash
python computer_assistant.py or python assistant.py
```

### Future Enhancements

- Integrate additional sensors and input methods.
- Improve audio quality and handling.
- Extend the assistant's capabilities for different use cases and domains.

Feel free to contribute and enhance this interactive assistant!