Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/presizhai/ai-auto-assistant
A voice-based AI agent designed for a car dealership. This agent can interact with users via voice commands, respond using natural language, & provide continuous speech-to-speech conversations. The system uses Google Cloud's Speech-to-Text & Text-to-Speech services, OpenAI's GPT-3.5-turbo for generating responses, & Streamlit for the user interface
https://github.com/presizhai/ai-auto-assistant
google-api google-cloud google-cloud-platform openai openai-api speech-synthesis speech-to-text streamlit text-to-speech
Last synced: 19 days ago
JSON representation
A voice-based AI agent designed for a car dealership. This agent can interact with users via voice commands, respond using natural language, & provide continuous speech-to-speech conversations. The system uses Google Cloud's Speech-to-Text & Text-to-Speech services, OpenAI's GPT-3.5-turbo for generating responses, & Streamlit for the user interface
- Host: GitHub
- URL: https://github.com/presizhai/ai-auto-assistant
- Owner: presiZHai
- License: mit
- Created: 2024-09-12T13:18:42.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-12T16:16:56.000Z (5 months ago)
- Last Synced: 2024-11-21T16:14:29.870Z (3 months ago)
- Topics: google-api, google-cloud, google-cloud-platform, openai, openai-api, speech-synthesis, speech-to-text, streamlit, text-to-speech
- Language: Python
- Homepage:
- Size: 1.41 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# AI Auto Assistant
This project is a voice-based AI agent designed for a car dealership. The AI agent can interact with users via voice commands, respond using natural language, and provide continuous speech-to-speech conversations. The system uses Google Cloud's Speech-to-Text and Text-to-Speech services, OpenAI's GPT-3.5-turbo for generating responses, and Streamlit for the user interface.
![AI Voice Assistant for Car Dealership](images/ai_assistant.webp)
## Table of Contents
* **Features**
* **Prerequisites**
* **Installation**
* **Configuration**
* **Usage**
* **Project Structure**
* **Modules Description**
* **License**## Features
* Continuous voice-based interaction without requiring manual intervention between responses.
* Uses Google Cloud for speech recognition and text-to-speech synthesis.
* Incorporates OpenAI's GPT-3.5-turbo to generate natural language responses.
* Implements a turn-taking model using sentiment analysis to decide when the AI should respond.
* Provides a visual indicator while the AI is responding.## Prerequisites
* Python 3.8 or later
* Google Cloud account with Speech-to-Text and Text-to-Speech APIs enabled
* OpenAI API key
* FFmpeg installed and accessible from the command line## Installation
1. Clone the Repository
```bash
git clone https://github.com/presiZHai/ai-auto-assistant.gitcd ai-auto-assistant
```
2. Create and Activate a Virtual Environment```bash
conda create --name voice-auto-assistant python=3.10 -yconda activate voice-auto-assistant
```
3. Install Required Packages```bash
pip install -r requirements.txt
```## Configuration
1. Set Up Environment Variables
Create a .env file in the project root directory with the following content:
```bash
OPENAI_API_KEY=your_openai_api_keyGOOGLE_APPLICATION_CREDENTIALS_PATH=path_to_your_google_service_account_json
```2. Obtain API Keys
* Google Cloud: Create a service account, enable Speech-to-Text and Text-to-Speech APIs, and download the service account key file in JSON format. Set the path to this file in the .env file as GOOGLE_APPLICATION_CREDENTIALS_PATH.
* OpenAI: Sign up at OpenAI and get your API key. Set it as OPENAI_API_KEY in the .env file.## Usage
1. Run the Application
```bash
streamlit run main_app.py
```2. Start or Stop the Conversation
* Click the "Start/Stop Conversation" button to begin or end the voice interaction.
![AI Voice Assistant Example](images/voice-agent.png)
## Project Structure
* main_app.py: The main Streamlit application that handles the user interface and orchestrates the interaction loop.
* stt_module.py: Speech-to-Text module that transcribes user input using Google Cloud's Speech-to-Text API.
* tts_module.py: Text-to-Speech module that converts AI responses to audio using Google Cloud's Text-to-Speech API.
* llm_module.py: Language model module that generates responses using OpenAI's GPT-3.5-turbo.
* turn_taking_model.py: Turn-taking module that decides if the AI should respond based on sentiment analysis.
* requirements.txt: List of Python dependencies required for the project.
* .env: Environment file to store sensitive API keys.## Modules Description
1. stt_module.py
Handles audio transcription using Google Cloud's Speech-to-Text API. This module includes:
* convert_audio: Converts input audio files to the required format (16kHz, mono, WAV).
* transcribe: Transcribes the audio content into text.2. tts_module.py
Generates spoken responses from text using Google Cloud's Text-to-Speech API. This module includes:
* synthesize_speech: Takes input text and returns the audio content in MP3 format.3. llm_module.py
Uses OpenAI's GPT-3.5-turbo to generate conversational responses. This module includes:
* generate_response: Generates a response based on the user's input prompt.4. turn_taking_model.py
Implements a turn-taking model using a sentiment analysis pipeline to decide if the AI should respond. This module includes:
* should_respond: Determines if the AI should respond based on silence duration and sentiment score.5. main_app.py
The main application script that:
* Initializes the modules.
* Records audio from the user.
* Transcribes the audio, generates a response, and synthesizes speech.
* Manages the conversation flow and displays the conversation history.## License
This project is licensed under the MIT License. See the LICENSE file for more information.