https://github.com/zigaowang/ai-voice-assistant
- Host: GitHub
- URL: https://github.com/zigaowang/ai-voice-assistant
- Owner: ZigaoWang
- License: MIT
- Created: 2024-08-08T02:08:54.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-08-12T06:02:10.000Z (about 1 year ago)
- Last Synced: 2025-04-07T10:35:52.487Z (6 months ago)
- Language: Python
- Size: 10.7 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AI Voice Assistant
> [!NOTE]
>
> ## Sponsored: TurboAI
>
> ### TurboAI: a comprehensive, fast, and stable AI relay service
>
> A cost-effective API forwarding service that aggregates top AI models including OpenAI, Gemini, Claude, Zhipu, and Suno. It offers fast global response times, stable and reliable service, pay-as-you-go billing, strong security, and compatibility with multiple model protocols, backed by professional support and performance guarantees.
>
> I personally recommend **TurboAI** because the GPT-4o-mini, TTS-1, and Whisper-1 models used in this project all work with a single **TurboAI API Key**, which greatly simplifies setup.
>
> ### Registration link: [click here](https://api.turboai.io/register?aff=VkS0)

This project is an AI Voice Assistant that uses OpenAI's GPT-4o models for natural language processing and Whisper for speech-to-text transcription. The assistant holds a back-and-forth conversation with the user: it converts spoken input into text, generates a response, and converts that response back into speech.
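The loop just described can be sketched as follows. This is a minimal sketch of the control flow only: the names `record`, `transcribe`, `respond`, and `speak` are illustrative stand-ins, not the actual functions in `main.py`, and the real script replaces them with calls to the microphone and to OpenAI's Whisper, GPT-4o, and TTS APIs.

```python
# Sketch of the assistant's record -> transcribe -> respond -> speak loop.
# The API-backed steps are injected as plain functions so the control flow
# is easy to follow and test without a microphone or an API key.

def conversation_loop(record, transcribe, respond, speak, history=None):
    """Run the back-and-forth conversation until the user says 'exit'."""
    history = history if history is not None else []
    while True:
        audio = record()                # e.g. 5 seconds from the microphone
        user_text = transcribe(audio)   # Whisper speech-to-text in the real script
        if user_text.strip().lower() == "exit":
            break
        history.append({"role": "user", "content": user_text})
        reply = respond(history)        # GPT-4o chat completion in the real script
        history.append({"role": "assistant", "content": reply})
        speak(reply)                    # TTS synthesis + playback in the real script
    return history

# Demo with canned stand-ins instead of hardware and API calls:
utterances = iter(["Hello.", "exit"])
log = conversation_loop(
    record=lambda: b"",                         # fake audio bytes
    transcribe=lambda audio: next(utterances),  # canned "transcriptions"
    respond=lambda hist: "Hello! How can I assist you today?",
    speak=lambda text: None,                    # no-op playback
)
print(log)
```

Keeping the whole exchange in `history` is what lets the model see earlier turns, so follow-up questions stay in context.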
## Features
- **Record Audio**: Record user input via a microphone.
- **Transcribe Audio**: Convert the recorded audio into text using OpenAI's Whisper model.
- **Generate Response**: Generate a response to the user's input using OpenAI's GPT-4o model.
- **Text-to-Speech**: Convert the generated text response back into speech.
- **Play Audio**: Play the generated speech response.
- **Continuous Conversation**: Maintain a conversation until the user decides to exit.

## Prerequisites
- Python 3.6 or later
- An OpenAI API key
- Required third-party Python libraries: `openai`, `pyaudio`, `simpleaudio`, and `python-dotenv` (imported as `dotenv`); `wave`, `pathlib`, `uuid`, and `asyncio` are part of the Python standard library

## Installation
1. **Clone the repository**:
```bash
git clone https://github.com/ZigaoWang/ai-voice-assistant.git
cd ai-voice-assistant
```

2. **Create a virtual environment and activate it**:
```bash
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```

3. **Install the dependencies**:
```bash
pip install -r requirements.txt
```

4. **Set up environment variables**:
Create a `.env` file in the root directory of the project and add your OpenAI API key:
```env
OPENAI_API_KEY=your-openai-api-key
OPENAI_BASE_URL=https://api.openai.com/v1
```

## Usage
1. **Run the script**:
```bash
python main.py
```

2. **Interact with the assistant**:
- The assistant will record your voice for 5 seconds.
- It will transcribe your speech into text and generate a response.
- The response will be converted to speech and played back to you.
- The conversation will continue until you say "exit".

## Example Output
```plaintext
Recording...
Finished recording.
Transcription response: Hello.
User: Hello.
Assistant: Hello! How can I assist you today?
Recording...
Finished recording.
Transcription response: Can you tell me a joke?
User: Can you tell me a joke?
Assistant: Sure! Why don't scientists trust atoms? Because they make up everything!
...
```

## Contributing
1. **Fork the repository**.
2. **Create a new branch**:
```bash
git checkout -b feature/your-feature-name
```
3. **Make your changes**.
4. **Commit your changes**:
```bash
git commit -m 'Add some feature'
```
5. **Push to the branch**:
```bash
git push origin feature/your-feature-name
```
6. **Create a new Pull Request**.

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Acknowledgments
- [OpenAI](https://www.openai.com/) for providing the GPT-4o and Whisper models.
- [PyAudio](https://people.csail.mit.edu/hubert/pyaudio/) for audio recording.
- [SimpleAudio](https://simpleaudio.readthedocs.io/en/latest/) for audio playback.
- [dotenv](https://pypi.org/project/python-dotenv/) for managing environment variables.