https://github.com/bharath-tars/speech-to-speech-bot
A real-time voice bot using LLaMA 3.3-80B for intelligent responses, OpenAI Whisper V3 Turbo for speech-to-text, and gTTS for text-to-speech conversion.
https://github.com/bharath-tars/speech-to-speech-bot
gtts huggingface llama-3-70b python speech-to-speech streamlit tranformers whisper-ai
Last synced: about 2 months ago
JSON representation
A real-time voice bot using LLaMA 3.3-80B for intelligent responses, OpenAI Whisper V3 Turbo for speech-to-text, and gTTS for text-to-speech conversion.
- Host: GitHub
- URL: https://github.com/bharath-tars/speech-to-speech-bot
- Owner: Bharath-tars
- License: gpl-3.0
- Created: 2025-01-18T06:55:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-22T11:44:19.000Z (over 1 year ago)
- Last Synced: 2025-03-29T11:16:48.896Z (over 1 year ago)
- Topics: gtts, huggingface, llama-3-70b, python, speech-to-speech, streamlit, tranformers, whisper-ai
- Language: Python
- Homepage: https://huggingface.co/spaces/BharathTars/Nova-Voice-BOT
- Size: 19.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Speech-to-Speech Conversational Bot
## Introduction
The **Speech-to-Speech Conversational Bot** is a real-time voice interaction system that combines cutting-edge technologies to enable seamless, intuitive conversations. It utilizes advanced language models and speech processing tools to provide a natural and engaging user experience.
## Features
- **Speech Input**: Records user audio through the interface.
- **Real-time Transcription**: Converts speech to text using **OpenAI Whisper V3 Turbo** via Groq's inference client.
- **Intelligent Responses**: Generates context-aware responses using **LLaMA 3.3-80B**.
- **Speech Output**: Converts responses back to speech using **gTTS** for natural voice playback.
- **User Interface**: Built with **Streamlit** for easy interaction and audio visualization.
## Stack Design
- **Frontend**: Streamlit for audio recording and response display.
- **Speech-to-Text**: OpenAI Whisper V3 Turbo accessed through Groq's inference client.
- **Language Model**: LLaMA 3.3-80B for generating intelligent and conversational responses.
- **Text-to-Speech**: gTTS for converting text back to speech.
## Workflow
1. **Audio Recording**: The user records their query using the Streamlit interface.
2. **Speech Transcription**: The recorded audio is sent to the **Whisper V3 Turbo** model via Groq's inference client for transcription.
3. **Response Generation**: The transcribed text is passed to the **LLaMA 3.3-80B** model to generate a contextually appropriate response.
4. **Speech Synthesis**: The generated response is converted back into speech using **gTTS**.
5. **Playback**: The audio response is played back to the user through the interface.
## Installation
1. Clone the repository:
```bash
git clone https://github.com/your-username/speech-to-speech-bot.git
cd speech-to-speech-bot