
# JARVIS


*Demo GIF: JARVIS helping me choose a firearm*

Your own personal voice assistant: Voice to Text to LLM to Speech, displayed in a web interface.

## How it works

1. :microphone: The user speaks into the microphone
2. :keyboard: Voice is converted to text using Deepgram
3. :robot: Text is sent to OpenAI's GPT-3 API to generate a response
4. :loudspeaker: Response is converted to speech using ElevenLabs
5. :loud_sound: Speech is played using Pygame
6. :computer: Conversation is displayed in a webpage using Taipy
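
A minimal sketch of that loop is shown below. This is not the repository's actual code: `transcribe` and `synthesize` are hypothetical placeholders for the Deepgram and ElevenLabs SDK calls, and the OpenAI model name is assumed.

```python
import os
import pygame
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def generate_reply(transcript: str) -> str:
    """Send the transcribed user text to OpenAI's chat API and return the reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the repository may use a different one
        messages=[
            {"role": "system", "content": "You are JARVIS, a helpful voice assistant."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

def play_audio(path: str) -> None:
    """Play a synthesized audio file through Pygame's mixer."""
    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

# Pipeline outline (transcribe/synthesize are hypothetical helpers):
# transcript = transcribe(recorded_audio)   # 2. Deepgram speech-to-text
# reply = generate_reply(transcript)        # 3. OpenAI response
# audio_path = synthesize(reply)            # 4. ElevenLabs text-to-speech
# play_audio(audio_path)                    # 5. playback with Pygame
```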

## Video Demo



YouTube Devlog

## Requirements

**Python 3.8 - 3.11**

Make sure you have the following API keys:
- Deepgram
- OpenAI
- ElevenLabs

## How to install

1. Clone the repository

```bash
git clone https://github.com/AlexandreSajus/JARVIS.git
```

2. Navigate into the cloned repository and install the requirements

```bash
cd JARVIS
pip install -r requirements.txt
```

3. Create a `.env` file in the root directory and add the following variables:

```bash
DEEPGRAM_API_KEY=XXX...XXX
OPENAI_API_KEY=sk-XXX...XXX
ELEVENLABS_API_KEY=XXX...XXX
```
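
The assistant presumably reads these keys from the environment at startup. A minimal sketch of how that could look with `python-dotenv` (an assumption; the repository may load them differently):

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads the .env file in the current directory

DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")

if not all((DEEPGRAM_API_KEY, OPENAI_API_KEY, ELEVENLABS_API_KEY)):
    raise RuntimeError("Missing one or more API keys in .env")
```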

## How to use

1. Run `display.py` to start the web interface

```bash
python display.py
```

2. In another terminal, run `main.py` to start the voice assistant

```bash
python main.py
```

- Once ready, both the web interface and the terminal will show `Listening...`
- You can now speak into the microphone
- Once you stop speaking, it will show `Done listening`
- It will then start processing your request
- Once the response is ready, it will show `Speaking...`
- The response will be played and displayed in the web interface.

Here is an example:

```
Listening...
Done listening
Finished transcribing in 1.21 seconds.
Finished generating response in 0.72 seconds.
Finished generating audio in 1.85 seconds.
Speaking...

--- USER: good morning jarvis
--- JARVIS: Good morning, Alex! How can I assist you today?

Listening...
...
```


*Demo GIF: saying good morning to JARVIS*