https://github.com/modal-labs/quillman
A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
- Host: GitHub
- URL: https://github.com/modal-labs/quillman
- Owner: modal-labs
- License: mit
- Created: 2023-04-19T15:07:55.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-19T02:53:16.000Z (4 months ago)
- Last Synced: 2024-09-19T03:51:45.626Z (4 months ago)
- Topics: ai, language-model, python, serverless, speech-recognition, speech-to-text
- Language: Python
- Homepage: https://modal.com/docs/guide/llm-voice-chat
- Size: 4.04 MB
- Stars: 1,022
- Watchers: 10
- Forks: 111
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# QuiLLMan: Voice Chat with LLMs
A complete chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
This repo is meant to serve as a starting point for your own language model-based apps, as well as a playground for experimentation. Contributions are welcome and encouraged!
![quillman](https://user-images.githubusercontent.com/5786378/233804923-c13627de-97db-4050-a36b-62d955db9c19.gif)
The language model used is [Zephyr](https://arxiv.org/abs/2310.16944). [OpenAI Whisper](https://github.com/openai/whisper) is used for transcription, and [Metavoice Tortoise TTS](https://github.com/metavoicexyz/tortoise-tts) is used for text-to-speech. The entire app, including the frontend, is made to be deployed serverlessly on [Modal](http://modal.com/).
You can find the demo live [here](https://modal-labs--quillman-web.modal.run/).
[Note: this code is provided for illustration only; please remember to check the license before using any model for commercial purposes.]
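The transcribe, generate, and synthesize flow described above can be sketched in plain Python. This is a minimal illustration only, not the repo's code: every function below is a hypothetical stand-in for the real Whisper, Zephyr, and Tortoise modules, and grouping streamed tokens into sentences before synthesis is an assumption about how such a pipeline might keep latency low.

```python
from typing import Iterator


def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for speech-to-text (Whisper in the real app)."""
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def generate(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for the language model: yields the reply token by token."""
    reply = f"You said: {prompt}. How can I help?"
    for word in reply.split(" "):
        yield word + " "


def sentences(tokens: Iterator[str]) -> Iterator[str]:
    """Group streamed tokens into sentences (an assumed chunking strategy)."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "?", "!")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing text without end punctuation
        yield buffer.strip()


def synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for text-to-speech (Tortoise in the real app)."""
    return sentence.encode("utf-8")


def voice_chat(audio: bytes) -> Iterator[bytes]:
    """Chain the three stages, yielding one audio clip per sentence."""
    text = transcribe(audio)
    for sentence in sentences(generate(text)):
        yield synthesize(sentence)


if __name__ == "__main__":
    for clip in voice_chat(b"hello there"):
        print(clip)
```

Because each stage is a generator, the first clip can be produced before the model has finished its whole reply, which is the property a streaming voice app depends on.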
## File structure
1. React frontend ([`src/frontend/`](./src/frontend/))
2. FastAPI server ([`src/app.py`](./src/app.py))
3. Whisper transcription module ([`src/transcriber.py`](./src/transcriber.py))
4. Tortoise text-to-speech module ([`src/tts.py`](./src/tts.py))
5. Zephyr language model module ([`src/llm_zephyr.py`](./src/llm_zephyr.py))

Read the accompanying [docs](https://modal.com/docs/examples/llm-voice-chat) for a detailed look at each of these components.
## Developing locally
### Requirements
- `modal` installed in your current Python virtual environment (`pip install modal`)
- A [Modal](http://modal.com/) account
- A Modal token set up in your environment (`modal token new`)

### Develop on Modal
To [serve](https://modal.com/docs/guide/webhooks#developing-with-modal-serve) the app on Modal, run this command from the root directory of this repo:
```shell
modal serve src.app
```

In the terminal output, you'll find a URL that you can visit to use your app. While the `modal serve` process is running, changes to any of the project files will be automatically applied. `Ctrl+C` will stop the app.
### Deploy to Modal
Once you're happy with your changes, [deploy](https://modal.com/docs/guide/managing-deployments#creating-deployments) your app:
```shell
modal deploy src.app
```

[Note that leaving the app deployed on Modal doesn't cost you anything! Modal apps are serverless and scale to 0 when not in use.]