https://github.com/impavloh/whittsper-the-lora

Demo combining Whisper for speech recognition and Google TTS for speech synthesis to interact with Alpaca-LoRA.

Host: GitHub
URL: https://github.com/impavloh/whittsper-the-lora
Owner: ImPavloh
License: mit
Created: 2023-03-18T20:39:06.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-04-30T09:39:31.000Z (9 months ago)
Last Synced: 2024-12-21T10:34:18.572Z (about 1 month ago)
Topics: ai, google-colab, gtts, llama, whisper
Language: Jupyter Notebook
Homepage: https://colab.research.google.com/drive/11MHiNlhQ0ZSqKVl0Fniu085bkQRdJX9E?usp=sharing
Size: 56.6 KB
Stars: 19
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

![Demo Preview](https://i.imgur.com/VVpYeb9.png)

[![Google Colab Demo](https://img.shields.io/badge/demo-online-green.svg)](https://colab.research.google.com/drive/11MHiNlhQ0ZSqKVl0Fniu085bkQRdJX9E?usp=sharing)

# WhiTTsper the Lora

A demo that combines Whisper for voice recognition and Google TTS for voice synthesis, letting you interact with Alpaca-LoRA by voice.

### Try it on [**Google Colab**](https://colab.research.google.com/drive/11MHiNlhQ0ZSqKVl0Fniu085bkQRdJX9E?usp=sharing)

> [!WARNING]
> This project is significantly outdated and may no longer operate as expected.

## 📃 Features

- Voice recognition using Whisper, with a selectable model size
- LLaMA 7B language model, configurable from the interface
- Voice synthesis using Google Text-to-Speech
- Graphical interface built with Gradio
- Conversation history available
- Conversation reset function
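
The history and reset features above could be modelled roughly as follows. This is a minimal sketch and not the notebook's actual code: the `Conversation` class, its method names, and the way earlier turns are folded into the prompt are all assumptions made for illustration. The template is the no-input instruction format published with Alpaca.

```python
from dataclasses import dataclass, field

# Alpaca instruction template (variant without an extra input field).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


@dataclass
class Conversation:
    """Hypothetical container for the chat history shown in the interface."""

    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, user_text: str, reply: str) -> None:
        """Record one completed exchange."""
        self.turns.append((user_text, reply))

    def reset(self) -> None:
        """Forget everything said so far (the 'reset' button)."""
        self.turns.clear()

    def build_prompt(self, user_text: str, max_turns: int = 3) -> str:
        """Build an Alpaca-style prompt that includes recent history.

        Assumption: earlier turns are prepended as plain text inside the
        instruction; the real notebook may handle context differently.
        """
        recent = self.turns[-max_turns:] if max_turns > 0 else []
        context = "".join(f"User: {u}\nAssistant: {r}\n" for u, r in recent)
        return ALPACA_TEMPLATE.format(instruction=context + f"User: {user_text}")
```

Capping the history at a few turns keeps the prompt short, which matters for a 7B model with a limited context window.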

## 📑 TODO

- [ ] Improve language model
- [ ] Use advanced AI for voice synthesis
- [ ] Optimize the code and ensure compatibility across platforms such as Windows and Linux
- [ ] Add image generation and recognition as additional functionality using Stable Diffusion

---

## 📒 Usage

To use the demo, you need access to a microphone. After you run all the notebook cells, a graphical interface opens in which you can speak into the microphone and get a response from the Alpaca-LoRA model.
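
The flow described above (recorded audio in, spoken reply out) can be sketched as a single function. This is a minimal sketch, not the notebook's actual code: `run_turn`, `whisper_transcriber`, and `gtts_synthesizer` are hypothetical names, and the language model is passed in as a plain callable so the sketch does not depend on how Alpaca-LoRA is loaded. The two helper factories assume the `openai-whisper` and `gTTS` packages are installed.

```python
from typing import Callable


def run_turn(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> tuple[str, str, str]:
    """Run one voice turn: audio file -> text -> reply text -> reply audio."""
    user_text = transcribe(audio_path).strip()
    if not user_text:
        raise ValueError("no speech detected in the recording")
    reply = generate(user_text)
    reply_audio = synthesize(reply)
    return user_text, reply, reply_audio


def whisper_transcriber(size: str = "base") -> Callable[[str], str]:
    """Return a transcribe function backed by a Whisper model of the given size."""
    import whisper  # imported lazily so run_turn works without it

    model = whisper.load_model(size)
    return lambda path: model.transcribe(path)["text"]


def gtts_synthesizer(out_path: str = "reply.mp3", lang: str = "en") -> Callable[[str], str]:
    """Return a synthesize function that writes speech to out_path via gTTS."""
    from gtts import gTTS  # imported lazily for the same reason

    def synthesize(text: str) -> str:
        gTTS(text=text, lang=lang).save(out_path)
        return out_path

    return synthesize
```

With a language-model callable of your own (here called `my_llm`, a placeholder), one turn would look like `run_turn("question.wav", whisper_transcriber("base"), my_llm, gtts_synthesizer())`.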

In the graphical interface, you can select the size of the Whisper model (tiny, base, small, medium, or large). Larger models generally transcribe more accurately but run more slowly, so the choice affects both the response time and, because the transcript is what the language model receives, the quality of the generated response. You can also adjust the temperature of the Alpaca-LoRA model and reset the conversation.
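
The two settings mentioned above could be handled along these lines. This is a sketch under assumptions, not the notebook's actual code: `check_size`, `load_whisper`, and `sampling_kwargs` are hypothetical helpers. `whisper.load_model` is the loader from the `openai-whisper` package, and `do_sample` and `temperature` are generation arguments accepted by Hugging Face Transformers.

```python
from functools import lru_cache

# Model sizes offered in the interface, as listed in this README.
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def check_size(size: str) -> str:
    """Validate a Whisper model size chosen in the interface."""
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size {size!r}; choose from {WHISPER_SIZES}")
    return size


@lru_cache(maxsize=1)
def load_whisper(size: str):
    """Load a Whisper model, keeping only the most recently used size cached."""
    import whisper  # imported lazily; requires the openai-whisper package

    return whisper.load_model(check_size(size))


def sampling_kwargs(temperature: float) -> dict:
    """Map the temperature slider to keyword arguments for model.generate().

    Assumption: a temperature of 0 (or below) means greedy decoding, since
    sampling needs a strictly positive temperature.
    """
    if temperature <= 0:
        return {"do_sample": False}
    return {"do_sample": True, "temperature": float(temperature)}
```

Caching a single model means switching sizes reloads the weights once, while repeated turns with the same size reuse the loaded model.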

---

## 📜 Credits
[Alpaca-LoRA](https://github.com/tloen/alpaca-lora) is used as the language model, loaded and run through the Hugging Face Transformers library.

[Whisper](https://github.com/openai/whisper) voice recognition technology from OpenAI and [Google Text-to-Speech](https://github.com/pndurette/gTTS) voice synthesis technology are also used.

The graphical interface is built using the [Gradio library](https://github.com/gradio-app/gradio).

---

## ☕ Support the project

If you like this project or it has helped you in some way, consider buying me a [coffee](https://www.buymeacoffee.com/pavloh) as a form of support. This lets me dedicate more time to open source projects like this one and improve them even further :)

---
## 📃 License
> You can view the full license [here](https://github.com/ImPavloh/WhiTTsper-The-Lora/blob/master/LICENSE)

This project is licensed under the terms of the **MIT** license.