# alts

( 🎙️ listens | 💭 thinks | 🔊 speaks )

## 💬 about
100% free, local, and offline assistant with speech recognition and talk-back functionality.

## 🤖 default usage

ALTS runs in the background and waits for you to press `cmd+esc` (or `win+esc`).
- 🎙️ While you hold the hotkey, your voice is recorded _(saved in the project root)_.
- 💭 On release, the recording stops and its transcript is sent to the LLM _(the recording is then deleted)_.
- 🔊 The LLM's response is synthesized and played back to you _(and also shown as a desktop notification)_.

You can modify the hotkey combination and other settings in your `config.yaml`.
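To illustrate, a hotkey override in `config.yaml` might look like the sketch below; the key name here is a hypothetical placeholder, so check `config-template.yaml` for the options that actually exist:

```yaml
# hypothetical sketch — the real option names live in config-template.yaml
hotkey: cmd+esc   # the combination held down while recording
```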

> ALL processing happens locally and __NONE__ of your recordings or queries leave your environment; recordings are deleted as soon as they are used. It's __ALL PRIVATE__ by _default_

## ⚙️ pre-requisites
- ### python
> (tested on) version \>=3.11 on macOS and version \>=3.8 on windows

- ### llm
By default, the project is configured to work with [Ollama](https://ollama.ai/), running the [`stablelm2` model](https://ollama.ai/library/stablelm2) (a very small and fast model). This setup makes the whole system completely free to run locally and a good fit for low-resource machines.

However, we use [LiteLLM](https://github.com/BerriAI/litellm) to stay provider-agnostic, so you have full freedom to pick and choose your own combination.
Take a look at the supported [Models/Providers](https://docs.litellm.ai/docs/providers) for more details on LLM configuration.
> See `.env.template` and `config-template.yaml` for customizing your setup
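As a rough sketch of what provider-agnostic means in practice (this is an illustration of LiteLLM's interface, not ALTS's actual code), swapping providers mostly comes down to changing the model string:

```python
# sketch: LiteLLM's unified completion call (illustration, not ALTS's code)
from litellm import completion

# the project default: a local Ollama model
response = completion(
    model="ollama/stablelm2",
    messages=[{"role": "user", "content": "What's the weather like?"}],
    api_base="http://localhost:11434",  # Ollama's default local endpoint
)
print(response.choices[0].message.content)

# a hosted provider would be the same call with a different model string
# (plus the matching API key in your .env)
```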

- ### stt
We use OpenAI's [`whisper`](https://github.com/openai/whisper) to transcribe your voice queries. It's a general-purpose speech recognition model.

You will need [`ffmpeg`](https://ffmpeg.org/) installed in your environment; you can [download](https://ffmpeg.org/download.html) it from the official site.

Make sure to check out their [setup](https://github.com/openai/whisper?tab=readme-ov-file#setup) docs for any other requirements.
> if you stumble into errors, one reason could be the model failing to download automatically. In that case, you can run a `whisper` example transcription in your terminal ([see examples](https://github.com/openai/whisper?tab=readme-ov-file#command-line-usage)), or manually download the model file and place it in the [correct folder](https://github.com/openai/whisper/discussions/63)
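For reference, a minimal transcription with the `whisper` library looks roughly like this (a standalone illustration of the library's documented API, not ALTS's internals; the model name and file path are placeholders):

```python
# sketch: transcribing audio with openai-whisper (illustration only)
import whisper

model = whisper.load_model("base")          # downloads the model on first use
result = model.transcribe("recording.wav")  # placeholder path to an audio file
print(result["text"])                       # the transcript text
```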

- ### tts
We use `coqui-TTS` for ALTS to talk back to you. It's a library for advanced Text-to-Speech generation.

You will need to install [`eSpeak-ng`](https://github.com/espeak-ng/espeak-ng) in your environment:
- macOS – `brew install espeak`
- linux – `sudo apt-get install espeak -y`
- windows – [download](https://github.com/espeak-ng/espeak-ng/releases) the executable from their repo
> on __windows__ you'll also need `Desktop development with C++` and `.NET desktop build tools`.
> Download the [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) and install these dependencies.

Make sure to check out their [setup](https://github.com/coqui-ai/TTS/tree/dev#installation) docs for any other requirements.
> if the configured model isn't already downloaded, it should download automatically during startup. However, if you encounter any problems, you can pre-download the default model by running the following:
> ```sh
> tts --text "this is a setup test" --out_path test_output.wav --model_name tts_models/en/vctk/vits --speaker_idx p364
> ```
> The default model has several "speakers" to choose from; running the following command will serve a demo site where you can test the different voices available:
> ```sh
> tts-server --model_name tts_models/en/vctk/vits
> ```
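For completeness, calling `coqui-TTS` from Python looks roughly like this (an illustration of the library's documented API, not ALTS's own code; the text and output path are placeholders):

```python
# sketch: speech synthesis with coqui-TTS (illustration only)
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/vctk/vits")  # the project's default model
tts.tts_to_file(
    text="This is a setup test.",   # placeholder text
    speaker="p364",                 # one of the VCTK model's speakers
    file_path="test_output.wav",    # placeholder output path
)
```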

## ✅ get it running
clone the repo
```sh
git clone https://github.com/alxpez/alts.git
```

go to the main folder
```sh
cd alts/
```

install the project dependencies
```sh
pip install -r requirements.txt
```
> see the [pre-requisites](#%EF%B8%8F-pre-requisites) section to make sure your machine is ready to run ALTS

duplicate and rename the needed config files
```sh
cp config-template.yaml config.yaml
cp .env.template .env
```
> modify the default configuration to your needs

start up ALTS
```sh
sudo python alts.py
```
> the `keyboard` package requires admin privileges on macOS and Linux; this is not the case on Windows