https://github.com/agentem-ai/izwi

On-device Voice AI engine for transcription, TTS, and voice workflows.
https://github.com/agentem-ai/izwi

asr audio-inference local-first openai-compatible-api self-hosted-ai speaker-diarization speech-to-text text-to-speech tts voice-cloning

Last synced: 4 months ago
JSON representation

On-device Voice AI engine for transcription, TTS, and voice workflows.

Host: GitHub
URL: https://github.com/agentem-ai/izwi
Owner: agentem-ai
Created: 2026-01-23T16:53:17.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-03-05T08:58:45.000Z (4 months ago)
Last Synced: 2026-03-05T13:09:16.051Z (4 months ago)
Topics: asr, audio-inference, local-first, openai-compatible-api, self-hosted-ai, speaker-diarization, speech-to-text, text-to-speech, tts, voice-cloning
Language: Rust
Homepage: https://www.izwiai.com
Size: 28.9 MB
Stars: 158
Watchers: 5
Forks: 14
Open Issues: 6
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md

Awesome Lists containing this project

README

          


  



Izwi


Local-first audio inference engine for TTS, ASR, and voice AI workflows.




  Website •

  Documentation •

  Releases •

  Getting Started





  



---

## Overview

Izwi is a privacy-focused audio AI platform that runs entirely on your machine. No cloud services, no API keys, no data leaving your device.

**Core capabilities:**

- **Voice Mode** — Real-time voice conversations with AI

- **Text-to-Speech** — Generate natural speech from text

- **Speech Recognition** — Convert audio to text with high accuracy

- **Speaker Diarization** — Identify and separate multiple speakers

- **Voice Cloning** — Clone any voice from a short audio sample

- **Voice Design** — Create custom voices from text descriptions

- **Forced Alignment** — Word-level audio-text alignment

- **Chat** — Text-based AI conversations

The server exposes OpenAI-compatible API routes under `/v1`.

---

## Quick Install

### macOS

Download the latest `.dmg` from [GitHub Releases](https://github.com/agentem-ai/izwi/releases):

1. Open the `.dmg` file

2. Drag **Izwi.app** to Applications

3. Launch Izwi

### Linux

```bash

wget https://github.com/agentem-ai/izwi/releases/latest/download/izwi_amd64.deb

sudo dpkg -i izwi_amd64.deb

```

### Windows

Download and run the installer from [GitHub Releases](https://github.com/agentem-ai/izwi/releases).

> **Full installation guides:** [macOS](https://izwiai.com/docs/installation/macos) • [Linux](https://izwiai.com/docs/installation/linux) • [Windows](https://izwiai.com/docs/installation/windows) • [From Source](https://izwiai.com/docs/installation/from-source)

---

## Quick Start

### 1. Start the server

```bash

izwi serve

```

Open `http://localhost:8080` in your browser.

### 2. Download a model

```bash

izwi pull Qwen3-TTS-12Hz-0.6B-Base

```

### 3. Generate speech

```bash

izwi tts "Hello from Izwi!" --output hello.wav

```

### 4. Transcribe audio

```bash

izwi pull Qwen3-ASR-0.6B

izwi transcribe audio.wav

```

Long-form ASR is handled automatically: Izwi now chunks long recordings,

stitches overlapping transcripts, and returns a full transcript instead of

only the first model window.

Optional tuning knobs:

```bash

IZWI_ASR_CHUNK_TARGET_SECS=24

IZWI_ASR_CHUNK_MAX_SECS=30

IZWI_ASR_CHUNK_OVERLAP_SECS=3

```

---

## Supported Models

| Category | Models |

|----------|--------|

| **TTS** | Qwen3-TTS (0.6B, 1.7B), LFM2-Audio |

| **ASR** | Qwen3-ASR (0.6B, 1.7B), Parakeet TDT |

| **Diarization** | Sortformer 4-speaker |

| **Chat** | Qwen3 (0.6B, 1.7B), Gemma 3 (1B, 4B) |

| **Alignment** | Qwen3-ForcedAligner |

Run `izwi list` to see all available models.

> **Full model documentation:** [Models Guide](https://izwiai.com/docs/models)

---

## Documentation

| Resource | Link |

|----------|------|

| **Getting Started** | [izwiai.com/docs/getting-started](https://izwiai.com/docs/getting-started) |

| **Installation** | [izwiai.com/docs/installation](https://izwiai.com/docs/installation) |

| **Features** | [izwiai.com/docs/features](https://izwiai.com/docs/features) |

| **CLI Reference** | [izwiai.com/docs/cli](https://izwiai.com/docs/cli) |

| **Models** | [izwiai.com/docs/models](https://izwiai.com/docs/models) |

| **Troubleshooting** | [izwiai.com/docs/troubleshooting](https://izwiai.com/docs/troubleshooting) |

---

## License

Apache 2.0

## Acknowledgments

- [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) by Alibaba

- [Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) by NVIDIA

- [Gemma](https://ai.google.dev/gemma) by Google

- [LFM2-Audio](https://www.liquid.ai/) by Liquid AI

- [HuggingFace Hub](https://huggingface.co/) for model hosting

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/agentem-ai/izwi

Awesome Lists containing this project

README

Izwi