https://github.com/krmanik/tiny-aligner

Accurate, fast, tiny. A CTC-based acoustic model (930K parameters) for Chinese + English audio forced alignment, running in browser.

Host: GitHub
URL: https://github.com/krmanik/tiny-aligner
Owner: krmanik
Created: 2026-05-29T12:43:57.000Z (29 days ago)
Default Branch: main
Last Pushed: 2026-05-29T12:58:31.000Z (29 days ago)
Last Synced: 2026-05-29T14:18:02.527Z (29 days ago)
Topics: forced-alignment, grapheme-to-phone, pronunciation-dictionary, python
Language: Roff
Homepage: https://krmanik.github.io/tiny-aligner/
Size: 9.3 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# TinyAligner: Bilingual Forced Alignment

Accurate, fast, tiny. A CTC-based acoustic model (930K parameters) for Chinese + English audio forced alignment, running in your browser.

**Key features:**

- 📱 Browser-deployable via ONNX (~3.7 MB float32)

- 🗣️ Bilingual: Chinese (AISHELL) + English (LibriSpeech)

---

## Quick Start (5 min)

### 1. Install

```bash
conda create -n tiny-aligner python=3.12
conda activate tiny-aligner
pip install -r requirements.txt
```

### 2. Download Data (optional, for training)

**AISHELL-1** (Chinese, ~15 GB):

```bash
# tar -C requires the target directory to exist
mkdir -p data/aishell
wget http://www.openslr.org/resources/33/data_aishell.tgz
tar -xzf data_aishell.tgz -C data/aishell/
```

**LibriSpeech** (English, ~6-7 GB per split):

```bash
wget https://www.openslr.org/resources/12/train-clean-100.tar.gz
tar -xzf train-clean-100.tar.gz -C data/
```

See [docs/quickstart.md](docs/quickstart.md) for detailed setup.

### 3. Train a Model

```bash
# Bilingual (Chinese + English)
python scripts/train.py --epochs 80

# Chinese-only
python scripts/train.py --epochs 50 --librispeech-root ""
```

Monitor training progress through the sample decodes printed each epoch. See [docs/training.md](docs/training.md) for tuning and parameters.
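To help read those sample decodes, here is a minimal sketch of greedy CTC decoding. It assumes the printed decodes are produced by taking the per-frame argmax, collapsing repeats, and dropping the blank label; the training script may do something different. The function names and the choice of `blank=0` are hypothetical and not taken from this repository.

```python
def argmax_per_frame(log_probs):
    """Pick the most likely label id for every frame (log_probs is T x V)."""
    return [max(range(len(row)), key=row.__getitem__) for row in log_probs]


def greedy_ctc_decode(frame_ids, blank=0):
    """Collapse consecutive repeats, then drop blanks (standard CTC rule).

    blank=0 is an assumption; use whatever index the model reserves.
    """
    decoded = []
    prev = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded
```

A blank between two identical labels keeps them separate, so the frame sequence `[0, 1, 1, 0, 1, 2, 2, 0]` decodes to `[1, 1, 2]`. Early in training the decode is typically empty or mostly blanks, and it should approach the reference phone sequence as training progresses.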

### 4. Export for Browser

```bash
python scripts/export_onnx.py
# → browser/public/model.onnx + lexicon.json
```

### 5. Run Browser App

```bash
cd browser
npm install
npm run dev
# → http://localhost:5173
```

Upload an audio file and its transcript to get word-, character-, and phoneme-level alignments as TextGrid or SRT.
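The browser app's exporter is written in TypeScript and is not reproduced here. As a rough illustration of what the SRT output looks like, the following Python sketch formats aligned segments as SRT cues. The function names and the `(start_sec, end_sec, text)` tuple layout are assumptions made for this example.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as numbered SRT cues."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 0.5, "你好"), (0.5, 1.25, "world")])` yields two cues, the first spanning `00:00:00,000 --> 00:00:00,500`.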

---

## How It Works

**Architecture:**

```
Audio (16kHz WAV) → Mel-filterbank (40 dims) → Conv3 + Bi-GRU → Softmax over ~260 phones
                                                  [930K params]
```

**Alignment:**

- Model outputs per-frame log-probabilities at 50 Hz (one frame every 20 ms)

- CTC forced alignment: Viterbi decoding with a fixed phone sequence

- Maps frames → phones → words → characters
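The alignment steps above can be sketched in plain Python. This is a textbook CTC Viterbi alignment written for illustration, not the code in `src/tiny_aligner/align.py` or the TypeScript implementation; the function names, the nested-list input format, and `blank=0` are all assumptions.

```python
import math


def ctc_forced_align(log_probs, phones, blank=0):
    """Viterbi-align a fixed phone sequence to per-frame CTC log-probabilities.

    log_probs: T x V nested list, log_probs[t][v] = log P(label v at frame t).
    phones:    non-empty list of target label ids (no blanks).
    Returns a list of length T holding the index into `phones` emitted at
    each frame, or None where the best path emits a blank.
    """
    # Interleave blanks: [blank, p1, blank, p2, ..., blank]
    ext = [blank]
    for p in phones:
        ext += [p, blank]
    T, S = len(log_probs), len(ext)
    neg_inf = -math.inf

    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first phone.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance by one, or skip the blank
            # between two different phones.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda k: score[t - 1][k])
            if score[t - 1][best] > neg_inf:
                score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                back[t][s] = best

    # A valid path ends on the last phone or on the trailing blank.
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == neg_inf:
        raise ValueError("too few frames for this phone sequence")

    states = [0] * T
    for t in range(T - 1, -1, -1):
        states[t] = s
        s = back[t][s]
    # Odd extended states are phones; state k maps to phones[k // 2].
    return [None if ext[k] == blank else k // 2 for k in states]


def phone_segments(frame_positions, phones, frame_ms=20):
    """Merge consecutive frames of one phone into (phone, start_ms, end_ms)."""
    segments = []
    for t, pos in enumerate(frame_positions):
        if pos is None:
            continue
        if segments and segments[-1][0] == pos:
            segments[-1][2] = (t + 1) * frame_ms
        else:
            segments.append([pos, t * frame_ms, (t + 1) * frame_ms])
    return [(phones[pos], start, end) for pos, start, end in segments]
```

Because each segment keeps its position in the phone sequence, word and character boundaries follow by grouping the segments according to how many phones the lexicon assigns to each word. The `frame_ms=20` default matches the 50 Hz frame rate stated above.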

**Languages:**

- **Chinese**: 220 phonemes (pinyin + tones from AISHELL lexicon)

- **English**: 39 phonemes (ARPAbet, stress-normalized; CMUdict)

- Bilingual training (balanced 50/50 sampling) since May 2026
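As an illustration of what "stress-normalized" means for the English phone set, the sketch below strips the ARPAbet stress digits from a CMUdict-style entry, which collapses the stress-marked vowels into CMUdict's 39 base phonemes. The function names are hypothetical and the repository's `lexicon.py` may handle this differently; alternate-pronunciation markers such as `WORD(1)` are not handled here.

```python
import re


def strip_stress(phone):
    """Drop the trailing stress digit: 'AH0' -> 'AH', 'T' -> 'T'."""
    return re.sub(r"\d+$", "", phone)


def normalize_pronunciation(entry):
    """Parse a CMUdict-style line 'WORD  PH1 PH2 ...' into (word, phones)."""
    word, *phones = entry.split()
    return word.lower(), [strip_stress(p) for p in phones]
```

For example, `normalize_pronunciation("HELLO  HH AH0 L OW1")` gives `("hello", ["HH", "AH", "L", "OW"])`. If the output layer also reserves one CTC blank, then 220 + 39 + 1 = 260 labels, which is consistent with the "~260 phones" in the architecture diagram.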

---

## Project Structure

```
forced-alignment/
├── README.md                      # This file
├── requirements.txt
│
├── src/tiny_aligner/              # Python library
│   ├── model.py                   # CTC acoustic model
│   ├── dataset.py                 # AISHELL + LibriSpeech loaders
│   ├── lexicon.py                 # Phoneme utilities
│   └── align.py                   # TextGrid generation
│
├── scripts/
│   ├── train.py                   # Training script
│   └── export_onnx.py             # ONNX export
│
├── browser/                       # SvelteKit web app
│   ├── src/lib/alignment/         # ONNX + CTC alignment (TypeScript)
│   └── public/                    # model.onnx + lexicon.json
│
├── data/
│   ├── aishell/                   # AISHELL-1 (Chinese)
│   ├── LibriSpeech/               # LibriSpeech (English)
│   └── lexicons/cmudict-0.7b      # CMU Pronouncing Dictionary
│
├── checkpoints/                   # Saved models
├── tests/                         # Unit tests
└── docs/                          # Documentation
    ├── quickstart.md              # Install & data setup
    ├── training.md                # Training guide
    ├── api.md                     # Python API
    ├── browser.md                 # Browser deployment
    └── architecture.md            # Model design details
```

---

## Documentation

| Document | Purpose |
|----------|---------|
| [docs/quickstart.md](docs/quickstart.md) | Installation, data download, first run |
| [docs/training.md](docs/training.md) | Training modes, parameters, tuning |
| [docs/api.md](docs/api.md) | Python API reference |
| [docs/browser.md](docs/browser.md) | Browser deployment (Vercel, Netlify, static) |
| [docs/architecture.md](docs/architecture.md) | Model design, training recipes, troubleshooting |

---

## Performance

| Metric | Value | Notes |
|--------|-------|-------|
| Model size | 3.7 MB | float32 ONNX |
| Parameters | 930K | ~900 KB when INT8-quantized |
| Inference speed | RTF ~0.02 | CPU; about 20 ms of compute per second of audio |
| Frame rate | 50 Hz | 20 ms resolution |
| Languages | 2 | Chinese (220 phones) + English (39 phones) |
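The size and speed figures in the table are related by simple arithmetic, sketched below as a sanity check. It assumes 4 bytes per float32 weight and 1 byte per INT8 weight, and ignores any ONNX file overhead; the helper names are made up for this example.

```python
def model_size_mb(num_params, bytes_per_param):
    """Approximate weight storage in MB (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


def compute_ms_per_audio_second(rtf):
    """Real-time factor = processing time / audio duration."""
    return rtf * 1000.0


float32_mb = model_size_mb(930_000, 4)   # about 3.72 MB
int8_mb = model_size_mb(930_000, 1)      # about 0.93 MB
ms_per_sec = compute_ms_per_audio_second(0.02)  # about 20 ms
```

By the same definition, a 60-second clip at RTF 0.02 would take roughly 1.2 seconds of CPU time to run through the model.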

---

## Citation

If you use TinyAligner in research, please cite:

```bibtex
@software{tinyaligner2026,
  title = {TinyAligner: Bilingual Forced Alignment for Browser},
  author = {krmanik},
  year = {2026},
  url = {https://github.com/krmanik/tiny-aligner}
}
```

---

## License

**Code:** MIT License

**Data:**

- **AISHELL-1**: Apache License 2.0

- **LibriSpeech**: CC BY 4.0

- **CMUdict**: BSD-style license (2-clause; see the header of the dictionary file)

---

## Contributing

Contributions welcome! Areas:

- Support for additional languages and scripts (Korean, Japanese, etc.)

- Streaming inference (non-offline)

- More training recipes & pretrained models
