{"id":51071984,"url":"https://github.com/krmanik/tiny-aligner","last_synced_at":"2026-06-23T11:30:44.124Z","repository":{"id":361178577,"uuid":"1253415300","full_name":"krmanik/tiny-aligner","owner":"krmanik","description":"Accurate, fast, tiny. A CTC-based acoustic model (930K parameters) for Chinese + English audio forced alignment, running in browser.","archived":false,"fork":false,"pushed_at":"2026-05-29T12:58:31.000Z","size":9755,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-29T14:18:02.527Z","etag":null,"topics":["forced-alignment","grapheme-to-phone","pronunciation-dictionary","python"],"latest_commit_sha":null,"homepage":"https://krmanik.github.io/tiny-aligner/","language":"Roff","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/krmanik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-29T12:43:57.000Z","updated_at":"2026-05-29T13:50:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/krmanik/tiny-aligner","commit_stats":null,"previous_names":["krmanik/tiny-aligner"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/krmanik/tiny-aligner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krmanik%2Ftiny-aligner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krmanik%2Ftiny-aligner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krmanik%2Ftiny-aligner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krmanik%2Ftiny-aligner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/krmanik","download_url":"https://codeload.github.com/krmanik/tiny-aligner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krmanik%2Ftiny-aligner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34686727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["forced-alignment","grapheme-to-phone","pronunciation-dictionary","python"],"created_at":"2026-06-23T11:30:42.363Z","updated_at":"2026-06-23T11:30:44.116Z","avatar_url":"https://github.com/krmanik.png","language":"Roff","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TinyAligner — Bilingual Forced Alignment\n\nAccurate, fast, tiny. A CTC-based acoustic model (930K parameters) for Chinese + English audio forced alignment, running in your browser.\n\n**Key features:**\n- 📱 Browser-deployable via ONNX (~3.7 MB float32)\n- 🗣️ Bilingual: Chinese (AISHELL) + English (LibriSpeech)\n\n---\n\n## Quick Start (5 min)\n\n### 1. Install\n\n```bash\nconda create -n tiny-aligner python=3.12\nconda activate tiny-aligner\npip install -r requirements.txt\n```\n\n### 2. Download Data (optional, for training)\n\n**AISHELL-1** (Chinese, ~15 GB):\n```bash\nwget http://www.openslr.org/resources/33/data_aishell.tgz\ntar -xzf data_aishell.tgz -C data/aishell/\n```\n\n**LibriSpeech** (English, ~6-7 GB per split):\n```bash\nwget https://www.openslr.org/resources/12/train-clean-100.tar.gz\ntar -xzf train-clean-100.tar.gz -C data/\n```\n\nSee [docs/quickstart.md](docs/quickstart.md) for detailed setup.\n\n### 3. Train a Model\n\n```bash\n# Bilingual (Chinese + English)\npython scripts/train.py --epochs 80\n\n# Chinese-only\npython scripts/train.py --epochs 50 --librispeech-root \"\"\n```\n\nMonitor via sample decodes printed each epoch. See [docs/training.md](docs/training.md) for tuning \u0026 parameters.\n\n### 4. Export for Browser\n\n```bash\npython scripts/export_onnx.py\n# → browser/public/model.onnx + lexicon.json\n```\n\n### 5. Run Browser App\n\n```bash\ncd browser\nnpm install\nnpm run dev\n# → http://localhost:5173\n```\n\nUpload audio + text, get word/character/phoneme alignments as TextGrid or SRT.\n\n---\n\n## How It Works\n\n**Architecture:**\n```\nAudio (16kHz WAV) → Mel-filterbank (40 dims) → Conv3 + Bi-GRU → Softmax over ~260 phones\n                                                  [930K params]\n```\n\n**Alignment:**\n- Model outputs per-frame log-probabilities (50 Hz, 20ms)\n- CTC forced alignment: Viterbi decoding with a fixed phone sequence\n- Maps frames → phones → words → characters\n\n**Languages:**\n- **Chinese**: 220 phonemes (pinyin + tones from AISHELL lexicon)\n- **English**: 39 phonemes (ARPAbet, stress-normalized; CMUdict)\n- Bilingual training (balanced 50/50 sampling) since May 2026\n\n---\n\n## Project Structure\n\n```\nforced-alignment/\n├── README.md                      # This file\n├── requirements.txt\n│\n├── src/tiny_aligner/              # Python library\n│   ├── model.py                   # CTC acoustic model\n│   ├── dataset.py                 # AISHELL + LibriSpeech loaders\n│   ├── lexicon.py                 # Phoneme utilities\n│   └── align.py                   # TextGrid generation\n│\n├── scripts/\n│   ├── train.py                   # Training script\n│   └── export_onnx.py             # ONNX export\n│\n├── browser/                       # SvelteKit web app\n│   ├── src/lib/alignment/         # ONNX + CTC alignment (TypeScript)\n│   └── public/                    # model.onnx + lexicon.json\n│\n├── data/\n│   ├── aishell/                   # AISHELL-1 (Chinese)\n│   ├── LibriSpeech/               # LibriSpeech (English)\n│   └── lexicons/cmudict-0.7b      # CMU Pronouncing Dictionary\n│\n├── checkpoints/                   # Saved models\n├── tests/                         # Unit tests\n└── docs/                          # Documentation\n    ├── quickstart.md              # Install \u0026 data setup\n    ├── training.md                # Training guide\n    ├── api.md                     # Python API\n    ├── browser.md                 # Browser deployment\n    └── architecture.md            # Model design details\n```\n\n---\n\n## Documentation\n\n| Document | Purpose |\n|----------|---------|\n| [docs/quickstart.md](docs/quickstart.md) | Installation, data download, first run |\n| [docs/training.md](docs/training.md) | Training modes, parameters, tuning |\n| [docs/api.md](docs/api.md) | Python API reference |\n| [docs/browser.md](docs/browser.md) | Browser deployment (Vercel, Netlify, static) |\n| [docs/architecture.md](docs/architecture.md) | Model design, training recipes, troubleshooting |\n\n---\n\n## Performance\n\n| Metric | Value | Notes |\n|--------|-------|-------|\n| Model size | 3.7 MB | float32 ONNX |\n| Parameters | 930K | ~900K INT8 quantized |\n| Inference speed | RTF ~0.02 | CPU; \u003c10ms per sec of audio |\n| Frame rate | 50 Hz | 20ms resolution |\n| Languages | 2 | Chinese (220 phones) + English (39 phones) |\n\n\n---\n\n## Citation\n\nIf you use TinyAligner in research, please cite:\n\n```bibtex\n@software{tinyaligner2026,\n  title = {TinyAligner: Bilingual Forced Alignment for Browser},\n  author = {krmanik},\n  year = {2026},\n  url = {https://github.com/krmanik/tiny-aligner}\n}\n```\n\n---\n\n## License\n\n**Code:** MIT License\n\n**Data:**\n- **AISHELL-1**: Apache License 2.0\n- **LibriSpeech**: CC BY 4.0\n- **CMUdict**: Public Domain\n\n---\n\n\n## Contributing\n\nContributions welcome! Areas:\n- Non-Latin script support (Korean, Japanese, etc.)\n- Streaming inference (non-offline)\n- More training recipes \u0026 pretrained models\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrmanik%2Ftiny-aligner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrmanik%2Ftiny-aligner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrmanik%2Ftiny-aligner/lists"}