Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/egorsmkv/xeus-finetune
XEUS training code
- Host: GitHub
- URL: https://github.com/egorsmkv/xeus-finetune
- Owner: egorsmkv
- Created: 2024-07-18T22:43:09.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2024-07-19T20:39:13.000Z (4 months ago)
- Last Synced: 2024-07-20T07:14:35.648Z (4 months ago)
- Topics: asr, speech-recognition, ssl, xeus
- Language: Python
- Homepage: https://wanchichen.github.io/pdf/xeus.pdf
- Size: 41 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# `xeus-finetune`
> [!WARNING]
> Currently, this work is in progress.

This repository contains training code for [the XEUS model][1] for Automatic Speech Recognition (ASR).

This is a fork of https://github.com/pashanitw/xeus-finetune

## Required software
- python3.11, python3.11-dev
- build-essential, cmake
- [uv][2]
- git-lfs

> [!NOTE]
> Python 3.12 cannot be used because one of the dependencies in ESPnet relies on an old package.

## Install
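
Because only Python 3.11 is listed as supported, it can help to confirm the interpreter version before installing anything. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of this repository.

```python
import sys

# The required-software list names python3.11; Python 3.12 is stated not to work.
REQUIRED = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version equals the required 3.11."""
    return tuple(version_info[:2]) == REQUIRED
```

Calling `is_supported_python()` with no arguments checks the running interpreter; passing a tuple such as `(3, 12, 0)` checks an arbitrary version.
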
```shell
uv venv --python 3.11

source .venv/bin/activate

# install espnet
git clone --branch ssl --depth 1 https://github.com/wanchichen/espnet espnet-code
cd espnet-code
git fetch --unshallow
uv pip install -e .

# download XEUS checkpoint
git clone https://huggingface.co/espnet/XEUS

# install required packages
uv pip install -r requirements.txt

# in development mode install additional packages
uv pip install -r requirements-dev.txt
```

## Fine-tuning
1. Authenticate with HF

```shell
huggingface-cli login
```

2. Copy a config file, change dataset sections and hparams

```shell
cp configs/hi_hf.yaml configs/uk_hf.yaml
```

3. Start fine-tuning

```shell
accelerate launch finetune.py --config configs/uk_hf.yaml

# if you want to use only one GPU
accelerate launch --num_processes 1 finetune.py --config configs/uk_hf.yaml
```

## Inference
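
The example command in this section points `--ckpt_path` at a directory named `step_2000`. Assuming checkpoints are saved as sibling directories following a `step_<N>` pattern (an assumption inferred only from that example, not confirmed by the training code), a hypothetical helper for picking the most recent one could look like this:

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are directories named "step_<N>", as in "./step_2000".
STEP_DIR = re.compile(r"^step_(\d+)$")


def latest_checkpoint(root: str) -> Optional[Path]:
    """Return the step_<N> directory under root with the largest N, or None."""
    best_step, best_path = -1, None
    for child in Path(root).iterdir():
        match = STEP_DIR.match(child.name)
        # Skip plain files and anything that does not match the naming pattern.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

The returned path could then be passed as the `--ckpt_path` value.
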
```shell
python inference.py --ckpt_path <ckpt_path> --audio audio.wav

# example
python inference.py --ckpt_path ./step_2000 --audio audio.wav
```

## Evaluation
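
Word Error Rate (WER) is the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below illustrates that textbook definition in plain Python. It is not necessarily how `eval.py` computes the metric (the script may, for example, normalize text first), and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.
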
Run the following command to calculate Word Error Rate:
```shell
python eval.py --ckpt_path <ckpt_path> --dataset <dataset> --name <name> --split <split>

# example
python eval.py --ckpt_path ./step_2000 --dataset mozilla-foundation/common_voice_17_0 --name uk --split test
```

## Development
Check and format the code:
```shell
ruff check
ruff format
```

## TODO
- [ ] Enable Flash-Attention for training
- [ ] Set a `cache_dir` for `load_dataset`

[1]: https://www.wavlab.org/activities/2024/xeus/
[2]: https://github.com/astral-sh/uv