# Hebrew Speech Recognition with Wav2Vec2

## Usage

### Without package installation (using `transformers` library)

```python
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    AutoFeatureExtractor,
    AutoTokenizer,
    Wav2Vec2ForCTC,
)

pretrained_model_name_or_path = "imvladikon/wav2vec2-xls-r-300m-hebrew"
asr = AutomaticSpeechRecognitionPipeline(
    feature_extractor=AutoFeatureExtractor.from_pretrained(
        pretrained_model_name_or_path
    ),
    model=Wav2Vec2ForCTC.from_pretrained(pretrained_model_name_or_path),
    tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path),
)
filename = "audio.wav"
print(asr(filename))
```

Note: splitting a long audio file into smaller chunks is not implemented yet.
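
Until chunking is supported, long recordings can be split by hand before transcription. The following is a minimal sketch, not part of this repository: `split_into_chunks` and `transcribe_in_chunks` are hypothetical helper names, and `transcribe` stands for any callable that maps a chunk of samples to text. It cuts at fixed offsets, so a word that straddles a chunk boundary may be transcribed poorly.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds=20.0):
    """Split a 1-D sequence of audio samples into consecutive fixed-length chunks.

    Hypothetical helper: works on any sliceable sequence (list, NumPy array, ...).
    The last chunk may be shorter than the others.
    """
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("sample_rate * chunk_seconds must cover at least one sample")
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def transcribe_in_chunks(transcribe, samples, sample_rate, chunk_seconds=20.0):
    """Apply `transcribe` (a callable returning text) to each chunk and join the results."""
    chunks = split_into_chunks(samples, sample_rate, chunk_seconds)
    return " ".join(transcribe(chunk) for chunk in chunks).strip()
```

With the pipeline above, `transcribe` could be something like `lambda chunk: asr(chunk)["text"]`, assuming each chunk is a mono float array sampled at 16 kHz and that the pipeline returns a dict with a `"text"` key.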

### With package installation

```bash
pip install git+https://github.com/imvladikon/wav2vec2-hebrew
```

#### Speech recognition

```python
from wav2vec2_hebrew import HebrewSpeechRecognitionPipeline

asr = HebrewSpeechRecognitionPipeline()
filename = "./samples/bereshit011.wav"
output = asr(filename)
print(output)
# [{'text': 'בראשית ברא אלוהים את השמייים ואת הארץ'}]
```

#### Alignment
```python
import torch
import torchaudio
from wav2vec2_hebrew import HebrewWav2Vec2Aligner

filename = "./samples/bereshit011.wav"
text = "בראשית ברא אלוהים את השמיים ואת הארץ"
# use the GPU only when one is available
aligner = HebrewWav2Vec2Aligner(
    input_sample_rate=16000,
    use_cuda=torch.cuda.is_available(),
)
# align the audio to the text; the result holds one entry per sentence
first_sentence = aligner.align_data(filename, text)[0]
# {'sentence': 'בראשית ברא אלוהים את השמיים ואת הארץ',
# 'segments': [Segment(label='בראשית', start=6750.516853932584, end=18644.284644194755, score=0.16025335497152965)...]}

# play back the aligned segments in an IPython notebook (uses IPython.display.Audio)
waveform, sample_rate = torchaudio.load(filename)
aligner.show_segments(waveform, first_sentence)
```
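
The `start` and `end` values in the example output look like sample offsets rather than seconds. That is an assumption based on their magnitude at a 16 kHz sample rate, not something the package documents. Under that assumption, a small sketch for converting segments to timestamps could look like this; the `Segment` dataclass here is a stand-in that only mirrors the fields shown above, and `segments_to_seconds` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in mirroring the fields printed in the alignment output above."""
    label: str
    start: float  # assumed to be a sample offset
    end: float    # assumed to be a sample offset
    score: float


def segments_to_seconds(segments, sample_rate=16000, ndigits=3):
    """Return (label, start_seconds, end_seconds) for each segment.

    Assumes `start` and `end` are expressed in samples at `sample_rate`.
    """
    return [
        (seg.label, round(seg.start / sample_rate, ndigits), round(seg.end / sample_rate, ndigits))
        for seg in segments
    ]
```

If the assumption holds, `segments_to_seconds(first_sentence["segments"])` would give word-level timestamps that can be written out as subtitles or used to cut the audio.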

## Training process

Training logs and details are available in the [train](train) folder.

### Datasets

* [imvladikon/hebrew_speech_kan](https://huggingface.co/datasets/imvladikon/hebrew_speech_kan)
* [imvladikon/hebrew_speech_coursera](https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera)

### Weights

* [imvladikon/wav2vec2-xls-r-300m-hebrew](https://huggingface.co/imvladikon/wav2vec2-xls-r-300m-hebrew)
* [imvladikon/wav2vec2-xls-r-300m-lm-hebrew](https://huggingface.co/imvladikon/wav2vec2-xls-r-300m-lm-hebrew)