https://github.com/imvladikon/wav2vec2-hebrew
Speech Recognition for Hebrew (using wav2vec2 models)
- Host: GitHub
- URL: https://github.com/imvladikon/wav2vec2-hebrew
- Owner: imvladikon
- Created: 2022-01-25T14:29:21.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-08T07:12:02.000Z (over 2 years ago)
- Last Synced: 2025-09-07T12:40:02.000Z (about 1 month ago)
- Topics: hebrew, speech-recognition
- Language: Python
- Size: 211 KB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Hebrew Speech Recognition with Wav2Vec2
## Usage
### Without package installation (using `transformers` library)
```python
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    AutoFeatureExtractor,
    AutoTokenizer,
    Wav2Vec2ForCTC,
)

# Hugging Face Hub checkpoint fine-tuned for Hebrew
pretrained_model_name_or_path = "imvladikon/wav2vec2-xls-r-300m-hebrew"

# Assemble the pipeline from its three components
asr = AutomaticSpeechRecognitionPipeline(
    feature_extractor=AutoFeatureExtractor.from_pretrained(
        pretrained_model_name_or_path
    ),
    model=Wav2Vec2ForCTC.from_pretrained(pretrained_model_name_or_path),
    tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path),
)

filename = "audio.wav"
print(asr(filename))
```
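The call above hands the whole file to the pipeline at once. For long recordings, one possible workaround is to compute overlapping windows yourself and transcribe each window separately. The sketch below only computes the window boundaries in samples; `chunk_bounds` and its parameters are hypothetical names for illustration and are not part of this repository or of `transformers`.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_s=10.0, overlap_s=1.0):
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    size = int(chunk_s * sample_rate)            # window length in samples
    step = size - int(overlap_s * sample_rate)   # hop between window starts
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + size, num_samples)
        bounds.append((start, end))
        if end == num_samples:                   # last window reached the end
            break
        start += step
    return bounds


# 25 s of 16 kHz audio, 10 s windows overlapping by 1 s
print(chunk_bounds(25 * 16000))
# [(0, 160000), (144000, 304000), (288000, 400000)]
```

Each `(start, end)` pair could be used to slice a loaded waveform before passing the slice to `asr`; merging the overlapping transcripts is left open here. Recent `transformers` releases also accept a `chunk_length_s` argument on the ASR pipeline, but whether it works well with this checkpoint is not covered by this README.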
Splitting a long audio file into smaller chunks is not implemented yet.

### With package installation
```bash
pip install git+https://github.com/imvladikon/wav2vec2-hebrew
```

#### Speech recognition
```python
from wav2vec2_hebrew import HebrewSpeechRecognitionPipeline

asr = HebrewSpeechRecognitionPipeline()
filename = "./samples/bereshit011.wav"
output = asr(filename)
print(output)
# [{'text': 'בראשית ברא אלוהים את השמייים ואת הארץ'}]
```

#### Alignment
```python
import torchaudio
from wav2vec2_hebrew import HebrewWav2Vec2Aligner

filename = "./samples/bereshit011.wav"
text = "בראשית ברא אלוהים את השמיים ואת הארץ"
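# use_cuda=True expects a CUDA-capable GPU; set use_cuda=False when none is available
aligner = HebrewWav2Vec2Aligner(input_sample_rate=16000, use_cuda=True)
# align the audio to the text, one result per sentence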
first_sentence = aligner.align_data(filename, text)[0]
# {'sentence': 'בראשית ברא אלוהים את השמיים ואת הארץ',
# 'segments': [Segment(label='בראשית', start=6750.516853932584, end=18644.284644194755, score=0.16025335497152965)...]}

# showing in IPython (notebook)
waveform, sample_rate = torchaudio.load(filename)
aligner.show_segments(waveform, first_sentence)
# showing segments using IPython.display.Audio
```

## Training process
Training logs and details are available in the [train](train) folder.
### Datasets
* https://huggingface.co/datasets/imvladikon/hebrew_speech_kan
* https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera

### Weights
* [imvladikon/wav2vec2-xls-r-300m-hebrew](https://huggingface.co/imvladikon/wav2vec2-xls-r-300m-hebrew)
* [imvladikon/wav2vec2-xls-r-300m-lm-hebrew](https://huggingface.co/imvladikon/wav2vec2-xls-r-300m-lm-hebrew)
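
Either checkpoint can in principle be loaded through the generic `transformers` `pipeline` factory. The following is a minimal sketch, not the author's documented method: the short keys `"base"` and `"lm"` and the helper `build_asr` are hypothetical names, and it is an assumption that the `-lm` checkpoint bundles an n-gram language model (in which case `transformers` would typically also need `pyctcdecode` and `kenlm` installed to use it).

```python
# Checkpoint names come from the list above; the short keys are hypothetical.
CHECKPOINTS = {
    "base": "imvladikon/wav2vec2-xls-r-300m-hebrew",
    "lm": "imvladikon/wav2vec2-xls-r-300m-lm-hebrew",
}


def build_asr(variant="base"):
    """Build an ASR pipeline for one of the published checkpoints."""
    if variant not in CHECKPOINTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}")
    # Imported lazily so this module loads even when transformers is not installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=CHECKPOINTS[variant])
```

Calling `build_asr("lm")("audio.wav")` downloads the weights on first use and returns the transcription in the same dictionary format as the pipeline in the Usage section.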