https://github.com/oliverguhr/wav2vec2-live

Live speech recognition using Facebook's wav2vec 2.0 model.

asr pyaudio speech speech-recognition speech-to-text wav2vec wav2vec2

Host: GitHub
URL: https://github.com/oliverguhr/wav2vec2-live
Owner: oliverguhr
License: mit
Created: 2021-04-15T09:34:05.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2024-02-04T15:32:23.000Z (over 2 years ago)
Last Synced: 2025-04-05T09:34:08.810Z (about 1 year ago)
Topics: asr, pyaudio, speech, speech-recognition, speech-to-text, wav2vec, wav2vec2
Language: Python
Homepage:
Size: 2.84 MB
Stars: 348
Watchers: 6
Forks: 56
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

# Automatic speech recognition with wav2vec2

Use any wav2vec model with a microphone.

![demo gif](./docs/wav2veclive.gif)

## Setup

I recommend installing this project in a virtual environment.

```
python3 -m venv ./venv
source ./venv/bin/activate
pip install -r requirements.txt
```

Depending on your Linux distribution, you might encounter an **error that portaudio was not found** when installing pyaudio. On Ubuntu, you can solve this by installing the `portaudio19-dev` package.

```
sudo apt install portaudio19-dev
```
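After installing the system package and re-running `pip install -r requirements.txt`, a quick check can confirm that pyaudio is now present in the environment. This is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical and not part of this project. It only checks that the module can be found, not that PortAudio loads correctly at runtime.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# PyAudio is imported under the module name "pyaudio".
if not module_available("pyaudio"):
    print("pyaudio is missing; install portaudio19-dev, then re-run pip install")
```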

Finally, you can test the speech recognition:

```
python live_asr.py
```

### Possible Issues:

* The code uses the system's default audio device. Please make sure that your system's default audio device is set correctly.

* "*attempt to connect to server failed*": you can safely ignore this message from pyaudio. It just means that pyaudio can't connect to the JACK audio server.
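If it is unclear which input devices the system exposes, they can be listed with PyAudio's `get_device_count()` and `get_device_info_by_index()`. The sketch below is not part of this project; the function names `list_input_devices` and `print_input_devices` are hypothetical, and it assumes a device counts as an input when its `maxInputChannels` value is greater than zero.

```python
def list_input_devices(pa):
    """Return (index, name) pairs for devices that can record audio.

    `pa` is expected to behave like a pyaudio.PyAudio instance.
    """
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        # Assumption: a device with at least one input channel can record.
        if info.get("maxInputChannels", 0) > 0:
            devices.append((i, info["name"]))
    return devices


def print_input_devices():
    """Open PyAudio, print every input device, then release PortAudio."""
    import pyaudio  # imported lazily so list_input_devices works without it

    pa = pyaudio.PyAudio()
    try:
        for index, name in list_input_devices(pa):
            print(f"{index}\t{name}")
    finally:
        pa.terminate()
```

Calling `print_input_devices()` inside the virtual environment prints one line per input device. Whether these names can be passed on to this project as-is is not confirmed here, so treat them as a starting point for checking the system configuration.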

## Usage

You can use any **wav2vec2** model from the [Hugging Face model hub](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&search=wav2vec2). Just set the model name; all files are downloaded on first execution.

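```python
from live_asr import LiveWav2Vec2

# Any wav2vec2 model name from the Hugging Face model hub can be used here.
english_model = "facebook/wav2vec2-large-960h-lv60-self"
german_model = "maxidl/wav2vec2-large-xlsr-german"

asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        # Each result holds the recognized text, the audio sample length,
        # and the inference time.
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s"
              + f"\t{inference_time:.3f}s"
              + f"\t{text}")

except KeyboardInterrupt:
    # Stop the recognizer on Ctrl+C.
    asr.stop()
```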
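The two timing values printed above can be combined into a real-time factor to judge whether a model keeps up with live speech. This is a minimal sketch, not part of the project: the helper `real_time_factor` is hypothetical, and it assumes both `sample_length` and `inference_time` are in seconds, as the `s` suffix in the print statement suggests.

```python
def real_time_factor(sample_length: float, inference_time: float) -> float:
    """Ratio of processing time to audio duration.

    Values below 1.0 mean the model transcribes faster than real time.
    """
    if sample_length <= 0:
        raise ValueError("sample_length must be positive")
    return inference_time / sample_length


# Example with made-up numbers: 2.0 s of audio transcribed in 0.5 s.
print(f"RTF: {real_time_factor(2.0, 0.5):.2f}")  # prints "RTF: 0.25"
```

Inside the loop, the same call could be made with the values returned by `asr.get_last_text()` to compare models on the same hardware.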
