https://github.com/oliverguhr/wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
- Host: GitHub
- URL: https://github.com/oliverguhr/wav2vec2-live
- Owner: oliverguhr
- License: mit
- Created: 2021-04-15T09:34:05.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-02-04T15:32:23.000Z (over 1 year ago)
- Last Synced: 2024-11-05T13:42:44.169Z (11 months ago)
- Topics: asr, pyaudio, speech, speech-recognition, speech-to-text, wav2vec, wav2vec2
- Language: Python
- Homepage:
- Size: 2.84 MB
- Stars: 326
- Watchers: 7
- Forks: 57
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Automatic speech recognition with wav2vec2
Use any wav2vec model with a microphone.

## Setup
I recommend installing this project in a virtual environment.
```
python3 -m venv ./venv
source ./venv/bin/activate
pip install -r requirements.txt
```
Depending on your Linux distribution, you might encounter an **error that portaudio was not found** when installing pyaudio. On Ubuntu, you can solve this by installing the "portaudio19-dev" package.
```
sudo apt install portaudio19-dev
```
Finally, you can test the speech recognition:
```
python live_asr.py
```

### Possible Issues:
* The code uses the system's default audio device. Please make sure that your system's default audio device is set correctly.
* "*attempt to connect to server failed*": you can safely ignore this message from pyaudio. It just means that pyaudio can't connect to the JACK audio server.

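
To check which input devices pyaudio can see, a short sketch like the following may help. It assumes pyaudio is installed. The helper names `input_devices` and `list_input_devices` are made up for this illustration and are not part of this project.

```python
def input_devices(device_infos):
    """Return (index, name) pairs for devices that can record audio."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Query pyaudio for all devices and keep only the input-capable ones."""
    import pyaudio  # imported lazily so input_devices() works without pyaudio

    audio = pyaudio.PyAudio()
    try:
        infos = [
            audio.get_device_info_by_index(i)
            for i in range(audio.get_device_count())
        ]
    finally:
        # Always release the PortAudio resources again.
        audio.terminate()
    return input_devices(infos)
```

Printing the result of `list_input_devices()` shows the names pyaudio reports, which can help confirm that the expected microphone is visible at all.
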
## Usage
You can use any **wav2vec2** model from the [Hugging Face model hub](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&search=wav2vec2). Just set the model name; all files will be downloaded on first execution.
```python
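from live_asr import LiveWav2Vec2

# Pick a wav2vec2 model name from the Hugging Face model hub.
english_model = "facebook/wav2vec2-large-960h-lv60-self"
german_model = "maxidl/wav2vec2-large-xlsr-german"

asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        # Unpack the transcription, the audio sample length and the inference time.
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s"
              + f"\t{inference_time:.3f}s"
              + f"\t{text}")
except KeyboardInterrupt:
    # Ctrl+C ends the loop and stops the recognizer.
    asr.stop()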
```
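
The two timings returned by `get_last_text()` can be combined into a real-time factor (inference time divided by audio length). Values below 1 suggest the model is keeping up with the microphone. This is only a minimal sketch: `real_time_factor` is a hypothetical helper, not part of `live_asr`, and it assumes both values are in seconds, as the format string in the example above suggests.

```python
def real_time_factor(sample_length, inference_time):
    """Ratio of inference time to audio duration (both assumed to be seconds)."""
    if sample_length <= 0:
        raise ValueError("sample_length must be positive")
    return inference_time / sample_length


# Example with made-up numbers: 2.0 s of audio transcribed in 0.5 s.
rtf = real_time_factor(2.0, 0.5)
print(f"real-time factor: {rtf:.2f}")
```
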