https://github.com/malolm/whisper-3-speech-to-text
A simple Python program for transcribing audio files using the Whisper model.
- Host: GitHub
- URL: https://github.com/malolm/whisper-3-speech-to-text
- Owner: MaloLM
- License: apache-2.0
- Created: 2024-11-18T15:13:51.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-29T18:55:46.000Z (about 2 months ago)
- Last Synced: 2024-12-12T08:43:20.308Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 398 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# whisper-v3-speech-to-text
[![release-version](https://img.shields.io/badge/Version-1.0.1-blue)]()
[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/MaloLM/whisper-3-large-speach-to-text/blob/main/LICENSE)
[![language](https://img.shields.io/badge/Language-Python-blue)](https://www.python.org)

A simple Python program for transcribing audio files using the Whisper model.
> ⚠️ [Whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) is a large model (FP32), so performance depends heavily on your hardware.
> This version is not designed for real-time transcription, although it can be adapted to use faster, if less accurate, transcription models.
> For example, in `whisper.py`, set the attribute `self.model_id` to `"openai/whisper-large-v3-turbo"`.
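As a rough sketch of what such a model swap could look like with the Hugging Face `transformers` pipeline (the `Transcriber` class and method names below are illustrative, not the repository's; only the model ids come from this README):

```python
# Minimal sketch, assuming the Hugging Face transformers and torch packages.
import torch
from transformers import pipeline

class Transcriber:
    def __init__(self, model_id: str = "openai/whisper-large-v3"):
        # Swap to "openai/whisper-large-v3-turbo" here for faster,
        # slightly less accurate transcription.
        self.model_id = model_id
        self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
        self.pipe = pipeline(
            "automatic-speech-recognition",
            model=self.model_id,
            torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
            device=self.device,
        )

    def transcribe(self, audio_path: str, language: str = "english") -> str:
        # generate_kwargs forwards the language hint to Whisper's decoder.
        result = self.pipe(audio_path, generate_kwargs={"language": language})
        return result["text"]
```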
### Whisper models:
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
| ------ | ---------- | ------------------ | ------------------ | ------------- | -------------- |
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~10x |
| base | 74 M | base.en | base | ~1 GB | ~7x |
| small | 244 M | small.en | small | ~2 GB | ~4x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |
| turbo | 809 M | N/A | turbo | ~6 GB | ~8x |

## How to use
> `python main.py /inputs_directory output_directory language`
> `python main.py input_file output_directory language`
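For example, with a directory of recordings in English (the paths here are placeholders, not part of the repository):
> `python main.py ./recordings ./transcripts english`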
### Compatible languages
```
english, chinese, german, spanish, russian, korean, french, japanese, portuguese, turkish, polish, catalan, dutch, arabic, swedish, italian, indonesian, hindi, finnish, vietnamese, hebrew, ukrainian, greek, malay, czech, romanian, danish, hungarian, tamil, norwegian, thai, urdu, croatian, bulgarian, lithuanian, latin, maori, malayalam, welsh, slovak, telugu, persian, latvian, bengali, serbian, azerbaijani, slovenian, kannada, estonian, macedonian, breton, basque, icelandic, armenian, nepali, mongolian, bosnian,
kazakh, albanian, swahili, galician, marathi, punjabi, sinhala, khmer, shona, yoruba, somali, afrikaans, occitan, georgian, belarusian, tajik, sindhi, gujarati, amharic, yiddish, lao, uzbek, faroese, haitian creole, pashto, turkmen, nynorsk, maltese, sanskrit, luxembourgish, myanmar, tibetan, tagalog, malagasy, assamese, tatar, hawaiian, lingala, hausa, bashkir, javanese, sundanese, cantonese, burmese, valencian, flemish, haitian, letzeburgesch, pushto, panjabi, moldavian, moldovan, sinhalese, castilian, mandarin
```

## Software requirements for the code to work
- ⚠️ [ffmpeg](https://ffmpeg.org) v7.1
- Python `3.10.11` in my case, with the following requirements:
```
torch==2.5.1
accelerate==1.1.1
transformers==4.46.2
datasets==3.10
```

## Tested Environments
| Tested | OS | Version | Architecture |
| ------ | ------ | --------------- | ------------ |
| ✅ | macOS | 15.0.1 (24A348) | aarch64 |
| ❌ | Ubuntu | 20.04 LTS | x86_64 |
| ❌ | WSL | Ubuntu 20.04 | x86_64 |