Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/evilfreelancer/whisper-tests
Collection of experiments on OpenAI Whisper models
https://github.com/evilfreelancer/whisper-tests
api-server docker-compose testing transcription whisper
Last synced: 16 days ago
JSON representation
Collection of experiments on OpenAI Whisper models
- Host: GitHub
- URL: https://github.com/evilfreelancer/whisper-tests
- Owner: EvilFreelancer
- License: mit
- Created: 2023-09-09T12:24:39.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-10T17:26:40.000Z (about 1 year ago)
- Last Synced: 2024-10-24T10:07:46.842Z (21 days ago)
- Topics: api-server, docker-compose, testing, transcription, whisper
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Whisper Tests
Collection of experiments on OpenAI Whisper models.
Tested on RTX 4090 24Gb.
## Samples
1. https://www.youtube.com/watch?v=UL7G4ugE8nU (ru)
2. https://www.youtube.com/watch?v=w1u65BctsU4 (ru)
3. https://www.youtube.com/watch?v=8qM-WESysZo (ru)
4. https://www.youtube.com/watch?v=fAtXX-gsxl0 (ru)
5. https://www.youtube.com/watch?v=F8UI4ek6ukc (ru)
6. https://www.youtube.com/watch?v=u4RkkjiYu0k (en)
7. https://www.youtube.com/watch?v=gggehz298L8 (en)
8. https://www.youtube.com/watch?v=jCuEBVbmPcA (en)
9. https://www.youtube.com/watch?v=wjO6OLmZB9A (en)
10. https://www.youtube.com/watch?v=Jy6Qk_bO3Qw (en)## Tests results
Accuracy is calculated as [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) ratio between
reference and transcribed texts.### Reference tests (float32)
* Engine: openai_whisper
* Model: large-v2
* Type: float32| № | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1 | 823 | 80.51 | 1 |
| 2 | 856 | 99.76 | 1 |
| 3 | 416 | 45.68 | 1 |
| 4 | 1390 | 127.46 | 1 |
| 5 | 2205 | 233.90 | 1 |
| 6 | 922 | 88.75 | 1 |
| 7 | 1177 | 108.49 | 1 |
| 8 | 1505 | 146.07 | 1 |
| 9 | 1575 | 173.49 | 1 |
| 10 | 1714 | 202.24 | 1 |* MAX VRAM used: 10.6Gb
* AVG Transcribe Time: 132.5s### float16 (half)
* Engine: faster_whisper
* Model: large-v2
* Type: float16| № | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1 | 823 | 56.57 | 0.97 |
| 2 | 856 | 51.79 | 0.95 |
| 3 | 416 | 25.82 | 0.99 |
| 4 | 1390 | 77.26 | 0.94 |
| 5 | 2205 | 134.72 | 0.94 |
| 6 | 922 | 45.24 | 0.93 |
| 7 | 1177 | 64.26 | 0.99 |
| 8 | 1505 | 89.33 | 0.97 |
| 9 | 1575 | 99.32 | 0.96 |
| 10 | 1714 | 116.59 | 0.98 |* MAX VRAM used: 8.41Gb
* AVG Accuracy: 0.96
* AVG Transcribe Time: 77.5s### int8
* Engine: faster_whisper
* Model: large-v2
* Type: int8| № | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1 | 823 | 30.88 | 0.97 |
| 2 | 856 | 32.70 | 0.94 |
| 3 | 416 | 16.21 | 0.99 |
| 4 | 1390 | 48.94 | 0.93 |
| 5 | 2205 | 85.69 | 0.94 |
| 6 | 922 | 28.30 | 0.93 |
| 7 | 1177 | 39.74 | 0.98 |
| 8 | 1505 | 53.19 | 0.97 |
| 9 | 1575 | 62.52 | 0.96 |
| 10 | 1714 | 73.35 | 0.98 |* MAX VRAM used: 4.6Gb
* AVG Accuracy: 0.96
* AVG Transcribe Time: 46.5s### int4
* Engine: faster_whisper
* Model: large-v2
* Type: int4| № | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1 | 823 | 36.01 | 0.96 |
| 2 | 856 | 39.24 | 0.94 |
| 3 | 416 | 19.36 | 0.99 |
| 4 | 1390 | 57.84 | 0.94 |
| 5 | 2205 | 99.64 | 0.95 |
| 6 | 922 | 37.69 | 0.93 |
| 7 | 1177 | 52.48 | 0.98 |
| 8 | 1505 | 71.51 | 0.97 |
| 9 | 1575 | 80.40 | 0.96 |
| 10 | 1714 | 91.19 | 0.98 |* MAX VRAM used: 3.9Gb
* AVG Accuracy: 0.96
* AVG Transcribe Time: 51.5s