https://github.com/evilfreelancer/whisper-tests

Collection of experiments on OpenAI Whisper models

Host: GitHub
URL: https://github.com/evilfreelancer/whisper-tests
Owner: EvilFreelancer
License: mit
Created: 2023-09-09T12:24:39.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-09-10T17:26:40.000Z (over 1 year ago)
Last Synced: 2024-10-24T10:07:46.842Z (3 months ago)
Topics: api-server, docker-compose, testing, transcription, whisper
Language: Python
Homepage:
Size: 9.77 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Whisper Tests

Collection of experiments on OpenAI Whisper models.

All tests were run on an NVIDIA RTX 4090 with 24 GB of VRAM.

## Samples

1. https://www.youtube.com/watch?v=UL7G4ugE8nU (ru)
2. https://www.youtube.com/watch?v=w1u65BctsU4 (ru)
3. https://www.youtube.com/watch?v=8qM-WESysZo (ru)
4. https://www.youtube.com/watch?v=fAtXX-gsxl0 (ru)
5. https://www.youtube.com/watch?v=F8UI4ek6ukc (ru)
6. https://www.youtube.com/watch?v=u4RkkjiYu0k (en)
7. https://www.youtube.com/watch?v=gggehz298L8 (en)
8. https://www.youtube.com/watch?v=jCuEBVbmPcA (en)
9. https://www.youtube.com/watch?v=wjO6OLmZB9A (en)
10. https://www.youtube.com/watch?v=Jy6Qk_bO3Qw (en)

## Test results

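Accuracy is the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) ratio between the reference and transcribed texts, where 1 means the texts are identical. The float32 `openai_whisper` runs score 1 on every sample, which indicates that their transcriptions serve as the reference texts for the other configurations.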
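The README does not say exactly how the distance is normalized into a ratio. The sketch below assumes the common definition `ratio = 1 - distance / max(len(reference), len(hypothesis))`; some libraries (for example `Levenshtein.ratio` from the python-Levenshtein package) normalize differently, so their values may not match these exactly. The function names are illustrative and not taken from the repository.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions
    needed to turn `a` into `b` (classic two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner loop over the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity_ratio(reference: str, hypothesis: str) -> float:
    """Assumed normalization: 1.0 for identical texts, lower as they diverge."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 1.0  # two empty texts are identical
    return 1.0 - levenshtein_distance(reference, hypothesis) / longest
```

For example, `similarity_ratio("kitten", "sitting")` is 1 - 3/7, about 0.57. In practice the reference and hypothesis would be whole transcripts rather than single words.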

### Reference tests (float32)

* Engine: openai_whisper
* Model: large-v2
* Type: float32

| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1  | 823            | 80.51               | 1                |
| 2  | 856            | 99.76               | 1                |
| 3  | 416            | 45.68               | 1                |
| 4  | 1390           | 127.46              | 1                |
| 5  | 2205           | 233.90              | 1                |
| 6  | 922            | 88.75               | 1                |
| 7  | 1177           | 108.49              | 1                |
| 8  | 1505           | 146.07              | 1                |
| 9  | 1575           | 173.49              | 1                |
| 10 | 1714           | 202.24              | 1                |

* MAX VRAM used: 10.6 GB
* AVG Transcribe Time: 130.6 s

### float16 (half)

* Engine: faster_whisper
* Model: large-v2
* Type: float16

| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1  | 823            | 56.57               | 0.97             |
| 2  | 856            | 51.79               | 0.95             |
| 3  | 416            | 25.82               | 0.99             |
| 4  | 1390           | 77.26               | 0.94             |
| 5  | 2205           | 134.72              | 0.94             |
| 6  | 922            | 45.24               | 0.93             |
| 7  | 1177           | 64.26               | 0.99             |
| 8  | 1505           | 89.33               | 0.97             |
| 9  | 1575           | 99.32               | 0.96             |
| 10 | 1714           | 116.59              | 0.98             |

* MAX VRAM used: 8.41 GB
* AVG Accuracy: 0.96
* AVG Transcribe Time: 76.1 s
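Each configuration pairs an engine with a precision. A minimal sketch of how such runs could be set up, assuming `openai_whisper` refers to the openai-whisper package (imported as `whisper`), `faster_whisper` to the faster-whisper package, and a CUDA GPU. The `transcribe` helper is hypothetical and is not the repository's code. In faster-whisper the precision is chosen with `compute_type` (for example `"float16"` or `"int8"`); openai-whisper instead exposes an `fp16` flag, and `fp16=False` decodes in float32.

```python
def transcribe(path: str, engine: str, model_name: str = "large-v2",
               compute_type: str = "float16") -> str:
    """Hypothetical helper: return the full transcript of `path` with one engine."""
    if engine == "openai_whisper":
        import whisper  # pip install openai-whisper

        model = whisper.load_model(model_name, device="cuda")
        # fp16=False keeps decoding in float32, matching the reference runs
        result = model.transcribe(path, fp16=False)
        return result["text"].strip()
    if engine == "faster_whisper":
        from faster_whisper import WhisperModel  # pip install faster-whisper

        model = WhisperModel(model_name, device="cuda", compute_type=compute_type)
        segments, _info = model.transcribe(path)
        # `segments` is a lazy generator; iterating it performs the decoding
        return " ".join(segment.text.strip() for segment in segments)
    raise ValueError(f"unknown engine: {engine!r}")
```

The imports sit inside each branch so that only the engine actually being tested needs to be installed. Timing a call such as `transcribe("sample.mp3", "faster_whisper", compute_type="int8")` would correspond to one row of the int8 table, although the repository's own measurement method is not shown here.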

### int8

* Engine: faster_whisper
* Model: large-v2
* Type: int8

| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1  | 823            | 30.88               | 0.97             |
| 2  | 856            | 32.70               | 0.94             |
| 3  | 416            | 16.21               | 0.99             |
| 4  | 1390           | 48.94               | 0.93             |
| 5  | 2205           | 85.69               | 0.94             |
| 6  | 922            | 28.30               | 0.93             |
| 7  | 1177           | 39.74               | 0.98             |
| 8  | 1505           | 53.19               | 0.97             |
| 9  | 1575           | 62.52               | 0.96             |
| 10 | 1714           | 73.35               | 0.98             |

* MAX VRAM used: 4.6 GB
* AVG Accuracy: 0.96
* AVG Transcribe Time: 47.2 s

### int4

* Engine: faster_whisper
* Model: large-v2
* Type: int4

| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |
|----|----------------|---------------------|------------------|
| 1  | 823            | 36.01               | 0.96             |
| 2  | 856            | 39.24               | 0.94             |
| 3  | 416            | 19.36               | 0.99             |
| 4  | 1390           | 57.84               | 0.94             |
| 5  | 2205           | 99.64               | 0.95             |
| 6  | 922            | 37.69               | 0.93             |
| 7  | 1177           | 52.48               | 0.98             |
| 8  | 1505           | 71.51               | 0.97             |
| 9  | 1575           | 80.40               | 0.96             |
| 10 | 1714           | 91.19               | 0.98             |

* MAX VRAM used: 3.9 GB
* AVG Accuracy: 0.96
* AVG Transcribe Time: 58.5 s
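The summary figures can be recomputed directly from the four tables. The snippet below only copies the table values (it does not re-run any transcription) and reports, per configuration, the mean transcribe time, the seconds of audio processed per second of compute, and the speedup over the float32 `openai_whisper` baseline. `compare` and the configuration labels are illustrative names, not part of the repository.

```python
from statistics import mean

# Audio length (s) of samples 1-10, copied from the tables above
AUDIO_SECONDS = [823, 856, 416, 1390, 2205, 922, 1177, 1505, 1575, 1714]

# Transcribe times (s) for samples 1-10, copied from the tables above
TRANSCRIBE_TIMES = {
    "openai_whisper float32": [80.51, 99.76, 45.68, 127.46, 233.90,
                               88.75, 108.49, 146.07, 173.49, 202.24],
    "faster_whisper float16": [56.57, 51.79, 25.82, 77.26, 134.72,
                               45.24, 64.26, 89.33, 99.32, 116.59],
    "faster_whisper int8": [30.88, 32.70, 16.21, 48.94, 85.69,
                            28.30, 39.74, 53.19, 62.52, 73.35],
    "faster_whisper int4": [36.01, 39.24, 19.36, 57.84, 99.64,
                            37.69, 52.48, 71.51, 80.40, 91.19],
}


def compare(baseline: str = "openai_whisper float32") -> dict[str, dict[str, float]]:
    """Per-configuration mean time, real-time factor and speedup over `baseline`."""
    total_audio = sum(AUDIO_SECONDS)
    baseline_total = sum(TRANSCRIBE_TIMES[baseline])
    results = {}
    for config, times in TRANSCRIBE_TIMES.items():
        total = sum(times)
        results[config] = {
            "avg_s": mean(times),                   # the "AVG Transcribe Time" figure
            "x_realtime": total_audio / total,      # audio seconds per compute second
            "speedup": baseline_total / total,      # >1 means faster than baseline
        }
    return results


for config, stats in compare().items():
    print(f"{config:<24} avg {stats['avg_s']:6.1f} s  "
          f"{stats['x_realtime']:5.1f}x real time  {stats['speedup']:.2f}x vs float32")
```

On these numbers int8 is the fastest configuration, roughly 2.8 times faster than the float32 baseline, while int4 is slower than int8 even though it uses less VRAM.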