{"id":13341857,"url":"https://github.com/EvilFreelancer/whisper-tests","last_synced_at":"2025-03-11T23:30:22.790Z","repository":{"id":193859933,"uuid":"689315224","full_name":"EvilFreelancer/whisper-tests","owner":"EvilFreelancer","description":"Collection of experiments on OpenAI Whisper models","archived":false,"fork":false,"pushed_at":"2023-09-10T17:26:40.000Z","size":10,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-09T15:15:34.305Z","etag":null,"topics":["api-server","docker-compose","testing","transcription","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EvilFreelancer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-09-09T12:24:39.000Z","updated_at":"2023-09-10T14:01:45.000Z","dependencies_parsed_at":"2023-09-10T13:44:26.739Z","dependency_job_id":"e3d4e2a1-48a0-48bd-9cfa-688a7885465c","html_url":"https://github.com/EvilFreelancer/whisper-tests","commit_stats":{"total_commits":10,"total_committers":2,"mean_commits":5.0,"dds":0.09999999999999998,"last_synced_commit":"99133692546910d83e6d53b035c0abf16d8a8d4a"},"previous_names":["evilfreelancer/whisper-tests"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fwhisper-tests","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fwhisper-tests/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fwhisper-tests/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fwhisper-tests/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EvilFreelancer","download_url":"https://codeload.github.com/EvilFreelancer/whisper-tests/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243129486,"owners_count":20241025,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-server","docker-compose","testing","transcription","whisper"],"created_at":"2024-07-29T19:26:40.431Z","updated_at":"2025-03-11T23:30:22.548Z","avatar_url":"https://github.com/EvilFreelancer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Whisper Tests\n\nCollection of experiments on OpenAI Whisper models.\n\nTested on an RTX 4090 with 24 GB of VRAM.\n\n## Samples\n\n1. https://www.youtube.com/watch?v=UL7G4ugE8nU (ru)\n2. https://www.youtube.com/watch?v=w1u65BctsU4 (ru)\n3. https://www.youtube.com/watch?v=8qM-WESysZo (ru)\n4. https://www.youtube.com/watch?v=fAtXX-gsxl0 (ru)\n5. https://www.youtube.com/watch?v=F8UI4ek6ukc (ru)\n6. https://www.youtube.com/watch?v=u4RkkjiYu0k (en)\n7. https://www.youtube.com/watch?v=gggehz298L8 (en)\n8. https://www.youtube.com/watch?v=jCuEBVbmPcA (en)\n9. https://www.youtube.com/watch?v=wjO6OLmZB9A (en)\n10. 
https://www.youtube.com/watch?v=Jy6Qk_bO3Qw (en)\n\n## Test results\n\nAccuracy is calculated as the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) ratio between\nthe reference and transcribed texts.\n\n### Reference tests (float32)\n\n* Engine: openai_whisper\n* Model: large-v2\n* Type: float32\n\n| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |\n|----|----------------|---------------------|------------------|\n| 1  | 823            | 80.51               | 1                |\n| 2  | 856            | 99.76               | 1                |\n| 3  | 416            | 45.68               | 1                |\n| 4  | 1390           | 127.46              | 1                |\n| 5  | 2205           | 233.90              | 1                |\n| 6  | 922            | 88.75               | 1                |\n| 7  | 1177           | 108.49              | 1                |\n| 8  | 1505           | 146.07              | 1                |\n| 9  | 1575           | 173.49              | 1                |\n| 10 | 1714           | 202.24              | 1                |\n\n* MAX VRAM used: 10.6 GB\n* AVG Transcribe Time: 132.5s\n\n### float16 (half)\n\n* Engine: faster_whisper\n* Model: large-v2\n* Type: float16\n\n| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |\n|----|----------------|---------------------|------------------|\n| 1  | 823            | 56.57               | 0.97             |\n| 2  | 856            | 51.79               | 0.95             |\n| 3  | 416            | 25.82               | 0.99             |\n| 4  | 1390           | 77.26               | 0.94             |\n| 5  | 2205           | 134.72              | 0.94             |\n| 6  | 922            | 45.24               | 0.93             |\n| 7  | 1177           | 64.26               | 0.99             |\n| 8  | 1505           | 89.33               | 0.97             |\n| 9  | 1575           | 99.32               | 0.96             |\n| 10 | 1714    
       | 116.59              | 0.98             |\n\n* MAX VRAM used: 8.41 GB\n* AVG Accuracy: 0.96\n* AVG Transcribe Time: 77.5s\n\n### int8\n\n* Engine: faster_whisper\n* Model: large-v2\n* Type: int8\n\n| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |\n|----|----------------|---------------------|------------------|\n| 1  | 823            | 30.88               | 0.97             |\n| 2  | 856            | 32.70               | 0.94             |\n| 3  | 416            | 16.21               | 0.99             |\n| 4  | 1390           | 48.94               | 0.93             |\n| 5  | 2205           | 85.69               | 0.94             |\n| 6  | 922            | 28.30               | 0.93             |\n| 7  | 1177           | 39.74               | 0.98             |\n| 8  | 1505           | 53.19               | 0.97             |\n| 9  | 1575           | 62.52               | 0.96             |\n| 10 | 1714           | 73.35               | 0.98             |\n\n* MAX VRAM used: 4.6 GB\n* AVG Accuracy: 0.96\n* AVG Transcribe Time: 46.5s\n\n### int4\n\n* Engine: faster_whisper\n* Model: large-v2\n* Type: int4\n\n| №  | Audio Time (s) | Transcribe Time (s) | Accuracy (ratio) |\n|----|----------------|---------------------|------------------|\n| 1  | 823            | 36.01               | 0.96             |\n| 2  | 856            | 39.24               | 0.94             |\n| 3  | 416            | 19.36               | 0.99             |\n| 4  | 1390           | 57.84               | 0.94             |\n| 5  | 2205           | 99.64               | 0.95             |\n| 6  | 922            | 37.69               | 0.93             |\n| 7  | 1177           | 52.48               | 0.98             |\n| 8  | 1505           | 71.51               | 0.97             |\n| 9  | 1575           | 80.40               | 0.96             |\n| 10 | 1714           | 91.19               | 0.98             |\n\n* MAX VRAM used: 3.9 GB\n* AVG Accuracy: 0.96\n* AVG 
Transcribe Time: 51.5s\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEvilFreelancer%2Fwhisper-tests","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEvilFreelancer%2Fwhisper-tests","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEvilFreelancer%2Fwhisper-tests/lists"}