Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mikeesto/gemini-transcribe
Transcribe audio and video files with speaker diarization and logically grouped timestamps
gemini-flash speaker-diarization speech-to-text sveltekit transcription
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/mikeesto/gemini-transcribe
- Owner: mikeesto
- Created: 2024-08-15T08:10:48.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2024-10-30T04:50:43.000Z (14 days ago)
- Last Synced: 2024-10-30T06:26:54.661Z (14 days ago)
- Topics: gemini-flash, speaker-diarization, speech-to-text, sveltekit, transcription
- Language: TypeScript
- Homepage: https://gemini-transcribe.fly.dev/
- Size: 1.68 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Gemini Transcribe
[https://gemini-transcribe.fly.dev/](https://gemini-transcribe.fly.dev/)
A web application for transcribing audio and video files using Google's Gemini Flash model.
Flash is a very interesting model to explore for audio transcription because:
- We can prompt for specific transcription outputs, as it processes both audio and text inputs
- It has built-in speaker diarization
- It can attempt to detect not only words but also silence, sentiment, and sounds beyond human voices
- It can translate the transcription, in particular to languages other than English

Google claims Flash's word error rate is 9.6% on the FLEURS benchmark (September 2024).
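For context on the word error rate figure: WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that formula and is not code from this project. The `tokenize` and `wordErrorRate` names are hypothetical, and the normalization (lowercasing and splitting on whitespace) is an assumption that may differ from how the FLEURS figure was computed.

```typescript
// Hypothetical helper: normalize a transcript into lowercase words.
// Assumption: whitespace splitting only; punctuation is left attached to words.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// Word error rate = word-level edit distance / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("Reference transcript must contain at least one word");
  }

  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1; // reference word missing from hypothesis
      const insertion = curr[j - 1] + 1; // extra word in hypothesis
      curr[j] = Math.min(substitution, deletion, insertion);
    }
    prev = curr;
  }

  return prev[hyp.length] / ref.length;
}

// Example: one substituted word out of six reference words.
const wer = wordErrorRate("the cat sat on the mat", "the cat sat on a mat");
console.log(`WER: ${(wer * 100).toFixed(1)}%`);
```

Under this definition, a WER of 9.6% corresponds to roughly one word-level error for every ten reference words. WER can exceed 100% when the hypothesis contains many inserted words.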