https://github.com/mikeesto/gemini-transcribe

Transcribe audio and video files with speaker diarization and logically grouped timestamps

gemini-flash speaker-diarization speech-to-text sveltekit transcription


# Gemini Transcribe

[https://gemini-transcribe.fly.dev/](https://gemini-transcribe.fly.dev/)

A web application for transcribing audio and video files using Google's Gemini Flash model.

Flash is an interesting model to explore for audio transcription because:

- It accepts both audio and text inputs, so we can prompt for specific transcription outputs
- It has built-in speaker diarization
- It can attempt to detect not only words but also silence, sentiment, and sounds beyond human voices
- It can translate the transcription, in particular to languages other than English
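
As a rough illustration of the points above, the sketch below assembles a text prompt that could be sent alongside an audio file. It is not this project's actual prompt: the `TranscriptionOptions` type, the `buildTranscriptionPrompt` function, and the prompt wording are all hypothetical, and the call to the Gemini API itself is left out.

```typescript
// Hypothetical options; none of these names come from this project.
interface TranscriptionOptions {
  diarize: boolean; // ask for speaker labels
  includeNonSpeech: boolean; // ask for silence and non-speech sounds
  targetLanguage?: string; // ask for a translated transcript when set
}

// Build the text part of a multimodal request. The audio part is assumed
// to be attached separately by whichever client library is in use.
function buildTranscriptionPrompt(opts: TranscriptionOptions): string {
  const lines: string[] = [
    "Transcribe the attached audio.",
    "Group the transcript into logical segments, each starting with a [mm:ss] timestamp.",
  ];
  if (opts.diarize) {
    lines.push("Label each segment with its speaker, e.g. Speaker 1, Speaker 2.");
  }
  if (opts.includeNonSpeech) {
    lines.push("Note silence and non-speech sounds in parentheses.");
  }
  if (opts.targetLanguage) {
    lines.push(`Translate the transcript into ${opts.targetLanguage}.`);
  }
  return lines.join("\n");
}

const prompt = buildTranscriptionPrompt({
  diarize: true,
  includeNonSpeech: false,
  targetLanguage: "French",
});
console.log(prompt);
```

Keeping each capability behind its own option makes it easy to compare how the model responds when a single instruction is added or removed.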

Google claims Flash's word error rate is 9.6% on the FLEURS benchmark (September 2024).
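
For context, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. The `tokenize` and `wordErrorRate` helpers are illustrative names, and the normalization (lowercasing and splitting on whitespace) is an assumption; it is not necessarily how the FLEURS figure above was scored.

```typescript
// Assumed normalization: lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed with a word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion: reference word missing from hypothesis
          curr[j - 1] + 1, // insertion: extra word in hypothesis
          prev[j - 1] + cost, // match or substitution
        ),
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

const wer = wordErrorRate("the cat sat on the mat", "the cat sat on mat");
console.log(wer.toFixed(3));
```

In this example the hypothesis drops one of six reference words, so the rate is 1/6. Note that WER can exceed 100% when the hypothesis contains many inserted words.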