Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/mikeesto/gemini-transcribe

Transcribe audio and video files with speaker diarization and logically grouped timestamps

gemini-flash speaker-diarization speech-to-text sveltekit transcription



Host: GitHub
URL: https://github.com/mikeesto/gemini-transcribe
Owner: mikeesto
Created: 2024-08-15T08:10:48.000Z (3 months ago)
Default Branch: master
Last Pushed: 2024-10-30T04:50:43.000Z (14 days ago)
Last Synced: 2024-10-30T06:26:54.661Z (14 days ago)
Topics: gemini-flash, speaker-diarization, speech-to-text, sveltekit, transcription
Language: TypeScript
Homepage: https://gemini-transcribe.fly.dev/
Size: 1.68 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md


README

# Gemini Transcribe

[https://gemini-transcribe.fly.dev/](https://gemini-transcribe.fly.dev/)

A web application for transcribing audio and video files using Google's Gemini Flash model.

Flash is a very interesting model to explore for audio transcription because:

- We can prompt for specific transcription outputs, as it processes both audio and text inputs
- It has built-in speaker diarization
- It can attempt to detect not only words but also silence, sentiment, and sounds beyond human voices
- It can translate the transcription, in particular to languages other than English
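Because the model takes a text prompt alongside the audio, these capabilities can be toggled with plain instructions. The sketch below is illustrative only: `TranscriptionOptions`, `buildTranscriptionPrompt`, and the instruction wording are hypothetical names and text, not taken from this project's source, and the resulting string would still need to be sent to Gemini together with the audio file.

```typescript
// Hypothetical options; none of these names come from the gemini-transcribe code.
interface TranscriptionOptions {
  diarize: boolean; // ask for speaker labels
  includeNonSpeech: boolean; // ask for silences and non-voice sounds
  targetLanguage?: string; // ask for a translated transcript
}

// Assemble a text prompt to accompany the audio input.
function buildTranscriptionPrompt(options: TranscriptionOptions): string {
  const instructions: string[] = [
    "Transcribe the attached audio.",
    "Group the transcript into logical segments and prefix each segment with a timestamp in mm:ss format.",
  ];
  if (options.diarize) {
    instructions.push("Label each segment with its speaker (Speaker 1, Speaker 2, ...).");
  }
  if (options.includeNonSpeech) {
    instructions.push("Note long silences and non-speech sounds in square brackets, for example [laughter].");
  }
  if (options.targetLanguage) {
    instructions.push(`Translate the transcript into ${options.targetLanguage}.`);
  }
  return instructions.join("\n");
}

const prompt = buildTranscriptionPrompt({
  diarize: true,
  includeNonSpeech: false,
  targetLanguage: "French",
});
console.log(prompt);
```

Keeping the prompt assembly in a pure function makes it easy to experiment with different output formats without touching the upload or API code.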

Google claims Flash's word error rate is 9.6% on the FLEURS benchmark (as of September 2024).
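
For context, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's output (substitutions + deletions + insertions), divided by the number of words in the reference. The sketch below computes it that way; it is a generic illustration with a hypothetical `wordErrorRate` helper, not the evaluation script behind the figure above, and real benchmarks usually also normalize casing and punctuation first.

```typescript
// Word error rate: word-level Levenshtein distance divided by reference length.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitutionCost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion: reference word missing from hypothesis
          curr[j - 1] + 1, // insertion: extra word in hypothesis
          prev[j - 1] + substitutionCost // match or substitution
        )
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One substituted word out of five reference words.
console.log(wordErrorRate("the quick brown fox jumps", "the quick brown box jumps")); // 0.2
```

On this definition, a WER of 9.6% corresponds to roughly one word-level error for every ten reference words.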