An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with stt

A curated list of projects in awesome lists tagged with stt.

https://github.com/khoj-ai/khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

agent ai assistant chat chatgpt emacs image-generation llama3 llamacpp llm obsidian obsidian-md offline-llm productivity rag research self-hosted semantic-search stt whatsapp-ai

Last synced: 12 May 2025

https://github.com/snakers4/silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

asr capitalization colab english german onnx pretrained-models pytorch repunctuation spanish speech speech-recognition speech-synthesis speech-to-text stt stt-benchmark text-to-speech torch-hub tts tts-models

Last synced: 13 May 2025

https://github.com/jianchang512/stt

Voice Recognition to Text Tool / A local, offline tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

speech speech-recognition speech-to-text stt

Last synced: 14 May 2025

https://github.com/coqui-ai/stt

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 14 May 2025

https://github.com/coqui-ai/STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Last synced: 15 Mar 2025

https://github.com/pannous/tensorflow-speech-recognition

🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

deep-learning neural-network speech-recognition speech-to-text stt tensorflow

Last synced: 15 May 2025

https://github.com/pluja/whishper

Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!

ai audio-to-text golang speech-recognition speech-to-text stt subtitles sveltekit transcription ui web web-whisper webapp whisper

Last synced: 14 May 2025

https://github.com/lenml/speech-ai-forge

🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.

agent asr chattts chattts-forge chinese colab cosy-voice cosyvoice english firered fireredtts fish-speech gpt llama llm ssml stt text-to-speech tts whisper

Last synced: 15 May 2025

https://github.com/robitx/gp.nvim

Gp.nvim (GPT prompt) Neovim AI plugin: ChatGPT sessions & Instructable text/code operations & Speech to text [OpenAI, Ollama, Anthropic, ..]

claude codeium copilot gemini gpt-4o gpt4o llm lua mistral neovim nvim ollama parrot perplexity sonnet speech-to-text stt vim voice whisper

Last synced: 14 May 2025

https://github.com/mkiol/dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.

asr flatpak-applications linux-desktop machine-translation nmt offline sailfishos speech-recognition speech-synthesis speech-to-text stt text-to-speech translation translator tts

Last synced: 15 May 2025

https://github.com/lenML/Speech-AI-Forge

🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.

agent asr chattts chattts-forge chinese colab cosy-voice cosyvoice english firered fireredtts fish-speech gpt llama llm ssml stt text-to-speech tts whisper

Last synced: 19 Feb 2025

https://github.com/vrcwizard/tts-voice-wizard

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) (VTuber TTS)

chatbox discord free heart-rate osc speech-recognition speech-to-text spotify stt text-to-speech tts voice vrchat vtuber

Last synced: 15 May 2025

https://github.com/VRCWizard/TTS-Voice-Wizard

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) (VTuber TTS)

chatbox discord free heart-rate osc speech-recognition speech-to-text spotify stt text-to-speech tts voice vrchat vtuber

Last synced: 11 May 2025

https://github.com/evancohen/sonus

:speech_balloon: /so.nus/ STT (speech to text) for Node with offline hotword detection

alexa hotword-detection keyword-spotting node speech speech-recognition speech-to-text stt voice-control voice-recognition

Last synced: 16 May 2025

https://github.com/lobehub/lobe-tts

🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser

auzre bun edge lobehub microsoft-speech-api nodejs opeanai react speech-recognition speech-to-text stt text-to-speech tts

Last synced: 14 May 2025

https://github.com/bbc/react-transcript-editor

A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress

bbc-news-labs kaldi news-labs react stt textav transcript transcript-editor transcription

Last synced: 15 May 2025

https://github.com/macoron/whisper.unity

Running speech to text model (whisper.cpp) in Unity3d on your local machine.

asr openai speech-recognition speech-to-text stt unity3d whisper

Last synced: 15 May 2025

https://github.com/ccoreilly/vosk-browser

A speech recognition library running in the browser thanks to a WebAssembly build of Vosk

asr kaldi speech-recognition speech-to-text stt typescript vosk wasm webassembly

Last synced: 16 May 2025

https://github.com/starmoonai/starmoon

An open source voice-enabled, compact, empathic AI hardware + software 🤖 framework for companionship, entertainment, education, pediatric care, IoT robotics applications, AI-enhanced robotics application services, research, and DIY robotics kit development using Python, NextJs, Arduino, ESP32, LLMs (GPT), STT, TTS, Emotion Analysis, AI agent

esp32 gemini gpt iot llm robotics stt tts voice-assistant

Last synced: 08 Feb 2025

https://github.com/ikaros-521/realtimestt_llm_tts

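Real-time STT that connects to the OpenAI API or Zhipu AI (streaming LLM) and to GPT-SOVITS or Edge-TTS, making service calls across networks through a web page to achieve real-time conversation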

llm python stt tts

Last synced: 16 May 2025

https://github.com/livekit-examples/kitt

Talk to ChatGPT in real time using LiveKit

ai assistant chatgpt gpt openai stt transcription translation tts voice webrtc

Last synced: 13 Apr 2025

https://github.com/nikdanilov/whisper-obsidian-plugin

Speech-to-text in Obsidian using OpenAI Whisper

obsidian openai-whisper speech-to-text stt transcribe voice whisper

Last synced: 04 Dec 2024

https://github.com/Ikaros-521/RealtimeSTT_LLM_TTS

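Real-time STT that connects to the OpenAI API or Zhipu AI (streaming LLM) and to GPT-SOVITS or Edge-TTS, making service calls across networks through a web page to achieve real-time conversation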

llm python stt tts

Last synced: 24 Mar 2025

https://github.com/nrl-ai/customchar

Your customized AI assistant - Personal assistants on any hardware! With llama.cpp, whisper.cpp, ggml, LLaMA-v2.

cpp ggml llama llama-cpp llama-v2 llm stt tts whisper-cpp

Last synced: 22 Apr 2025

https://github.com/timmo001/home-assistant-assist-desktop

Use Home Assistant Assist on the desktop. Compatible with Windows, macOS, and Linux

assist cross-platform desktop home-assistant home-assistant-assist stt svelte tauri tts

Last synced: 30 Apr 2025

https://github.com/gpustack/vox-box

A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.

asr audio-processing openai-compatible-api python speech-to-text stt text-to-speech tts

Last synced: 07 Apr 2025

https://github.com/tylike/ai.labs

OpenAI ChatGPT or a local LLM (llama.cpp, GGUF format) + TTS + STT + Word + Excel

chatgpt excel llamacpp llm openai-api stt tts whispercpp word

Last synced: 24 Jan 2025

https://github.com/tylike/AI.Labs

OpenAI ChatGPT or a local LLM (llama.cpp, GGUF format) + TTS + STT + Word + Excel

chatgpt excel llamacpp llm openai-api stt tts whispercpp word

Last synced: 06 Mar 2025

https://github.com/ai-adv-lab/deepspeech.mxnet

An MXNet implementation of Baidu's DeepSpeech architecture

arch baidu deepspeech mxnet speech speech-recognition speech-to-text stt warp-ctc

Last synced: 17 Apr 2025

https://github.com/ryanleary/patter

speech-to-text in pytorch

ocr pytorch rnn speech-recognition speech-to-text stt

Last synced: 21 Nov 2024

https://github.com/inevolin/discordearsbot

A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people.

discord discord-bot discord-js hearing-aids hearing-impaired speech speech-processing speech-recognition speech-synthesis speech-to-text stt

Last synced: 05 Apr 2025

https://github.com/am-sokolov/videodubber

A program for automatically dubbing any video file into many languages.

asr dubbing stt translation video video-processing

Last synced: 11 Apr 2025

https://github.com/pinto0309/whisper-onnx-tensorrt

ONNX and TensorRT implementation of Whisper

cupy numpy onnx stt tensorrt whisper

Last synced: 30 Apr 2025

https://github.com/alxpez/alts

100% free, local & offline voice assistant with speech recognition

assistant chatbot llm local offline ollama speech-recognition stt tts voice voice-assistant whisper

Last synced: 06 Mar 2025

https://github.com/bbc/subtitles-generator

A node module to generate subtitles by segmenting a list of time-coded text - BBC News Labs

captions digital-paper-edit itt json news-labs newslabs premiere srt stt subtitles transcript-editor ttml vtt

Last synced: 06 Apr 2025

https://github.com/naeruru/mimiuchi

A free, customizable, OSC-capable speech-to-text interface for relaying text to different types of applications

osc speech-to-text stt translations tts vrchat vue vuetify

Last synced: 03 Apr 2025

https://github.com/abus-aikorea/kara-audio

Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover and Transcription.

asr demucs faster-whisper gradio karaoke mdx-net music-source-separation openai-whisper speech-recognition speech-to-text stt subtitle uvr vocal-remover webui whisper

Last synced: 25 Apr 2025

https://github.com/fabio-garavini/ha-openai-whisper-stt-api

HACS custom integration for using Whisper speech-to-text (OpenAI or GroqCloud) API in the Assist pipeline, reducing the workload on the Home Assistant server.

groq home-assistant openai stt whisper

Last synced: 09 Apr 2025

https://github.com/remixer-dec/botality-ii

telegram bot for self-hosted local inference of stable diffusion, text-to-speech and large language models, such as llama3

ai alpaca gpt-2 gpt-j llama llama3 llamacpp lora m1-mac mps multimodal self-hosted stable-diffusion stt telegram-bot tta tts

Last synced: 10 Feb 2025

https://github.com/nixon-voxell/UnityASR

Automatic Speech Recognition in Unity.

asr speech-recognition stt unity3d unity3d-speech

Last synced: 25 Apr 2025

https://github.com/palmerabollo/bingspeech-api-client

Microsoft Bing Speech API client in node.js

bing-speech speech-to-text stt text-to-speech tts

Last synced: 13 Apr 2025

https://github.com/voxell-tech/UnityASR

Automatic Speech Recognition in Unity.

asr speech-recognition stt unity3d unity3d-speech

Last synced: 04 Mar 2025

https://github.com/linto-ai/linto-studio

Transcription and annotation interface for recorded audio or video files

asr audio-transcription caption captioning-videos stt subtitle subtitles transcription-edition video-transcription virtual-scribe

Last synced: 12 Apr 2025

https://github.com/sberdevices/smartspeech

SmartSpeech is a service for speech synthesis and recognition

api asr grpc stt tts

Last synced: 04 May 2025

https://github.com/kaloprojects/kalo-esp32-voice-assistant

Code snippets showing how to record I2S audio and store it as a .wav file on an ESP32 with an SD card, how to transcribe pre-recorded audio via the Deepgram SpeechToText (STT) API, and how to generate audio from text via the TextToSpeech (TTS) APIs from OpenAI, SpeechGen, and/or Google TTS. Triggering ESP32 actions via voice.

audio deepgram deepgram-stt esp32 google-tts i2s i2s-audio i2s-microphone inmp441 is2-audio max98357 openai-tts recording sd-card speechgen speechgen-io speechtotext stt texttospeech tts

Last synced: 14 Apr 2025

https://github.com/jfversluis/mauispeechtotextsample

Sample code to demonstrate how to implement speech-to-text with .NET MAUI

android dotnet-maui ios maui sample-code speech-to-text stt windows

Last synced: 03 Dec 2024

https://github.com/kathyreid/opensource-voice-tools

A repo listing known open source voice tools, ordered by where they sit in the voice stack

asr chatbot conversational-ui corpus speech speech-recognition stt tts voice

Last synced: 13 Mar 2025

https://github.com/leafyeexyz/digitallife

A "digital life" with long-term memory and a Live2D avatar

ai ai-agents javascript live2d live2d-web memory openai psychology react stt tauri tts typescript web-api web-speech-api

Last synced: 12 May 2025

https://github.com/coqui-ai/stt-model-manager

Coqui STT Model Manager - install, manage and try out Coqui STT models from the Model Zoo

coqui-ai flask python react speech-recognition stt websocket

Last synced: 23 Apr 2025

https://github.com/airmomo/videochat

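VideoChat is an intelligent assistant for interpreting audio and video content. It supports batch upload of audio and video files and automatically transcribes them to text. Using AI, it quickly generates content summaries, detailed interpretations, and mind maps, and it provides an intelligent chat feature to help users understand and analyze audio and video content more efficiently. Subtitle files can be exported in multiple formats.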

llm stt video

Last synced: 29 Jan 2025

https://github.com/lgrammel/whisperwriter

Local & private voice controlled notepad using whisper.cpp

nextjs stt transcription vad whisper-cpp

Last synced: 05 May 2025

https://github.com/kiritoind/personal-voice-assistant-using-llm-functioncalling

The Personal Voice Assistant is a sophisticated AI-driven tool designed to interact with users through natural language. Leveraging a state-of-the-art language model (LLM), this assistant provides a seamless and intuitive experience by understanding and executing functions.

groq-api llama3 llm python speech-recognition stt tts voice-assistant

Last synced: 10 Feb 2025

https://github.com/ovidijusparsiunas/speech-to-element

A simple way to add speech to text functionality to your website :microphone:

azure cognitive-services element input real-time realtime speech speech-recognition speech-to-text stt webspeech webspeech-api

Last synced: 09 Apr 2025

https://github.com/inevolin/discordspeechbot

A speech-to-text bot for Discord with music commands and more, using NodeJS. Ideal for controlling your Discord server using voice commands; can also be useful for hearing-impaired people.

discord discord-bot discord-js music music-player speech speech-processing speech-recognition speech-to-text stt

Last synced: 01 May 2025

https://github.com/openvoiceos/ovos-plugin-manager

Plugin manager for OpenVoiceOS, STT/TTS/Wakewords that can be used anywhere

hacktoberfest mycroft openvoiceos ovos plugin python stt tts wake-words

Last synced: 09 Apr 2025

https://github.com/analyticsinmotion/werpy

🐍📦 Rapidly calculate and analyze the Word Error Rate (WER) with this powerful yet lightweight Python package.

asr asr-evaluation automatic-speech-recognition levenshtein-distance metrics nlp python python-package speech-to-text stt stt-benchmark wer werpy word-error-rate

Last synced: 07 Apr 2025

https://github.com/bbc/stt-align-node

node version of stt-align https://github.com/bbc/stt-align by Chris Baume - R&D.

alignement labs news-labs newslabs re-alignement stt

Last synced: 06 Apr 2025

https://github.com/OpenVoiceOS/OVOS-plugin-manager

Plugin manager for OpenVoiceOS, STT/TTS/Wakewords that can be used anywhere

hacktoberfest mycroft openvoiceos ovos plugin python stt tts wake-words

Last synced: 10 May 2025

https://github.com/openvoiceos/ovos-docker-stt

Open Voice OS Speech-to-Text (STT) container images and docker-compose.yml file for x86_64 CPU architecture.

fasterwhisper openvoiceos ovos speech-to-text stt whisper

Last synced: 16 May 2025

https://github.com/hay/audio2text

Python command line utility wrappers for Whispercpp and other speech-to-text utilities

speech-recognition speech-to-text stt whisper whisper-cpp

Last synced: 14 Apr 2025

https://github.com/zevaverbach/tatt

Transcribe All The Things™ is a CLI for creating and managing speech-to-text transcripts.

amazon-transcribe-api asr automatic-speech-recognition cli speech-to-text stt

Last synced: 13 Apr 2025

https://github.com/deepgram/deepgram-js-captions

This package is the JavaScript implementation of Deepgram's WebVTT and SRT formatting. Given a transcription, this package can return a valid string to store as WebVTT or SRT caption files.

asr audio closed-captions deepgram ffmpeg javascript sdk speech speech-to-text srt stt subtitles transcription typescript webvtt youtube

Last synced: 23 Nov 2024

https://github.com/abus-aikorea/studio-free

youtube download, vocal remover, vocal extraction, karaoke video production, STT, automatic speech recognition, transcription, automatic subtitle, AI, yt-dlp, demucs, whisper, webui, gradio, windows

ai automatic-speech-recognition automatic-subtitle demucs gradio karaoke openai stt transcription video-download vocal-remover webui whisper windows yt-dlp

Last synced: 25 Apr 2025

https://github.com/coderscreative/faster-whisper-rs

A Rust crate for easily integrating faster-whisper STT into your Rust programs.

ai faster-whisper rust speech-recognition speech-to-text stt whisper

Last synced: 08 Feb 2025

https://github.com/kkdai/linebot-video-gcp

A LINE Bot with Google Cloud Storage and Speech-To-Text features written in Go. Full article https://www.evanlin.com/til-gcp-speech-video-flex/

gcp gcs golang linebot stt

Last synced: 07 May 2025

https://github.com/jarbasal/pocketsphinx-models-mirror

pocketsphinx models for languages originating from the iberian peninsula

asr automatic-speech-recognition pocketsphinx speech-recognition speech-to-text stt stt-models

Last synced: 19 Feb 2025

https://github.com/stawa/gtts

This project converts written material into speech by using Google AI (Gemini) for text creation or internet searches.

ai gemini google-gemini stt tts typescript

Last synced: 17 Mar 2025

https://github.com/mathquis/node-kaldi-online-nnet3-decoder

ASR online decoding using Kaldi NNet3 GrammarFST

asr decoder kaldi nnet3 stt

Last synced: 05 May 2025

https://github.com/harmindersinghnijjar/streamlit-punjabi-ai

Punjabi AI, ChatGPT with translation and Punjabi TTS using Narakeet's API.

appathon-streamlit gpt3 gpt4 hackathon-project punjab punjabi punjabi-chat streamlit stt tts

Last synced: 09 Apr 2025

https://github.com/msub2/sepia-speechrecognition-polyfill

A polyfill for SpeechRecognition built to function with a SEPIA STT server.

polyfill sepia speech-recognition speech-to-text stt webspeech webspeech-api

Last synced: 06 Dec 2024

https://github.com/shadowlp174/discord-stt

A Node.JS module for speech to text transcription in Discord voice channels using wit.ai.

discord-js discordjs module natural-language-processing nodejs speech-recognition speech-to-text stt wit-ai witai

Last synced: 10 Apr 2025

https://github.com/cycle-sync-ai/livekit-voice-ai-agent-setup

A guide to building your own AI-powered voice agent with LiveKit and Twilio

agent ai assistant deepgram elevenlabs livekit openai phone pstn python realtime-chat realtime-messaging sip stt tts twilio voice websocket

Last synced: 09 Apr 2025

https://github.com/assemblyai/assemblyai-java-sdk

The AssemblyAI Java SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.

ai asr assemblyai java llm speech-to-text stt transcription

Last synced: 16 Mar 2025

https://github.com/inevolin/discordearsgo

A speech-to-text framework and bot for Discord written in GoLang. Take control of your Discord server using speech and voice commands. Can also be useful for hearing impaired and deaf people.

audio-processing discord speech-recognition speech-to-text stt

Last synced: 11 Jan 2025

https://github.com/assemblyai/assemblyai-ruby-sdk

The AssemblyAI Ruby SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.

ai asr assemblyai llm ruby speech-to-text stt transcription

Last synced: 12 Apr 2025

https://github.com/netherquartz/textforspeechnormalizer

A Python library to accentuate Russian text

accentuation asr nlp python russian russian-language stt tts

Last synced: 09 Apr 2025