https://github.com/jayuan101/transcript-agent
AI-powered transcription & interview analysis: 9 STT engines, 8 AI providers, always-on interview scoring
ai anthropic audio gradio interview nlp openai python speech-to-text stt transcription whisper
- Host: GitHub
- URL: https://github.com/jayuan101/transcript-agent
- Owner: jayuan101
- Created: 2026-05-18T14:36:22.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-02T00:31:31.000Z (16 days ago)
- Last Synced: 2026-06-02T02:14:40.722Z (16 days ago)
- Topics: ai, anthropic, audio, gradio, interview, nlp, openai, python, speech-to-text, stt, transcription, whisper
- Language: Python
- Size: 646 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
---
title: Transcript Agent
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
# Transcript Agent
AI-powered transcription, interview scoring, and report generation: 9 STT engines × 8 AI providers.
**Bring your own API key.** Usage is billed to your own provider account, and nothing is stored on the server.
---
## Features
| Feature | Details |
|---|---|
| **9 STT engines** | Whisper (local/offline), OpenAI, Groq, Deepgram, AssemblyAI, Google, Azure, ElevenLabs, Rev.ai |
| **8 AI providers** | Claude (Anthropic), OpenAI, Gemini, Groq, Mistral, Together AI, Perplexity, Ollama |
| **37+ languages** | Auto-detect or select manually, with regional dialect variants and support for Indian languages |
| **Interview Mode** | Always on: rates every question Great / Good / Needs Improvement / Missed and gives an overall score out of 10 |
| **Deep Analysis** | Deflection rate, advancement likelihood (%), coaching guide, prep tips |
| **Smart reports** | Summary, key points, action items, speaker profiles, speech analytics |
| **History tab** | Every session is saved locally with tokens used, cost, score, and a full Q&A replay |
| **Exports** | .txt, .docx, .pdf, .srt and .vtt subtitles, .json |
| **Network monitor** | Live download/upload speed with animated bars and session totals |
| **ETA at every step** | Step tracker with time remaining for Loading, Extracting, Transcribing, and AI Analysis |
| **Stop & resume** | Cancel mid-job; re-submit the same file to resume from the saved transcript checkpoint |
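The `.srt` and `.vtt` exports differ mainly in the file header and the millisecond separator in timestamps. A minimal sketch, assuming transcript segments are available as hypothetical `(start_seconds, end_seconds, text)` tuples; the app's actual segment structure and export code may differ:

```python
def format_timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """SRT: numbered cues, each followed by a blank line."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n{text}\n"
        )
    return "\n".join(blocks)


def to_vtt(segments: list[tuple[float, float, str]]) -> str:
    """WebVTT: a 'WEBVTT' header line, then cues using '.' before milliseconds."""
    cues = [
        f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n{text}\n"
        for start, end, text in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)
```

For example, `to_srt([(0.0, 2.5, "Tell me about yourself.")])` yields a single cue numbered `1` spanning `00:00:00,000 --> 00:00:02,500`.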
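One way to make "re-submit the same file to resume" work is to key each checkpoint on a hash of the file's bytes, so a re-upload under a different name still finds its saved transcript. The sketch below assumes a hypothetical checkpoint directory of JSON files and hypothetical helper names; it is not the app's actual storage layout:

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def save_checkpoint(media: Path, transcript: str, ckpt_dir: Path) -> Path:
    """Store the transcript produced so far under the media file's digest."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    out = ckpt_dir / f"{file_digest(media)}.json"
    out.write_text(json.dumps({"transcript": transcript}), encoding="utf-8")
    return out


def load_checkpoint(media: Path, ckpt_dir: Path) -> Optional[str]:
    """Return the saved transcript for these exact bytes, or None to start fresh."""
    ckpt = ckpt_dir / f"{file_digest(media)}.json"
    if not ckpt.exists():
        return None
    return json.loads(ckpt.read_text(encoding="utf-8"))["transcript"]
```

Hashing content rather than the filename means a renamed copy resumes, while an edited file (different bytes) is transcribed from scratch.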
---
## Supported formats
| Type | Formats |
|---|---|
| Audio | mp3, wav, m4a, flac, ogg, aac, wma |
| Video | mp4, mov, avi, mkv, webm |
| Docs | pdf, docx, txt, md, srt, vtt |
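The table above also works as a routing rule. One plausible reading is that audio goes straight to STT, video has its audio track extracted first, and documents skip STT entirely. A minimal sketch with a hypothetical `classify_input` helper, using the extensions exactly as listed:

```python
from pathlib import Path

# Extension sets copied from the "Supported formats" table.
AUDIO = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "wma"}
VIDEO = {"mp4", "mov", "avi", "mkv", "webm"}
DOCS = {"pdf", "docx", "txt", "md", "srt", "vtt"}


def classify_input(path: str) -> str:
    """Return 'audio', 'video', or 'doc' from the file extension (case-insensitive)."""
    ext = Path(path).suffix.lower().lstrip(".")
    for kind, exts in (("audio", AUDIO), ("video", VIDEO), ("doc", DOCS)):
        if ext in exts:
            return kind
    raise ValueError(f"Unsupported format: .{ext or '?'}")
```

For example, `classify_input("Interview.MP4")` returns `"video"`, and an unlisted extension such as `.exe` raises `ValueError`.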
---
## Quick start
### Run locally (Python)
```bash
pip install gradio anthropic openai groq pdfplumber fpdf2 python-docx \
deepgram-sdk assemblyai elevenlabs rev_ai \
fastapi uvicorn python-multipart httpx requests
python app.py
# Opens http://localhost:7860
```
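Several of the packages above import under a different name than they install under (`fpdf2` as `fpdf`, `python-docx` as `docx`). A small pre-flight check, assuming the import names mapped below (they are not taken from the repo), can list anything missing before `python app.py` fails partway through:

```python
import importlib.util

# pip distribution name -> accepted import names (assumed mapping).
REQUIRED = {
    "gradio": ("gradio",),
    "anthropic": ("anthropic",),
    "openai": ("openai",),
    "groq": ("groq",),
    "pdfplumber": ("pdfplumber",),
    "fpdf2": ("fpdf",),
    "python-docx": ("docx",),
    "deepgram-sdk": ("deepgram",),
    "assemblyai": ("assemblyai",),
    "elevenlabs": ("elevenlabs",),
    "rev_ai": ("rev_ai",),
    "fastapi": ("fastapi",),
    "uvicorn": ("uvicorn",),
    # Newer releases import as python_multipart; older ones as multipart.
    "python-multipart": ("python_multipart", "multipart"),
    "httpx": ("httpx",),
    "requests": ("requests",),
}


def missing_packages(required: dict[str, tuple[str, ...]] = REQUIRED) -> list[str]:
    """Return pip names for which none of the accepted modules can be found."""
    return [
        pip_name
        for pip_name, modules in required.items()
        if not any(importlib.util.find_spec(m) is not None for m in modules)
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("pip install " + " ".join(missing) if missing else "All dependencies found.")
```

`importlib.util.find_spec` only locates modules without importing them, so the check stays fast even for heavy packages like `gradio`.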
### Run with Docker
```bash
docker compose up
# or
docker run -p 7860:7860 ghcr.io/jayuan101/transcript-agent
```
### Windows desktop app
1. Download `TranscriptAgent-win64.zip` and `Install-TranscriptAgent.bat` from [Releases](https://github.com/jayuan101/transcript-agent/releases/latest)
2. Put both files in the same folder, then double-click the `.bat`
3. The script extracts the app, creates a Desktop shortcut, and launches it automatically
### Mac desktop app
1. Download `TranscriptAgent.dmg` from [Releases](https://github.com/jayuan101/transcript-agent/releases/latest)
2. Open the `.dmg`, drag the app to Applications, then double-click it to launch
---
## How to use
1. Enter your API key (Claude, OpenAI, Groq, etc.) in the sidebar
2. Choose your STT engine and AI provider
3. Upload a file or paste a URL / local path
4. Click **▶ Analyze**
Interview Mode is always active: every question in the audio is scored automatically and a coaching guide is generated.
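The README does not say how the per-question ratings roll up into the overall score out of 10. As a purely hypothetical illustration, each rating could map to a point value on a 0 to 10 scale and the overall score could be their mean; the point values below are invented for the example, not the app's:

```python
# Hypothetical points per rating on a 0-10 scale; not the app's documented weights.
RATING_POINTS = {"Great": 10, "Good": 7, "Needs Improvement": 4, "Missed": 0}


def overall_score(ratings: list[str]) -> float:
    """Mean of per-question points, rounded to one decimal (0-10 scale)."""
    if not ratings:
        raise ValueError("no scored questions")
    unknown = set(ratings) - set(RATING_POINTS)
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    return round(sum(RATING_POINTS[r] for r in ratings) / len(ratings), 1)
```

With these made-up weights, `overall_score(["Great", "Good"])` gives `8.5`; a real implementation might instead weight questions by importance or let the LLM assign the score directly.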
---
## Architecture
```
app.py               - Gradio UI, processing loop, all frontend logic
transcript_agent.py  - STT dispatch, LLM analysis, report generation
launcher.py          - PyInstaller entry point (opens browser on start)
```
---
## Releases
See [CHANGELOG.md](CHANGELOG.md) for full version history. Latest: **v1.1.10**