https://github.com/shivxmr/speech-diarization
Speech Diarization
- Host: GitHub
- URL: https://github.com/shivxmr/speech-diarization
- Owner: shivxmr
- Created: 2025-07-24T18:21:54.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-07-25T03:31:04.000Z (10 months ago)
- Last Synced: 2025-08-08T08:02:16.837Z (10 months ago)
- Topics: diarization, python, speech-recognition, whisper
- Language: Jupyter Notebook
- Homepage:
- Size: 2.06 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Agentic Pipeline for Speaker Diarization and Quality Check
This notebook implements a Python pipeline that generates subtitles from audio files, with speaker diarization and a quality-check agent. It is aimed at short clips with multiple speakers and produces SRT files with speaker-labeled dialogue.
## Features
- Speaker diarization with consistent labeling.
- Dialogue-level transcription.
- Confidence-based quality evaluation per segment.
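As a rough illustration of per-segment confidence scoring, the sketch below derives a score from the `avg_logprob` and `no_speech_prob` fields that Whisper attaches to each transcribed segment. The function name `score_segment`, the thresholds, and the feedback strings are assumptions made for this example; the notebook's actual rules may differ.

```python
import math

# Hypothetical thresholds chosen for illustration; not taken from the notebook.
LOW_CONFIDENCE = 0.5
HIGH_NO_SPEECH = 0.6


def score_segment(segment):
    """Return (confidence, feedback) for one Whisper-style segment dict."""
    # avg_logprob is a mean token log-probability, so exp() maps it into (0, 1].
    confidence = math.exp(segment["avg_logprob"])
    feedback = []
    if confidence < LOW_CONFIDENCE:
        feedback.append("low transcription confidence")
    if segment.get("no_speech_prob", 0.0) > HIGH_NO_SPEECH:
        feedback.append("possibly not speech")
    if not segment["text"].strip():
        feedback.append("empty text")
    return round(confidence, 3), feedback or ["ok"]
```

A segment that passes every rule gets the single feedback entry `"ok"`, so a report can list only the segments that need review.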
## Approach
1. **Audio Preparation**: Standardize audio to mono PCM WAV.
2. **Transcription**: Use Whisper to transcribe the audio into timestamped segments.
3. **Diarization**: Compute pyannote speaker embeddings and cluster them to assign a speaker to each segment, which is more reliable than basic methods.
4. **Merging**: Combine adjacent segments so the dialogue reads naturally.
5. **Output**: Write an SRT file with speaker labels.
6. **Quality Agent**: Rule-based confidence scoring and feedback.
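The merging and output steps can be sketched in plain Python. This is a minimal example under stated assumptions, not the notebook's code: segments are assumed to be dicts with `start`, `end`, `speaker`, and `text` keys, and the merge rule (same speaker, gap of at most `max_gap` seconds) is a hypothetical choice.

```python
def merge_segments(segments, max_gap=1.0):
    """Merge consecutive same-speaker segments separated by <= max_gap seconds."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if prev and prev["speaker"] == seg["speaker"] and seg["start"] - prev["end"] <= max_gap:
            # Extend the previous dialogue block instead of starting a new one.
            prev["end"] = seg["end"]
            prev["text"] = prev["text"] + " " + seg["text"].strip()
        else:
            # Copy the segment so the caller's input is not mutated.
            merged.append({**seg, "text": seg["text"].strip()})
    return merged


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as SRT text with a speaker label on each cue."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(blocks)
```

Calling `to_srt(merge_segments(segments))` yields one numbered cue per merged dialogue block, with cues separated by a blank line.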
## Pipeline Flow
Input audio → Convert → Transcribe → Embed & Cluster → Merge → SRT & Quality Report.
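To make the "Embed & Cluster" stage concrete, here is a small dependency-free sketch of clustering segment embeddings into a fixed number of speakers. It uses average-linkage agglomerative clustering on cosine distance, which is an assumption: the README does not say which clustering algorithm the notebook uses. The short vectors in the usage note stand in for real pyannote embeddings, and `cluster_speakers` is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_speakers(embeddings, n_speakers):
    """Average-linkage agglomerative clustering down to n_speakers clusters."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        dists = [cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_speakers:
        # Find the closest pair of clusters and merge the second into the first.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters.pop(b))

    # Number speakers by first appearance so labels stay consistent across runs.
    clusters.sort(key=min)
    labels = [None] * len(embeddings)
    for number, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = f"Speaker {number}"
    return labels
```

For example, `cluster_speakers([[1, 0], [0.9, 0.1], [0, 1]], n_speakers=2)` groups the first two vectors together. The search over cluster pairs repeats at every merge, which is acceptable for the short clips this pipeline targets but would be slow on long recordings.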
## Limitations and Improvements
- Limitations: The speaker count is fixed rather than detected, and results are best on clean audio.
- Improvements: Add overlapping-speech detection, LLM-generated feedback, and automatic speaker-count estimation.