https://github.com/shivxmr/speech-diarization

Speech Diarization

Host: GitHub
URL: https://github.com/shivxmr/speech-diarization
Owner: shivxmr
Created: 2025-07-24T18:21:54.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-07-25T03:31:04.000Z (10 months ago)
Last Synced: 2025-08-08T08:02:16.837Z (10 months ago)
Topics: diarization, python, speech-recognition, whisper
Language: Jupyter Notebook
Homepage:
Size: 2.06 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Agentic Pipeline for Speaker Diarization and Quality Check

This notebook implements a Python pipeline that generates subtitles from audio files, with speaker diarization and a quality-check agent. It is intended for short clips with multiple speakers and produces SRT files with speaker-labeled dialogue.

## Features
- Speaker diarization with consistent labeling.
- Dialogue-level transcription.
- Confidence-based quality evaluation per segment.
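
As an illustration of confidence-based evaluation per segment, here is a minimal sketch, not the notebook's actual agent. It assumes each segment is a dict shaped like the segments returned by Whisper's `transcribe` (with `text`, `avg_logprob`, and `no_speech_prob` keys). The function name `evaluate_segment` and both thresholds are hypothetical.

```python
import math

# Hypothetical thresholds; tune them on your own recordings.
MIN_CONFIDENCE = 0.6
MAX_NO_SPEECH_PROB = 0.5


def evaluate_segment(segment):
    """Return (confidence, issues) for one Whisper-style segment dict."""
    # exp(avg_logprob) maps the mean token log-probability into (0, 1].
    confidence = math.exp(segment["avg_logprob"])
    issues = []
    if confidence < MIN_CONFIDENCE:
        issues.append("low transcription confidence")
    if segment.get("no_speech_prob", 0.0) > MAX_NO_SPEECH_PROB:
        issues.append("segment may contain no speech")
    if not segment["text"].strip():
        issues.append("empty text")
    return confidence, issues


# Made-up example segments, for illustration only.
segments = [
    {"text": " Hello there.", "avg_logprob": -0.1, "no_speech_prob": 0.02},
    {"text": " uh", "avg_logprob": -1.2, "no_speech_prob": 0.7},
]

for seg in segments:
    confidence, issues = evaluate_segment(seg)
    status = "OK" if not issues else "REVIEW: " + "; ".join(issues)
    print(f"{confidence:.2f} {status}")
```

Each rule is independent, so a report can list every reason a segment was flagged rather than only the first.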

## Approach
1. **Audio Preparation**: Convert the input audio to mono PCM WAV.
2. **Transcription**: Use Whisper to transcribe the audio into segments.
3. **Diarization**: Use pyannote embeddings and clustering to assign a speaker to each segment, which is more reliable than basic methods.
4. **Merging**: Combine segments so the dialogue flows naturally.
5. **Output**: Write an SRT file with speaker labels.
6. **Quality Agent**: Apply rule-based confidence scoring and give feedback.
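
The merging and output steps above could look roughly like the following sketch. It assumes segments are dicts with `start`, `end`, `speaker`, and `text` keys, and that merging means joining consecutive segments from the same speaker when the pause between them is short. The `max_gap` value, the `SPEAKER_n` label format, and all function names are assumptions, not taken from the notebook.

```python
def merge_segments(segments, max_gap=0.5):
    """Join consecutive same-speaker segments separated by at most max_gap seconds."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["speaker"] == seg["speaker"]
            and seg["start"] - prev["end"] <= max_gap
        ):
            # Extend the previous entry instead of starting a new one.
            prev["end"] = seg["end"]
            prev["text"] = prev["text"] + " " + seg["text"].strip()
        else:
            entry = dict(seg)  # copy so the input list is not mutated
            entry["text"] = entry["text"].strip()
            merged.append(entry)
    return merged


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as SRT text with a speaker label on each cue."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(blocks)


# Made-up example segments, for illustration only.
example = [
    {"start": 0.0, "end": 1.2, "speaker": "SPEAKER_1", "text": "Hi."},
    {"start": 1.3, "end": 2.0, "speaker": "SPEAKER_1", "text": "How are you?"},
    {"start": 2.4, "end": 3.1, "speaker": "SPEAKER_2", "text": "Fine."},
]
print(to_srt(merge_segments(example)))
```

Merging before writing keeps one speaker's continuous speech in a single cue, which is one plausible reading of "natural flow".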

## Pipeline Flow
Input audio → Convert → Transcribe → Embed & Cluster → Merge → SRT & Quality Report.
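
The Convert stage at the start of this flow might be done with the `ffmpeg` command-line tool, as in the sketch below. The README does not say which tool the notebook uses, so `ffmpeg`, the 16 kHz sample rate, the 16-bit sample format, and the function names are all assumptions.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input file, any format ffmpeg can read
        "-ac", "1",              # one audio channel (mono)
        "-ar", str(sample_rate), # output sample rate in Hz (assumed value)
        "-c:a", "pcm_s16le",     # signed 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)


# Print the command that would be run for a hypothetical input file.
print(" ".join(build_ffmpeg_command("clip.mp3", "clip.wav")))
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log the exact command before running it.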

## Limitations and Improvements
- Limitations: The speaker count is fixed; results are best on clean audio.
- Improvements: Add overlap detection, LLM-based feedback, and automatic speaker-count estimation.
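
One possible direction for automatic speaker estimation is sketched below. It is not the notebook's clustering method. It swaps in a simple greedy approach: each embedding joins the most similar existing speaker centroid if the cosine similarity reaches a threshold, and otherwise starts a new speaker. The threshold, the function names, and the toy 2-D vectors are hypothetical; real embeddings would come from a speaker-embedding model such as pyannote's.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_speakers(embeddings, threshold=0.7):
    """Greedily label embeddings without knowing the speaker count in advance."""
    centroids = []  # running mean embedding per discovered speaker
    counts = []     # number of embeddings averaged into each centroid
    labels = []
    for emb in embeddings:
        sims = [cosine_similarity(emb, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else None
        if best is not None and sims[best] >= threshold:
            # Update the running mean of the matched speaker.
            n = counts[best]
            centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best], emb)
            ]
            counts[best] = n + 1
        else:
            # No centroid is similar enough, so start a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(f"SPEAKER_{best + 1}")
    return labels


# Toy 2-D "embeddings" for illustration only.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = assign_speakers(toy)
print(labels, "estimated speakers:", len(set(labels)))
```

A greedy pass depends on input order and on the threshold, so it is a starting point for experimentation rather than a replacement for a proper clustering step.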
