Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/aeronjl/transcribe

Python package for accurate audio transcription with speaker diarisation
https://github.com/aeronjl/transcribe

audio-transcription gpt speaker-diarization whisper

Last synced: 11 days ago
JSON representation

Python package for accurate audio transcription with speaker diarisation

Host: GitHub
URL: https://github.com/aeronjl/transcribe
Owner: aeronjl
License: other
Created: 2024-06-12T18:17:04.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-07-01T12:25:28.000Z (8 months ago)
Last Synced: 2025-01-01T10:50:17.454Z (about 2 months ago)
Topics: audio-transcription, gpt, speaker-diarization, whisper
Language: Python
Homepage:
Size: 25.4 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

Transcription with OpenAI's [Whisper](https://github.com/openai/whisper) is very accurate, but it doesn't natively support speaker labelling (diarisation).

Existing libraries for diarisation like [pyannote](https://github.com/pyannote/pyannote-audio) rely on audio features to separate and identify speakers, but are computationally expensive and often inaccurate. A common failure mode arises when the speaker changes their audio quality, such as when they move closer to or further from the microphone. This can cause the diarisation algorithm to incorrectly identify the speaker as a new person.

I had a simple hypothesis: the cues from transcribed speech are sufficient to identify speakers. I developed a pipeline which passes the transcribed text to GPT-4o with a prompt asking it to identify the speaker.

```mermaid
flowchart LR
A[Input file #40;audio, video#41;]
B[Whisper transcription]
C[Text output]
D[Label and tidy with GPT-4o]
E[Output in user-defined format]
A-- Convert to WAV -->B
B-->C
C-->D
D-->C
D-->E
```
# Features
- Transcribe long media files with Whisper faster using parallel processing.
- Automatically label speakers in the transcription using GPT-4o.
- Tidy up the transcription by removing filler words and false starts.
- Generate timestamped transcripts in plaintext and JSON format.

# Installation

```bash
pip install precisetranscribe
```

Ensure `OPENAI_API_KEY` is set in your environment variables. Then simply run:

```python
import precisetranscribe as pt

```