https://github.com/deepgram/deepgram-eos-heuristics
Reference implementations for robust end-of-speech detection using Deepgram's real-time API and custom local heuristics
- Host: GitHub
- URL: https://github.com/deepgram/deepgram-eos-heuristics
- Owner: deepgram
- Created: 2024-10-04T17:56:34.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-30T20:32:01.000Z (about 1 year ago)
- Last Synced: 2026-01-18T21:30:06.095Z (5 months ago)
- Size: 19.5 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.MD
README
# Deepgram Speech Segmentation Heuristics
This repository contains reference implementations for robust end-of-speech detection using a combination of Deepgram's real-time transcription API features and custom heuristics. The goal is to demonstrate how low-latency, real-time solutions can be created using the Deepgram API, open-source Voice Activity Detection (VAD), and tailored heuristics.
## Overview
The project showcases how to combine various Deepgram API features with local processing to achieve accurate and low-latency utterance segmentation. It utilizes:
- Deepgram API features:
  - Endpointing
  - Utterance End
  - Word-level timestamps
- Local Voice Activity Detection (VAD) using [`silero-vad`](https://github.com/snakers4/silero-vad)
- Custom heuristics for speech detection and endpointing
The system is designed to be modular, allowing for easy addition and modification of different event handlers and heuristics.
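As a rough illustration of how the API features listed above are switched on, the sketch below builds a streaming URL with the `endpointing`, `utterance_end_ms`, and `interim_results` query parameters. The helper name `build_listen_url` and the millisecond values are assumptions chosen for the example, not settings taken from this repository. Word-level timestamps should arrive in each result's `words` array without an extra parameter, though that is worth confirming against the Deepgram documentation.

```python
from urllib.parse import urlencode

# Deepgram's live-streaming endpoint.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(endpointing_ms: int = 300, utterance_end_ms: int = 1000) -> str:
    """Hypothetical helper: return a streaming URL with end-of-speech features enabled."""
    params = {
        # Utterance End relies on interim results being enabled.
        "interim_results": "true",
        # Silence (in ms) after which a result is marked speech_final.
        "endpointing": endpointing_ms,
        # Gap (in ms) after the last word before an UtteranceEnd message is sent.
        "utterance_end_ms": utterance_end_ms,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_listen_url()
print(url)
```

The same options can be passed through the Deepgram Python SDK instead of a hand-built URL; the raw query string is used here only to keep the sketch free of third-party dependencies.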
## Project Structure
The repository is organized as follows:
```
project_root/
│
├── base_heuristic.py
├── vad.py
├── examples
│   └── vad_implementation
│       ├── heuristic.py
│       ├── terminal_renderer.py
│       ├── main.py
│       └── README.md
├── requirements.txt
└── README.md
```
- `base_heuristic.py`: Contains the base `Heuristic` class for implementing custom logic.
- `vad.py`: Implements Voice Activity Detection using silero-vad.
- `examples/`: Contains different implementation examples.
  - `vad_implementation/`: An example implementation using VAD and custom heuristics.
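To give a feel for what a modular, event-driven base class could look like, here is a minimal sketch. It is not the actual contents of `base_heuristic.py`: the `on` and `emit` methods, the event names, and the `LoggingHeuristic` subclass are all hypothetical and exist only to show how handlers might be added or swapped.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Heuristic:
    """Hypothetical base class that routes named events to registered handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler for an event name."""
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, payload: dict) -> None:
        """Call every handler registered for this event, in registration order."""
        for handler in self._handlers.get(event, []):
            handler(payload)


class LoggingHeuristic(Heuristic):
    """Example subclass (hypothetical) that records the events it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.log: List[str] = []
        self.on("vad_speech_start", lambda p: self.log.append("speech started"))
        self.on("utterance_end", lambda p: self.log.append(f"ended at {p['last_word_end']}s"))


heuristic = LoggingHeuristic()
heuristic.emit("vad_speech_start", {})
heuristic.emit("utterance_end", {"last_word_end": 2.4})
heuristic.emit("unknown_event", {})  # no handler registered, silently ignored
print(heuristic.log)
```

Keeping the dispatch logic in the base class means a new heuristic only has to register handlers for the events it cares about.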
## Getting Started
To use any of the reference implementations:
1. Navigate to the specific example folder (e.g., `examples/vad_implementation/`).
2. Follow the README instructions in that folder for setup and execution.
Each example folder contains its own `requirements.txt` file and specific instructions for running the implementation.
## Examples
### VAD Implementation
The VAD implementation demonstrates how to combine Deepgram's real-time transcription with local Voice Activity Detection for advanced end-of-speech detection. It showcases the use of `silero-vad` alongside Deepgram's API features.
For more details, see the README in the `examples/vad_implementation/` folder.
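The general idea of pairing a local VAD with transcript results can be sketched as a small state machine. This is not the logic in `examples/vad_implementation/heuristic.py`: the class name, thresholds, and frame duration below are assumptions, and the VAD probabilities are passed in as plain floats so the sketch runs without `silero-vad` or a Deepgram connection.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EndOfSpeechDetector:
    """Hypothetical detector: ends an utterance after enough trailing VAD silence."""

    frame_ms: int = 32             # duration of one VAD frame (assumed)
    speech_threshold: float = 0.5  # probability at or above which a frame is speech
    silence_ms: int = 500          # trailing silence required to end the utterance
    _silence: int = 0
    _speaking: bool = False
    _transcript: List[str] = field(default_factory=list)

    def on_transcript(self, text: str) -> None:
        """Store a finalized transcript fragment received from the API."""
        if text:
            self._transcript.append(text)

    def on_vad_frame(self, speech_prob: float) -> Optional[str]:
        """Return the utterance text when end of speech is detected, else None."""
        if speech_prob >= self.speech_threshold:
            self._speaking = True
            self._silence = 0
            return None
        if not self._speaking:
            return None  # silence before any speech: nothing to end
        self._silence += self.frame_ms
        if self._silence < self.silence_ms or not self._transcript:
            return None  # keep waiting for more silence or for a transcript
        utterance = " ".join(self._transcript)
        self._speaking, self._silence, self._transcript = False, 0, []
        return utterance


detector = EndOfSpeechDetector(frame_ms=100, silence_ms=300)
results = [detector.on_vad_frame(p) for p in (0.9, 0.8)]
detector.on_transcript("hello world")
results += [detector.on_vad_frame(p) for p in (0.1, 0.1, 0.1, 0.1)]
print(results)
```

In this sketch the local VAD decides when the speaker has stopped, while the transcript only gates whether there is anything to emit, which is one way to keep the decision latency independent of network round trips.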
## Future Examples
While the current VAD implementation represents our recommended approach for robust, low-latency speech detection, we plan to add the following examples:
- A web app implementation demonstrating the VAD approach with a simple web frontend
- Examples showing heuristic approaches to end-of-speech detection without using a local VAD, relying solely on analysis of transcript results
For the most reliable, lowest-latency performance, we recommend running a local VAD (such as `silero-vad`) as close as possible to where audio enters the application. This forms the cornerstone of a robust heuristic approach. The other examples demonstrate alternative methods and use cases, but may not match the performance of the local VAD-based approach.
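For contrast, a transcript-only heuristic of the kind mentioned in the list above might compare the end time of the last recognized word with how much audio has been sent so far. The function name, the `max_gap_s` threshold, and the idea of tracking an audio cursor on the client are assumptions for illustration. The word dictionaries mirror the `start` and `end` fields of word-level timestamps.

```python
from typing import Dict, List


def utterance_ended(words: List[Dict], audio_cursor_s: float, max_gap_s: float = 0.8) -> bool:
    """Hypothetical check: True once the audio cursor is max_gap_s past the last word."""
    if not words:
        return False  # nothing has been recognized yet
    last_word_end = words[-1]["end"]
    return audio_cursor_s - last_word_end >= max_gap_s


words = [
    {"word": "hello", "start": 0.2, "end": 0.6},
    {"word": "world", "start": 0.7, "end": 1.1},
]
print(utterance_ended(words, audio_cursor_s=1.5))  # gap of about 0.4 s
print(utterance_ended(words, audio_cursor_s=2.0))  # gap of about 0.9 s
```

Because this check can only fire after a transcript result has made the round trip from the API, it is likely to react later than a VAD running next to the audio source.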
## Dependencies
The main dependencies for this project are:
- [Deepgram Python SDK](https://github.com/deepgram/deepgram-python): For interfacing with the Deepgram API
- [silero-vad](https://github.com/snakers4/silero-vad): For local Voice Activity Detection
Specific dependencies for each implementation are listed in the respective `requirements.txt` files.
## Note
This is a reference implementation intended for demonstration purposes. It showcases how to leverage Deepgram's API features alongside custom processing for advanced speech endpointing scenarios.