https://github.com/bluebirdback/deepgram_caption_generator
Use Deepgram's API to transcribe audio to text & generate captions in WebVTT or SubRip format.
- Host: GitHub
- URL: https://github.com/bluebirdback/deepgram_caption_generator
- Owner: BlueBirdBack
- License: mit
- Created: 2024-03-22T07:33:34.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-22T07:45:37.000Z (about 2 years ago)
- Last Synced: 2025-01-13T14:52:33.436Z (over 1 year ago)
- Language: Python
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Deepgram Caption Generator
This Python script uses Deepgram's speech recognition API to transcribe audio files and generate captions in WebVTT (`.vtt`) or SubRip (`.srt`) format, or as plain text. It supports multiple Deepgram transcription models, so you can pick the one that gives the best accuracy for your audio and use case. It is aimed at content creators, podcasters, and anyone who wants to add accurate captions to their audio content.
## Features
- Transcribe audio files using Deepgram's state-of-the-art speech-to-text API.
- Generate captions in either WebVTT or SubRip format.
- Choose from a variety of Deepgram transcription models.
- Output transcription in `.vtt`, `.srt`, or plain text format.
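As a rough illustration of the model choice, the sketch below builds a request against Deepgram's pre-recorded transcription endpoint with the model passed as a query parameter. It uses only the standard library; the script itself may well use the Deepgram SDK instead, and the helper name `build_transcription_request` and the default `content_type` are assumptions made for this example.

```python
import urllib.parse
import urllib.request


def build_transcription_request(
    audio_bytes: bytes,
    api_key: str,
    model: str = "nova-2",
    content_type: str = "audio/mpeg",  # assumption: MP3 input
) -> urllib.request.Request:
    """Build (but do not send) a POST request for Deepgram's /v1/listen endpoint."""
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        f"https://api.deepgram.com/v1/listen?{query}",
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the audio; the response is JSON and is not parsed here.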
## Prerequisites
- Python 3.10 or later
- A Deepgram account and API key. Sign up [here](https://console.deepgram.com/signup) to obtain your API key.
## Installation
1. Clone this repository or download the files to your local machine.
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Create a `.env` file in the project root and add your Deepgram API key like so:
```
DEEPGRAM_API_KEY='your_deepgram_api_key_here'
```
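For reference, here is a minimal sketch of how a key stored this way could be read at startup. It is a standard-library stand-in, not the script's actual loading code (which may rely on a package from `requirements.txt`); the helper names `load_env_file` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines; quotes around the value are stripped."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_path: str = ".env") -> str:
    """Prefer an already-exported variable, then fall back to the .env file."""
    key = os.environ.get("DEEPGRAM_API_KEY")
    if not key and Path(env_path).is_file():
        key = load_env_file(env_path).get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return key
```

Failing early with a clear message avoids a less obvious authentication error later, when the API call is made.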
## Usage
Navigate to the `src/` directory and run the script from the command line, specifying the audio file path and optional parameters for the transcription model and output format.
```bash
python transcribe.py path_to_your_audio_file -m model_name -f output_format
```
- `path_to_your_audio_file`: The path to the audio file you want to transcribe.
- `model_name` (optional): The Deepgram model to use for transcription. Defaults to `nova-2`.
- `output_format` (optional): The format of the transcription output. Can be `vtt`, `srt`, or `text`. Defaults to `vtt`.
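The arguments above map naturally onto `argparse`. The following is a sketch of such a parser, not the contents of `transcribe.py`: the short flags and defaults come from this README, while the long option names `--model` and `--format` and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Transcribe audio with Deepgram and generate captions."
    )
    parser.add_argument("audio_file", help="Path to the audio file to transcribe")
    parser.add_argument(
        "-m", "--model",
        default="nova-2",
        help="Deepgram model to use (default: nova-2)",
    )
    parser.add_argument(
        "-f", "--format",
        choices=["vtt", "srt", "text"],
        default="vtt",
        help="Output format (default: vtt)",
    )
    return parser


# Only the audio path is required; the other options fall back to their defaults.
args = build_parser().parse_args(["./audio/sample.mp3"])
print(args.audio_file, args.model, args.format)
```

Restricting the format with `choices` makes the parser reject unsupported values before any API call is attempted.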
Example (run from the project root rather than from `src/`):
```bash
python src/transcribe.py ./audio/sample.mp3
```
This command transcribes `sample.mp3` with the default `nova-2` model and saves the captions in the default WebVTT format.
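To show how the two caption formats differ, here is a small sketch that renders the same cues as WebVTT and as SubRip. It is illustrative only and is not taken from the script; the function names and the `(start_seconds, end_seconds, text)` cue tuples are assumptions. WebVTT starts with a `WEBVTT` header and uses a dot before the milliseconds, whereas SubRip numbers each cue and uses a comma.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<sep>mmm ('.' for WebVTT, ',' for SubRip)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_webvtt(cues: list[tuple[float, float, str]]) -> str:
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    lines = []
    for index, (start, end, text) in enumerate(cues, start=1):
        lines.append(str(index))  # SubRip cues are numbered from 1
        lines.append(f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


cues = [(0.0, 1.5, "Hello"), (1.5, 3.0, "world")]
print(to_webvtt(cues))
print(to_srt(cues))
```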
## Supported Audio Formats
Deepgram supports a wide range of audio formats. For optimal results, use audio files in MP3, WAV, or FLAC format.
## Note
Ensure your audio file is accessible and the path is correct. Network issues or incorrect API keys may lead to transcription failures.
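The local failure causes mentioned above can be checked before any request is sent. The sketch below shows one possible pre-flight check; the function name `preflight_problems` and the set of recommended extensions (taken from the formats listed in this README) are assumptions, and the script itself may handle these cases differently.

```python
from pathlib import Path
from typing import Optional

# Formats this README recommends; other formats may still work with Deepgram.
RECOMMENDED_EXTENSIONS = {".mp3", ".wav", ".flac"}


def preflight_problems(audio_path: str, api_key: Optional[str]) -> list[str]:
    """Return human-readable problems found before calling the API."""
    problems: list[str] = []
    path = Path(audio_path)
    if not path.is_file():
        problems.append(f"audio file not found: {audio_path}")
    elif path.suffix.lower() not in RECOMMENDED_EXTENSIONS:
        problems.append(
            f"{path.suffix or 'no extension'} is not one of the recommended formats"
        )
    if not api_key:
        problems.append("DEEPGRAM_API_KEY is not set")
    return problems


for problem in preflight_problems("./audio/sample.mp3", api_key=None):
    print("warning:", problem)
```

Network failures cannot be detected this way and still need to be handled around the API call itself.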