https://github.com/Eyevinn/auto-subtitles
Automatically generate subtitles from an input audio or video file using OpenAI Whisper
ffmpeg openai openai-whisper subtitle-generator subtitles subtitles-generator tools transcription video video-streaming whisper
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/Eyevinn/auto-subtitles
- Owner: Eyevinn
- License: apache-2.0
- Created: 2023-03-25T17:36:51.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-26T20:25:04.000Z (6 months ago)
- Last Synced: 2024-08-02T20:45:17.471Z (5 months ago)
- Topics: ffmpeg, openai, openai-whisper, subtitle-generator, subtitles, subtitles-generator, tools, transcription, video, video-streaming, whisper
- Language: TypeScript
- Homepage:
- Size: 616 KB
- Stars: 27
- Watchers: 5
- Forks: 5
- Open Issues: 10
Metadata Files:
- Readme: readme.md
- Contributing: contributing.md
- License: LICENSE
README
# Subtitle Generator and API
Automatically generate subtitles from an input audio or video file using OpenAI Whisper.
[![Badge OSC](https://img.shields.io/badge/Evaluate-24243B?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjQiIGhlaWdodD0iMjQiIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTIiIGN5PSIxMiIgcj0iMTIiIGZpbGw9InVybCgjcGFpbnQwX2xpbmVhcl8yODIxXzMxNjcyKSIvPgo8Y2lyY2xlIGN4PSIxMiIgY3k9IjEyIiByPSI3IiBzdHJva2U9ImJsYWNrIiBzdHJva2Utd2lkdGg9IjIiLz4KPGRlZnM%2BCjxsaW5lYXJHcmFkaWVudCBpZD0icGFpbnQwX2xpbmVhcl8yODIxXzMxNjcyIiB4MT0iMTIiIHkxPSIwIiB4Mj0iMTIiIHkyPSIyNCIgZ3JhZGllbnRVbml0cz0idXNlclNwYWNlT25Vc2UiPgo8c3RvcCBzdG9wLWNvbG9yPSIjQzE4M0ZGIi8%2BCjxzdG9wIG9mZnNldD0iMSIgc3RvcC1jb2xvcj0iIzREQzlGRiIvPgo8L2xpbmVhckdyYWRpZW50Pgo8L2RlZnM%2BCjwvc3ZnPgo%3D)](https://app.osaas.io/browse/eyevinn-auto-subtitles)
## Setup
### Requirements
The following environment variables can be set:
```text
OPENAI_API_KEY=
AWS_REGION= (optional, can also be provided in payload)
AWS_ACCESS_KEY_ID= (optional, only needed when uploading to S3)
AWS_SECRET_ACCESS_KEY= (optional, only needed when uploading to S3)
```
Using an `.env` file is supported. Just rename `.env.example` to `.env` and insert your values.
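As a rough illustration of how these variables might be checked before starting a job, here is a minimal TypeScript sketch. The helper `missingVars` and the two name lists are hypothetical and not part of this project. Treating `OPENAI_API_KEY` as mandatory is an assumption based on it being the only variable not marked optional above.

```typescript
// Variable names come from the list above; which ones are mandatory is an assumption.
const REQUIRED_VARS = ["OPENAI_API_KEY"];
const S3_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"];

// Hypothetical helper: returns the names that are unset or blank in `env`.
function missingVars(
  env: Record<string, string | undefined>,
  names: string[]
): string[] {
  return names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example: warn early when a job that uploads to S3 would lack credentials.
const missing = missingVars(process.env, REQUIRED_VARS.concat(S3_VARS));
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Checking up front gives a readable warning instead of a failure in the middle of a transcription job.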
### FFmpeg
FFmpeg is required to convert the input file or URL to a format that OpenAI Whisper can process. You can download it from the [FFmpeg download page](https://www.ffmpeg.org/download.html).
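To make the conversion step concrete, here is a minimal sketch of invoking FFmpeg from TypeScript to extract an audio track. It is not the service's actual implementation: the helper names `buildFfmpegArgs` and `convertToAudio` are hypothetical, and the mono, 16 kHz, 64 kbit/s settings are assumptions chosen for illustration.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: builds an FFmpeg argument list that extracts an
// audio-only track. The audio settings are assumptions, not the values
// this service uses.
function buildFfmpegArgs(input: string, output: string): string[] {
  return [
    "-y",           // overwrite the output file if it already exists
    "-i", input,    // input file path or URL
    "-vn",          // drop any video stream
    "-ac", "1",     // downmix to mono
    "-ar", "16000", // resample to 16 kHz
    "-b:a", "64k",  // audio bitrate
    output,         // output path; the extension selects the container
  ];
}

// Runs FFmpeg and resolves once it exits with code 0.
function convertToAudio(input: string, output: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", buildFfmpegArgs(input, output), {
      stdio: "ignore",
    });
    proc.on("error", reject); // e.g. ffmpeg is not installed or not on PATH
    proc.on("close", (code) =>
      code === 0
        ? resolve()
        : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}
```

A call such as `convertToAudio("input.mp4", "output.mp3")` would then produce an audio file that can be sent for transcription.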
## Installation / Usage
Starting the service is as simple as running:
```bash
npm install
npm start
```
A Docker image and docker-compose file are also available:
```bash
docker-compose up --build -d
```
The transcribe service is now up and running and available on port `8000`.
### Endpoints
Available endpoints are:
| Endpoint | Method | Description |
| ---------------- | ------ | --------------------------------------------------- |
| `/` | `GET` | Heartbeat endpoint of service |
| `/transcribe` | `POST` | Create a new transcribe job. Provide url in body |
| `/transcribe/s3` | `POST` | Create a new transcribe job and upload result to S3 |
## Example requests
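As a rough sketch of calling the `/transcribe` endpoint from TypeScript, the client below assumes the service runs locally on port `8000` and accepts a JSON body with a `Content-Type: application/json` header. The type and function names are hypothetical. The field names follow the request and response bodies documented in this section, and treating `url` as the only required field is an assumption.

```typescript
// Request body fields as documented in this README; `url` is assumed required.
interface TranscribeRequest {
  url: string;
  language?: string; // ISO 639-1 code, e.g. "en"
  format?: "json" | "text" | "srt" | "verbose_json" | "vtt";
}

// Response shape taken from the example response in this README.
interface TranscribeResponse {
  workerId: string;
  result: string;
}

// Hypothetical helper: builds the fetch options for a POST to /transcribe.
function buildTranscribeInit(body: TranscribeRequest): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Sends the job and returns the parsed response.
// Assumes the service is reachable at `baseUrl` (default: local port 8000).
async function transcribe(
  body: TranscribeRequest,
  baseUrl = "http://localhost:8000"
): Promise<TranscribeResponse> {
  const res = await fetch(`${baseUrl}/transcribe`, buildTranscribeInit(body));
  if (!res.ok) {
    throw new Error(`Transcribe request failed with status ${res.status}`);
  }
  return (await res.json()) as TranscribeResponse;
}
```

Keeping the request construction in its own function makes it easy to check the payload without a running service.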
To start a new transcribe job, send a `POST` request to the `/transcribe` endpoint with the following JSON body:
```jsonc
{
  "url": "https://example.net/vod-audio_en=128000.aac",
  "language": "en", // ISO 639-1 language code (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (optional)
  "format": "vtt" // Supported formats: json, text, srt, verbose_json, or vtt (optional)
}
```
The response will look like this, where `result` is the `WEBVTT` file as a string:
```json
{
"workerId": "BFabbcCi3IYuWOj6LfsgK",
"result": "WEBVTT\n\n00:00:00.000 --> 00:00:04.180\nor into transcoding I mean, I could probably add just the keyframe in the start and just\n\n00:00:04.180 --> 00:00:06.920\nskip I-frames and the rest of that.\n\n"
}
```
Formatted output:
```text
WEBVTT

00:00:00.000 --> 00:00:01.940
So into transcoding, I mean, I could

00:00:01.940 --> 00:00:03.700
probably add just a keyframe in the start

00:00:03.700 --> 00:00:06.700
and then just skip iFrames in the rest of the scenes.
```
### Contributing
See [contributing](contributing.md)
## Support
Join our [community on Slack](http://slack.streamingtech.se) where you can post any questions regarding any of our open source projects. Eyevinn's consulting business can also offer you:
- Further development of this component
- Customization and integration of this component into your platform
- Support and maintenance agreement

Contact [[email protected]](mailto:[email protected]) if you are interested.
## About Eyevinn Technology
[Eyevinn Technology](https://www.eyevinntechnology.se) is an independent consulting firm specializing in video and streaming. We are independent in the sense that we are not commercially tied to any platform or technology vendor. As our way to innovate and push the industry forward, we develop proof-of-concepts and tools. We share what we learn and the code we write with the industry through [blogs](https://dev.to/video) and by open-sourcing our code.
Want to know more about Eyevinn and what it is like to work here? Contact us!