https://github.com/outerbounds/whisper-metaflow-k8s
- Host: GitHub
- URL: https://github.com/outerbounds/whisper-metaflow-k8s
- Owner: outerbounds
- Created: 2022-12-16T06:01:45.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-30T15:51:12.000Z (about 1 year ago)
- Last Synced: 2023-10-30T16:45:00.835Z (about 1 year ago)
- Language: Python
- Size: 247 KB
- Stars: 2
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# OpenAI Whisper on Metaflow 👋
This repository will help you optimize OpenAI's [Whisper](https://github.com/openai/whisper) in workflows run on the [Outerbounds Platform](https://outerbounds.com/blog/announcing-outerbounds-platform/). It builds on our earlier repository, [Get started with Whisper and Metaflow](https://github.com/outerbounds/whisper-metaflow), and focuses on using Kubernetes resources to scale up and increase processing throughput.
## Repository Overview
| File | Description |
|------------------|-------------|
| [Dockerfile](./Dockerfile) | Dockerfile to create a docker image for running OpenAI Whisper |
| [Makefile](./Makefile) | Makefile for building the docker image |
| [youtube\_video\_transcriber.py](./youtube_video_transcriber.py) | CLI tool for creating a transcript of a given YouTube URL and given model |
| [whisper_flow.py](./whisper_flow.py) | Metaflow flow for creating transcripts using Whisper's tiny and large models |

# Run the flow ▶️
## Running with Kubernetes resources
To unleash the power of the cloud with [Metaflow's Kubernetes decorator](https://docs.metaflow.org/scaling/remote-tasks/kubernetes), run this command from your terminal. It uses a prebuilt Docker image that is ready to run this flow.
```
$ python3 whisper_flow.py run --with kubernetes:image=public.ecr.aws/outerbounds/whisper-metaflow:latest
```

# Customizing flow dependencies ⚙️
This section assumes you have Docker set up and running locally. If you don't have Docker installed, please follow the instructions [here](https://docs.docker.com/get-docker/). To install additional packages or change existing ones, update the Dockerfile.
## Create the docker image
With Docker running, build the image specified in the `./Dockerfile`.
```
$ make build
...
=> => writing image sha256:23be1b523a3404d8bee8e4c8ac29f7160ac7ad7090d48c567010a34cb9f2666e 0.0s
=> => naming to docker.io/library/whisper-metaflow 0.0s
```

## Tag and push the docker image to a repository
Then tag the resultant image and push it to an image registry. In this example, we are using GitLab's container registry.
```
$ docker tag sha256:23be1b523a3404d8bee8e4c8ac29f7160ac7ad7090d48c567010a34cb9f2666e whisper-metaflow:latest
...
$ docker push whisper-metaflow
```

## Run the flow with customized image
```
$ python3 whisper_flow.py run --with kubernetes:image=whisper-metaflow:latest
```

## Run the flow with customized image and changed CPU/Memory resources
```
$ python3 whisper_flow.py run --with kubernetes:image=whisper-metaflow:latest,cpu=4,memory=8192
```

## Alternate approach
Instead of running the flow with the CLI options above, you could also edit `whisper_flow.py`, add the `@kubernetes` decorator to the appropriate steps, and then simply run the flow as:
```
$ python3 whisper_flow.py run
```
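
If you would rather keep the CLI options shown earlier but launch runs from a script, the `--with kubernetes:...` value can be assembled programmatically. The following is a minimal sketch using only the Python standard library; `kubernetes_with_spec` and `build_run_command` are hypothetical helper names for illustration and are not part of Metaflow or this repository.

```python
import shlex


def kubernetes_with_spec(image, **resources):
    """Build the value passed to --with, e.g. 'kubernetes:image=...,cpu=4'."""
    # Attributes follow the decorator name as comma-separated key=value pairs,
    # mirroring the commands shown in this README.
    pairs = [f"image={image}"] + [f"{key}={value}" for key, value in resources.items()]
    return "kubernetes:" + ",".join(pairs)


def build_run_command(flow_file, image, **resources):
    """Return the argument list for running a flow with Kubernetes options."""
    # "python3" matches the interpreter name used in the commands above.
    return ["python3", flow_file, "run", "--with", kubernetes_with_spec(image, **resources)]


if __name__ == "__main__":
    cmd = build_run_command(
        "whisper_flow.py", "whisper-metaflow:latest", cpu=4, memory=8192
    )
    # Print the command for inspection; pass cmd to subprocess.run(cmd, check=True)
    # to actually launch the run.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string means the command can be passed to `subprocess.run` directly without shell quoting concerns.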
# Get Help 🤗
Please join us on [Slack](http://slack.outerbounds.co/) if you have questions about getting set up. The Metaflow community is responsive and happy to help!