https://github.com/localnerve/gigaohmbio-docker
Last synced: 23 days ago
- Host: GitHub
- URL: https://github.com/localnerve/gigaohmbio-docker
- Owner: localnerve
- License: mit
- Created: 2024-04-25T03:13:05.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-07-08T20:31:14.000Z (6 months ago)
- Last Synced: 2024-10-15T11:15:36.037Z (2 months ago)
- Language: Shell
- Size: 349 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# gigaohmbio-docker
> A cross-platform solution to automate downloading and transcribing videos for gigaohmbiological.
## Prerequisites
* Docker Engine or Docker Desktop
* Bourne shell

## Summary
This repository contains:
* Standalone Bourne shell scripts to build and run the download and transcribe containers
* A GitHub Action for automating video download and transcription on GitHub

This serves as a prototype and example of support automation for gigaohmbiological.
## Download
The download Docker container gets the URL of the latest video from Twitch, downloads the video, transcodes it, and writes the result to an output volume.
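On Linux, if the host side of a `docker run -v` bind mount does not exist, Docker creates it as a directory, typically owned by root with a rootful daemon; a container running as your UID may then be unable to write its output there. A minimal sketch of a preflight step, assuming you want `./data` created as your own user first (`ensure_output_dir` is a hypothetical helper, not part of this repository):

```shell
# Hypothetical helper, not part of this repository: create the host output
# directory as the current user before it is bind-mounted, so Docker does
# not create it (typically owned by root) on your behalf.
ensure_output_dir() {
  dir="${1:-./data}"
  # Create the directory (and any parents) if it is missing
  mkdir -p "$dir" || return 1
  # Fail early if the current user cannot write to it
  if [ ! -w "$dir" ]; then
    echo "error: $dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "$dir"
}
```

For example: `ensure_output_dir ./data && docker run --rm -v "$PWD/data:/home/pn/app/data" 'gigaohmbio-download'`.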
> A script that builds the image and runs the download is at `./scripts/build-run-download.sh`.

To run the commands manually (from the project directory):
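```shell
# build the image, passing your user and group IDs so files written to
# ./data are owned by you (UID is read-only in bash, so pass the values
# directly instead of exporting UID/GID)
docker build -t 'gigaohmbio-download' -f Dockerfile-download \
  --build-arg UID="$(id -u)" --build-arg GID="$(id -g)" .

# run the image in a temporary container
docker run --rm -v "$PWD/data:/home/pn/app/data" 'gigaohmbio-download'

# output is in ./data
```

## Transcribe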
The transcription Docker container reads an input audio file, runs Whisper on it, and writes the transcript to an output volume.
It uses the [go-whisper](https://github.com/appleboy/go-whisper) implementation.

> A script that builds the image and runs the transcription is at `./scripts/build-run-transcribe.sh`.

To run the commands manually (from the project directory):
```shell
# build the image
docker build -t 'gigaohmbio-transcribe' -f Dockerfile-transcribe .

# run the image in a temporary container (params as env vars)
# INPUT_MODEL must be passed as an environment variable unless you supply
# the full path to a downloaded model
# model keywords: small, medium, large, large-v1, large-v2
export INPUT_MODEL=small
# -e INPUT_MODEL forwards the exported host value into the container
docker run --rm -e INPUT_MODEL \
  -v "$PWD/data:/app/testdata" -v "$PWD/models:/app/models" \
  'gigaohmbio-transcribe' \
  --input-audio /app/testdata/my-latest-audio-file-in-data-dir.m4a \
  --output-format txt \
  --print-progress true

# output is in ./data
```

All go-whisper options are documented in the upstream [README](https://github.com/appleboy/go-whisper/blob/main/README.md).
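Two details in the transcribe command are easy to get wrong: `INPUT_MODEL` must be one of the listed model keywords, and `--input-audio` takes the path as seen inside the container (under `/app/testdata`), not the host path under `./data`. A minimal sketch of helpers that check both and print the resulting command for review, assuming the `./data` to `/app/testdata` mount shown above; `is_model_keyword`, `to_container_path`, and `transcribe_cmd` are hypothetical names, and the sketch does not handle the case where a full model path is supplied instead of a keyword:

```shell
# Hypothetical helpers, not part of this repository.

# Succeed only if $1 is one of the model keywords listed above.
is_model_keyword() {
  case "$1" in
    small|medium|large|large-v1|large-v2) return 0 ;;
    *) return 1 ;;
  esac
}

# Map a host path under ./data to the path the container sees through
# the ./data -> /app/testdata bind mount.
to_container_path() {
  case "$1" in
    ./data/*) echo "/app/testdata/${1#./data/}" ;;
    data/*) echo "/app/testdata/${1#data/}" ;;
    *) echo "error: $1 is not under ./data" >&2; return 1 ;;
  esac
}

# Validate INPUT_MODEL and the audio path, then print (not run) the
# docker run command so it can be reviewed first.
transcribe_cmd() {
  if ! is_model_keyword "${INPUT_MODEL-}"; then
    echo "error: INPUT_MODEL='${INPUT_MODEL-}' is not a known keyword" >&2
    return 1
  fi
  cpath=$(to_container_path "$1") || return 1
  echo "docker run --rm -e INPUT_MODEL -v \"\$PWD/data:/app/testdata\" -v \"\$PWD/models:/app/models\" gigaohmbio-transcribe --input-audio $cpath --output-format txt"
}
```

For example, `export INPUT_MODEL=small` followed by `transcribe_cmd ./data/latest.m4a` prints a command whose `--input-audio` is `/app/testdata/latest.m4a`. Printing rather than executing keeps the check side-effect free.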
## Further Automation Notes/Ideas
* The outputs of this project can be forwarded to other services
* The Whisper output could be sent to an AI service to generate summary articles
* The GitHub Action can run on a cron schedule, managed by GitHub Actions
* The download script:
  * Can be changed to fetch multiple videos per run, so processing runs less often (to reduce cost)
  * Can be changed to transcode to multiple formats (audio, video)
  * Can be changed to source videos from services other than Twitch
  * Includes full Chrome/Puppeteer and ffmpeg, so it can be scripted to perform the download/transcribe itself instead of relying on dependencies such as twitch-dl
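Processing several videos per run starts with knowing which files still need work. A minimal sketch of a batch helper that lists audio files without a transcript, assuming (not confirmed by this project) that each transcript lands next to its audio file with the same base name and a `.txt` extension; `list_untranscribed` is a hypothetical name:

```shell
# Hypothetical batch helper, not part of this repository.
# ASSUMPTION: a transcript is written next to its audio file with the same
# base name and a .txt extension; adjust the suffixes to match your setup.
list_untranscribed() {
  dir="${1:-./data}"
  for audio in "$dir"/*.m4a; do
    # With no matches the glob stays literal, so skip non-existent paths
    [ -e "$audio" ] || continue
    # Print audio files that have no matching transcript yet
    if [ ! -e "${audio%.m4a}.txt" ]; then
      echo "$audio"
    fi
  done
}
```

Its output can drive a loop, e.g. `list_untranscribed ./data | while IFS= read -r f; do echo "pending: $f"; done`, with the echo replaced by whatever transcription step you use.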