https://github.com/noman-land/transcript.fish
Unofficial No Such Thing As A Fish episode transcripts.
https://github.com/noman-land/transcript.fish
ai cloudflare facts github-pages http-range httpvfs javascript llm nosuchthingasafish npm nstaaf python react rss speech-to-text sqlite sqlite3 sqljs typescript whisper-cpp
Last synced: about 2 months ago
JSON representation
Unofficial No Such Thing As A Fish episode transcripts.
- Host: GitHub
- URL: https://github.com/noman-land/transcript.fish
- Owner: noman-land
- Created: 2023-06-24T06:43:57.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-05-06T02:41:36.000Z (about 1 year ago)
- Last Synced: 2024-05-07T02:42:54.517Z (about 1 year ago)
- Topics: ai, cloudflare, facts, github-pages, http-range, httpvfs, javascript, llm, nosuchthingasafish, npm, nstaaf, python, react, rss, speech-to-text, sqlite, sqlite3, sqljs, typescript, whisper-cpp
- Language: TypeScript
- Homepage: https://transcript.fish/
- Size: 175 MB
- Stars: 16
- Watchers: 2
- Forks: 2
- Open Issues: 30
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# transcript.fish
Unofficial No Such Thing As A Fish episode transcripts.
## Running web app locally
### Install js dependencies
Run `npm install`
### Start dev server
Run `npm start`
### Load locally running app
Open [`http://localhost:5173/?deployed=true`](http://localhost:5173/?deployed=true) to load all assets from remote hosts. (Try this first)
OR
Open [`http://localhost:5173/`](http://localhost:5173/) to use local assets.
## To download episodes from the RSS feed, transcribe them, and add them to the database
**TODO:** Add instructions for creating database with [migrations.sql](./db/migrations.sql)
### Create virtual env
Run `python -m venv venv`
### Activate virtual env
Run `source venv`
### Install python dependencies
Run `pip install -r requirements.txt`
### Download and transcribe a specific episode by number
Run `npm run convert 146`
### Download and transcribe all untranscribed episodes
Run `npm run convert`
**Warning**: This will take a long time
**NOTE** The first time this script is run, it needs to download the Whisper model, which requires [`local_files_only`](./src/python/whisper.py#L13) to be temporarily set to `False`. After this, the option can be changed back to `True`.
### Changing whisper model size for faster but less accurate performance
In `whisper.py` change [`model_size`](./src/python/whisper.py#L7) to your preferred model. See [available models.](https://huggingface.co/guillaumekln)
**NOTE**: By default this uses the `large-v2` Whisper model. On an M1 Mac with 64GB of RAM this transcribes at about `1x` speed. This means an hour long episode gets transcribed in about an hour.
So, as of 8 February 2025:
```sql
select sum(duration) from episodes
-- 1555237
``````text
1,555,237.0 seconds
÷ 60.0 seconds
÷ 60.0 minutes
÷ 24.0 hours
-----------------------
= 18.0 days
```The good news is changing to the `medium.en`, `small.en`, or `tiny.en` model increases this speed dramatically but the accuracy goes down. `small.en` transcribes at about `3x` speed, for example.
The other good news is that the convert script is idempotent in that you can kill the script (`Ctrl + C`) and restart it at any time and it will pick back up after the last fully transcribed episode. You can safely run this script over and over without creating any duplicates.
**NOTE:** This script also [downloads](/src/python/convert.py#L25) all the audio files for the episodes as well as each episode's album art. As of 8 February 2025 this amounts to 568 episodes, ~24.2GB audio, ~190MB images.
### Split database into chunks
Run `npm run split`
## Deploying
### Sync database, audio, images, and fonts to (Cloudflare) R2
Needs [`rclone`](https://rclone.org/) and [`jq`](https://jqlang.github.io/jq/) installed.
Run `npm run sync`