https://github.com/devflowinc/youtube-transcribe
Upload chunks from a Youtube Channel's videos to an Arguflow instance
- Host: GitHub
- URL: https://github.com/devflowinc/youtube-transcribe
- Owner: devflowinc
- Created: 2023-09-25T00:43:44.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-04T17:12:30.000Z (over 1 year ago)
- Last Synced: 2025-06-22T05:17:06.935Z (4 days ago)
- Topics: arguflow, embeddings, semantic-search, vector-retrieval, youtube, youtube-api, youtube-transcript, youtube-transcript-api, youtube-transcripts
- Language: Python
- Homepage: https://arguflow.ai
- Size: 27.3 KB
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Upload chunks from a YouTube channel's videos to Arguflow
## Install
Install the Python packages:
```sh
pip install -r ./requirements.txt
```
## Deploy an Arguflow instance and REDIS
[Follow the self-hosting guide here](https://docs.arguflow.ai/self_hosting)
The `script-redis` service in the Arguflow docker-compose is intended for use with these scripts. If you go that route, your `REDIS_URL` env value will be something like `REDIS_URL=redis://:thisredispasswordisverysecureandcomplex@localhost:6380`. You can also use managed REDIS from a provider such as [Render](https://render.com).
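A `REDIS_URL` with a missing host or port is easy to overlook, so it can help to check the value before running anything. The following is a minimal sketch using only the Python standard library; it is not part of this repository, and the helper name `check_redis_url` is hypothetical.

```python
from urllib.parse import urlsplit


def check_redis_url(url: str) -> dict:
    """Split a redis:// URL into its parts and fail loudly if one is missing.

    Hypothetical helper for illustration; the scripts in this repo may
    read REDIS_URL differently.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("REDIS_URL has no host")
    if parts.port is None:
        raise ValueError("REDIS_URL has no port")
    return {"host": parts.hostname, "port": parts.port, "password": parts.password}


info = check_redis_url("redis://:thisredispasswordisverysecureandcomplex@localhost:6380")
print(info["host"], info["port"])
```

The empty user name before the colon is intentional: the URL carries only a password, which `urlsplit` exposes as `password`.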
## Get the CHANNEL_ID for the YouTube channel you want to get transcripts from
Find the URL of the channel you want to deploy Arguflow on top of, then get its CHANNEL_ID with a tool like [Comment Picker](https://commentpicker.com/youtube-channel-id.php).
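A quick sanity check can catch a pasted channel URL or handle where an ID is expected. The sketch below assumes the commonly observed shape of a channel ID: `UC` followed by 22 URL-safe characters. That pattern is an assumption rather than a documented guarantee, and `looks_like_channel_id` is a hypothetical helper, not something this repository provides.

```python
import re

# Assumption: channel IDs are "UC" plus 22 characters from [0-9A-Za-z_-].
CHANNEL_ID_RE = re.compile(r"UC[0-9A-Za-z_-]{22}")


def looks_like_channel_id(value: str) -> bool:
    """Return True when the whole string matches the assumed channel ID shape."""
    return CHANNEL_ID_RE.fullmatch(value) is not None
```

A value such as a full channel URL or an `@handle` fails this check, which is usually a sign that the lookup step was skipped.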
## Set your ENVs
They should look something like:
```
CHANNEL_ID=UC0vBXGSyV14uvJ4hECDOl0Q
YOUTUBE_API_KEY=***************************************
REDIS_PASSWORD=thisredispasswordisverysecureandcomplex
REDIS_URL=redis://:thisredispasswordisverysecureandcomplex@localhost:6380
ARGUFLOW_API_URL=http://localhost:8090/api
ARGUFLOW_API_KEY=af-********************************
```
## Add all of the video IDs to a REDIS queue
`python ./upload.py`
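Conceptually, this step pushes each video ID onto a list that worker processes later pop from. The sketch below illustrates that pattern only and is not the code in `upload.py`. The key name `video_ids`, the `enqueue_video_ids` helper, and the in-memory `FakeQueue` stand-in (used so the example runs without a REDIS server) are all assumptions.

```python
from collections import deque
from typing import Iterable


class FakeQueue:
    """In-memory stand-in exposing the two list commands used below."""

    def __init__(self) -> None:
        self._lists: dict[str, deque] = {}

    def rpush(self, key: str, *values: str) -> int:
        # Append to the tail of the list and return its new length.
        items = self._lists.setdefault(key, deque())
        items.extend(values)
        return len(items)

    def lpop(self, key: str):
        # Remove and return the head of the list, or None when it is empty.
        items = self._lists.get(key)
        return items.popleft() if items else None


def enqueue_video_ids(client, video_ids: Iterable[str], key: str = "video_ids") -> int:
    """Push each distinct ID once, preserving order; return how many were queued."""
    unique = list(dict.fromkeys(video_ids))  # drop duplicates, keep first occurrence
    if unique:
        client.rpush(key, *unique)
    return len(unique)


queue = FakeQueue()
enqueue_video_ids(queue, ["vid_a", "vid_b", "vid_a"])
print(queue.lpop("video_ids"))  # the first ID that was queued
```

With the `redis` Python package, a client created by `redis.Redis.from_url(...)` offers `rpush` and `lpop` with the same call shapes, so the helper could be pointed at a real server instead of `FakeQueue`.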
## Second option: add only a single video
`python ./upload.py `
## Get the raw transcripts of the videos, punctuate them, then upload to your Arguflow instance
You should typically run at least 6 `main.py` processes in parallel.
`python ./main.py`
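One way to start several workers at once is a small launcher script. This is a sketch under stated assumptions, not something shipped with this repository: `launch_workers` and `WORKER_COMMAND` are hypothetical names, and it assumes each `main.py` process runs independently and exits on its own once its work is done.

```python
import subprocess
import sys

# Assumed command: the same interpreter running main.py from the repo root.
WORKER_COMMAND = [sys.executable, "./main.py"]


def launch_workers(command: list[str], count: int = 6) -> list[int]:
    """Start `count` copies of `command`, wait for all of them, return exit codes."""
    procs = [subprocess.Popen(command) for _ in range(count)]
    return [proc.wait() for proc in procs]
```

Calling `launch_workers(WORKER_COMMAND)` from the repository root starts six processes and blocks until every one has exited; a non-zero entry in the returned list points to a worker that failed.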