https://github.com/devflowinc/youtube-transcribe

Upload chunks from a YouTube channel's videos to an Arguflow instance

arguflow embeddings semantic-search vector-retrieval youtube youtube-api youtube-transcript youtube-transcript-api youtube-transcripts


Host: GitHub
URL: https://github.com/devflowinc/youtube-transcribe
Owner: devflowinc
Created: 2023-09-25T00:43:44.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-12-04T17:12:30.000Z (over 1 year ago)
Last Synced: 2025-06-22T05:17:06.935Z (4 days ago)
Topics: arguflow, embeddings, semantic-search, vector-retrieval, youtube, youtube-api, youtube-transcript, youtube-transcript-api, youtube-transcripts
Language: Python
Homepage: https://arguflow.ai
Size: 27.3 KB
Stars: 5
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Upload chunks from a YouTube channel's videos to Arguflow

## Install

Install the Python packages:

```sh
pip install -r ./requirements.txt
```

## Deploy an Arguflow instance and Redis

[Follow the self-hosting guide here](https://docs.arguflow.ai/self_hosting)

The `script-redis` service in the Arguflow docker-compose file is intended for use with this script. If you go that route, your `REDIS_URL` env value will look something like `REDIS_URL=redis://:thisredispasswordisverysecureandcomplex@:6380`. You can also use managed Redis from a provider such as [Render](https://render.com).

## Get the CHANNEL_ID for the YouTube channel you want to get transcripts from

Find the URL of the channel you want to deploy Arguflow on top of, then get its CHANNEL_ID with a tool like [Comment Picker](https://commentpicker.com/youtube-channel-id.php).

## Set your environment variables

They should look something like:

```
CHANNEL_ID=UC0vBXGSyV14uvJ4hECDOl0Q
YOUTUBE_API_KEY=***************************************
REDIS_PASSWORD=thisredispasswordisverysecureandcomplex
REDIS_URL=redis://:thisredispasswordisverysecureandcomplex@localhost:6380
ARGUFLOW_API_URL=http://localhost:8090/api
ARGUFLOW_API_KEY=af-********************************
```
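
Before running the scripts, it can help to confirm that every variable is set and that `REDIS_URL` parses as expected. The following is a minimal sketch and is not part of the repository: the helper names `missing_env` and `redis_endpoint` are hypothetical, and it assumes the variables are exported in the process environment (how the scripts themselves load them is not covered here).

```python
import os
from urllib.parse import urlparse

# Variable names taken from the example above.
REQUIRED_VARS = [
    "CHANNEL_ID",
    "YOUTUBE_API_KEY",
    "REDIS_PASSWORD",
    "REDIS_URL",
    "ARGUFLOW_API_URL",
    "ARGUFLOW_API_KEY",
]


def missing_env(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def redis_endpoint(redis_url):
    """Split a redis:// URL into (host, port, password)."""
    parts = urlparse(redis_url)
    return parts.hostname, parts.port, parts.password
```

Calling `missing_env()` with no arguments checks the current environment. An empty list means every variable above has a value.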

## Add all of the video IDs to a Redis queue

`python ./upload.py`

## Second option: add only a single video

`python ./upload.py `

## Get the raw transcripts of the videos, punctuate them, then upload to your Arguflow instance

You should typically run at least 6 `main.py` processes in parallel.

`python ./main.py`
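
One way to start several processes at once is a small launcher script. This is a sketch under stated assumptions, not something shipped with the repository: the function names `launch_workers` and `wait_for_all` are hypothetical, and it assumes each `main.py` process can run independently of the others.

```python
import subprocess
import sys


def launch_workers(count=6, command=None):
    """Start `count` copies of `command` and return the Popen handles."""
    if command is None:
        # Default: run main.py with the current Python interpreter.
        command = [sys.executable, "./main.py"]
    return [subprocess.Popen(command) for _ in range(count)]


def wait_for_all(workers):
    """Block until every worker exits and return their exit codes."""
    return [worker.wait() for worker in workers]
```

For example, `wait_for_all(launch_workers(6))` starts six processes and waits until all of them have exited.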
