{"id":13456468,"url":"https://github.com/NotJoeMartinez/yt-fts","last_synced_at":"2025-03-24T10:32:38.854Z","repository":{"id":167428390,"uuid":"643050363","full_name":"NotJoeMartinez/yt-fts","owner":"NotJoeMartinez","description":"YouTube Full Text Search - Search all of a YouTube channel from the command line","archived":false,"fork":false,"pushed_at":"2024-09-13T01:58:39.000Z","size":376,"stargazers_count":1677,"open_issues_count":13,"forks_count":85,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-03-20T06:18:56.181Z","etag":null,"topics":["chromadb","cli","click","full-text-search","llm","rag","semantic-search","sqlite","youtube","yt-dlp"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NotJoeMartinez.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":"FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"notjoemartinez"}},"created_at":"2023-05-20T00:58:02.000Z","updated_at":"2025-03-19T00:07:37.000Z","dependencies_parsed_at":"2024-04-08T04:25:48.418Z","dependency_job_id":"0ab8d008-8b23-4118-98ce-d751c5c7b90a","html_url":"https://github.com/NotJoeMartinez/yt-fts","commit_stats":null,"previous_names":["notjoemartinez/yt-fts"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotJoeMartinez%2Fyt-fts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotJoeMartinez%2Fyt-fts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotJoeMartinez%2Fyt
-fts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotJoeMartinez%2Fyt-fts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NotJoeMartinez","download_url":"https://codeload.github.com/NotJoeMartinez/yt-fts/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245252491,"owners_count":20585074,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromadb","cli","click","full-text-search","llm","rag","semantic-search","sqlite","youtube","yt-dlp"],"created_at":"2024-07-31T08:01:22.578Z","updated_at":"2025-03-24T10:32:38.571Z","avatar_url":"https://github.com/NotJoeMartinez.png","language":"Python","funding_links":["https://github.com/sponsors/notjoemartinez"],"categories":["Python","Install from Source"],"sub_categories":["Media Server"],"readme":"# yt-fts - YouTube Full Text Search \n`yt-fts` is a command line program that uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to scrape all of a YouTube \nchannel's subtitles and load them into a sqlite database that is searchable from the command line. It allows you to\nquery a channel for a specific keyword or phrase and will generate time-stamped YouTube URLs to\nthe videos containing the keyword. 
\n\nIt also supports semantic search via the [OpenAI embeddings API](https://beta.openai.com/docs/api-reference/) using [chromadb](https://github.com/chroma-core/chroma).\n\n- [Blog Post](https://notjoemartinez.com/blog/youtube_full_text_search/)\n- [LLM/RAG Chat Bot](#llm-chat-bot)\n- [Video Summaries](#summarize)\n- [Semantic Search](#vsearch-semantic-search)\n- [CHANGELOG](CHANGELOG.md)\n\nhttps://github.com/NotJoeMartinez/yt-fts/assets/39905973/6ffd8962-d060-490f-9e73-9ab179402f14\n\n## Installation \n\npip \n\n```bash\npip install yt-fts\n```\n\n## `download`\nDownload subtitles for a channel. \n\nTakes a channel URL as an argument. Specify the number of jobs to parallelize the download with the `--jobs` flag. \nUse the `--cookies-from-browser` flag to send cookies from your browser with the requests, which helps if you're getting errors \nthat ask you to sign in. You can also run the `update` command several times to gradually get more videos into the database. \n\n```bash\nyt-fts download --jobs 5 \"https://www.youtube.com/@3blue1brown\"\nyt-fts download --cookies-from-browser firefox \"https://www.youtube.com/@3blue1brown\"\n```\n\n## `list`\nList saved channels.\n\nThe (ss) next to the channel name indicates that the channel has semantic search enabled. 
\n\n```bash\nyt-fts list\n```\n\n```\n┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ ID ┃ Name                  ┃ Count ┃ Channel ID               ┃\n┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ 1  │ ChessPage1 (ss)       │ 19    │ UCO2QPmnJFjdvJ6ch-pe27dQ │\n│ 2  │ 3Blue1Brown           │ 127   │ UCYO_jab_esuFRV4b17AJtAw │\n│ 3  │ george hotz archive   │ 410   │ UCwgKmJM4ZJQRJ-U5NjvR2dg │\n│ 4  │ The Tim Dillon Show   │ 288   │ UC4woSp8ITBoYDmjkukhEhxg │\n│ 5  │ Academy of Ideas (ss) │ 190   │ UCiRiQGCHGjDLT9FQXFW0I3A │\n└────┴───────────────────────┴───────┴──────────────────────────┘\n\n```\n\n## `search` (Full Text Search)\nFull text search for a string in saved channels.\n\n- The search string does not have to be a word-for-word match. \n- Search strings are limited to 40 characters. \n\n```bash\n# search in all channels\nyt-fts search \"[search query]\" \n\n# search in channel \nyt-fts search \"[search query]\" --channel \"[channel name or id]\" \n\n# search in specific video\nyt-fts search \"[search query]\" --video-id \"[video id]\"\n\n# limit results \nyt-fts search \"[search query]\" --limit \"[number of results]\" --channel \"[channel name or id]\"\n\n# export results to csv\nyt-fts search \"[search query]\" --export --channel \"[channel name or id]\" \n```\n\nAdvanced Search Syntax:\n\nThe search string supports sqlite [Enhanced Query Syntax](https://www.sqlite.org/fts3.html#full_text_index_queries),\nwhich includes things like [prefix queries](https://www.sqlite.org/fts3.html#termprefix) that you can use to match parts of a word.  
\n\n```bash\n# AND search\nyt-fts search \"knife AND Malibu\" --channel \"The Tim Dillon Show\" \n\n# OR search \nyt-fts search \"knife OR Malibu\" --channel \"The Tim Dillon Show\" \n\n# wildcards (prefix queries)\nyt-fts search \"rea* kni* Mali*\" --channel \"The Tim Dillon Show\" \n```\n\n\n# Semantic Search and RAG\nYou can enable semantic search for a channel by using the `embeddings` command.\nThis requires an OpenAI API key set in the environment variable `OPENAI_API_KEY`, or \nyou can pass the key with the `--openai-api-key` flag. \n\n\n## `embeddings`\nFetches OpenAI embeddings for the specified channel.\n```bash\n\n# make sure the OpenAI key is set\n# export OPENAI_API_KEY=\"[yourOpenAIKey]\"\n\nyt-fts embeddings --channel \"3Blue1Brown\"\n\n# specify the time interval in seconds to split text by (default is 30) \n# the larger the interval the more accurate the llm response  \n# but semantic search will have more text for you to read. \nyt-fts embeddings --interval 60 --channel \"3Blue1Brown\" \n```\nAfter the embeddings are saved you will see a `(ss)` next to the channel name when you \nlist channels, and you will be able to use the `vsearch` command for that channel. \n\n## `llm` (Chat Bot)\nStarts an interactive chat session with the `gpt-4o` OpenAI model, using \nthe semantic search results of your initial prompt as the context\nto answer questions. If it can't answer your question, it has a \nmechanism to update the context by running a targeted query based \non the conversation. The channel must have semantic search enabled.\n\n```bash\nyt-fts llm --channel \"3Blue1Brown\" \"How does back propagation work?\"\n```\n\n## `summarize`\nSummarizes a YouTube video transcript, providing time-stamped URLs. \nRequires a valid YouTube video URL or video ID as an argument. 
If the \ntranscript is not in the database it will try to scrape it.\n\n```bash\nyt-fts summarize \"https://www.youtube.com/watch?v=9-Jl0dxWQs8\"\n# or\nyt-fts summarize \"9-Jl0dxWQs8\"\n```\noutput:\n```\nIn this video, 3Blue1Brown explores how large language models (LLMs) like GPT-3 \nmight store facts within their vast...\n\n 1 Introduction to Fact Storage in LLMs:\n    • The video starts by questioning how LLMs store specific facts and\n      introduces the idea that these facts might be stored in a particular part of the\n      network known as multi-layer perceptrons (MLPs).\n    • 0:00\n 2 Overview of Transformers and MLPs:\n    • Provides a refresher on transformers and explains that the video will focus\n```\n\n## `vsearch` (Semantic Search)\n`vsearch` stands for \"vector search\". This requires that you enable semantic \nsearch for a channel with `embeddings`. It has the same options as \n`search`, but the output is sorted by similarity to the search string and \nthe default return limit is 10. 
\n\n```bash\n# search by channel name\nyt-fts vsearch \"[search query]\" --channel \"[channel name or id]\"\n\n# search in specific video\nyt-fts vsearch \"[search query]\" --video-id \"[video id]\"\n\n# limit results \nyt-fts vsearch \"[search query]\" --limit \"[number of results]\" --channel \"[channel name or id]\"\n\n# export results to csv\nyt-fts vsearch \"[search query]\" --export --channel \"[channel name or id]\" \n\n```\n\n## How To\n\n**Export search results:**\n\nFor both the `search` and `vsearch` commands you can export the results with \nthe `--export` flag, which saves them to a csv file in the current directory. \n```bash\nyt-fts search \"life in the big city\" --export\nyt-fts vsearch \"existing in large metropolitan center\" --export\n```\n\n**Delete a channel:**\nYou can delete a channel with the `delete` command. \n\n```bash\nyt-fts delete --channel \"3Blue1Brown\"\n```\n\n\n**Update a channel:**\nThe `update` command currently only works for full text search and will not update the \nsemantic search embeddings. \n\n```bash\nyt-fts update --channel \"3Blue1Brown\"\n```\n\n\n**Export all of a channel's transcripts:**\n\nThis command will create a directory in the current working directory named after the YouTube \nchannel id of the specified channel.\n```bash\n# Export to vtt or txt\nyt-fts export --channel \"[id/name]\" --format \"[vtt/txt]\"\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNotJoeMartinez%2Fyt-fts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNotJoeMartinez%2Fyt-fts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNotJoeMartinez%2Fyt-fts/lists"}