![https://raw.githubusercontent.com/RameshAditya/scoper/master/github-resources/logo.jpg?token=AECQW6LFNMRDVEGLBMMVKFK5OH7AI](https://raw.githubusercontent.com/RameshAditya/scoper/master/github-resources/logo.jpg?token=AECQW6LFNMRDVEGLBMMVKFK5OH7AI)
--------------------------------------

## Fuzzy and Semantic Caption-Based Searching for YouTube Videos

![https://github.com/RameshAditya/scoper/blob/master/github-resources/demo_fuzzy.gif](https://github.com/RameshAditya/scoper/blob/master/github-resources/demo_fuzzy.gif)

---------------------------------------------------------------------------------------------------------
## Contents
- [What Scoper is](#what-scoper-is)
- [How Scoper works](#how-scoper-works)
- [How to use Scoper](#how-to-use-scoper)
- [Future plans](#future-plans)
- [Support me](#support-me)

---------------------------------------------------------------------------------------------------------
## What Scoper is
Scoper is a Python script that takes a YouTube URL and a user query string as inputs, and returns the timestamps in the video where the caption content closely matches the query.

For example, in the video - [https://www.youtube.com/watch?v=bfHEnw6Rm-4](https://www.youtube.com/watch?v=bfHEnw6Rm-4) - which is Apple's October 2018 event, if you were to query `Photoshop for ipad`, you'd see the following output -

```
photoshop on ipad.                   1h 6m 29s
for.                                 54m 16s
ipad.                                50m 37s
photoshop.                           1h 14m 8s
this is a historic center for        3m 48s
would love to play it for you        4m 50s
pro users but designed for all       7m 52s
exactly what you're looking for,     8m 0s
go and use for everything they       8m 52s
product line for years to come,      9m 29s

```

---------------------------------------------------------------------------------------------------------
## How Scoper works
Scoper works in four steps:

- Extract captions and timestamps from the YouTube URL
- Preprocess the user query and train a Word2Vec model
- Query over the captions and find the best match, in one of two ways chosen by the user:
  - Fuzzy searching
    - Scoper lets you query over the video's captions using fuzzy matching algorithms.
    - This means it searches for the captions closest to the query in spelling and returns the nearest matches.
    - This is done using variants of the Levenshtein distance algorithm.
    - Supports multiple languages.

  - Semantic searching
    - Scoper also lets you query over the video's captions using semantic sentence-similarity algorithms.
    - The quality of semantic search depends heavily on the dataset the Word2Vec model is trained on.
    - By default, the Brown corpus is used to train the Word2Vec model, and a modified word mover's distance algorithm is used to evaluate sentence-to-sentence similarity.
    - For querying in languages other than English, the user must provide their own dataset.
- Map the chosen captions back to their original timestamps and return them

---------------------------------------------------------------------------------------------------------
## How to use Scoper

Shell usage
```python
>>> obj = Scoper()
>>> obj.main('https://www.youtube.com/watch?v=wFTmQ27S7OQ', mode = 'FUZZY', limit = 10)
Enter query string: Apple Watch

[('Apple Watch.', 1796.994), ('the iPad to the Apple watch, and', 318.617), ('Apple Watch has grown in such a', 480.379), ... ]

```

Web GUI usage
```
python app.py

```

CLI usage
```
> python -W ignore scoper.py --video https://www.youtube.com/watch?v=bfHEnw6Rm-4 --mode FUZZY --limit 10 --language en
Enter query string: prjct airo

air.                                 9m 0s
project aero, our new augmented      1h 6m 7s
well, with project aero, now you     1h 9m 54s
we also showed you project aero,     1h 11m 28s
pro.                                 49m 43s
ipad pro and it protects both        57m 15s
tap.                                 59m 52s
so now with photoshop, project       1h 10m 41s
products, every ipad pro is made     1h 15m 41s
previous air.                        18m 13s


> python -W ignore scoper.py --video https://www.youtube.com/watch?v=bfHEnw6Rm-4 --mode SEMANTIC --limit 10 --language en
Enter query string: i can't wait to introduce you

i am thrilled to be able to tell     46m 43s
you're going to be amazed by         1h 19m 18s
powered by the all-new a12x          51m 26s
but since this is an x chip, it      51m 51s
in fact, this new a12x has more      51m 55s
i can't wait for you to get your     25m 32s
just like in the x-r, we call it     47m 15s
the a12x bionic has an all-new       53m 1s
a few days ago and they're live      40m 31s
and all of the new features of       21m 47s

```

---------------------------------------------------------------------------------------------------------
## Future Plans
- Improve the sentence-similarity algorithm
- Include out-of-the-box support for pretrained word embeddings
- Include support for general audio searching, using SpeechRecognition APIs to generate a corpus from non-captioned audio

---------------------------------------------------------------------------------------------------------
## Support Me
If you liked this, leave a star! :star:

If you liked this and also liked my other work, be sure to follow me for more! :slightly_smiling_face:
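The timestamps in the sample outputs (e.g. `1h 6m 29s`) are caption start offsets in seconds rendered as hours/minutes/seconds. A minimal sketch of that conversion, with a hypothetical helper name (not taken from Scoper's source):

```python
def format_timestamp(seconds: float) -> str:
    """Render a caption start time (seconds) in the 'Hh Mm Ss' style of the CLI output."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    parts = []
    if h:
        parts.append(f"{h}h")
    if m or h:          # show minutes whenever hours are shown, even if zero
        parts.append(f"{m}m")
    parts.append(f"{s}s")
    return " ".join(parts)

print(format_timestamp(3989.0))  # 1h 6m 29s  (the 'photoshop on ipad.' hit above)
print(format_timestamp(3037.0))  # 50m 37s
```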
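The fuzzy mode described under "How Scoper works" ranks captions by spelling closeness using Levenshtein-style edit distance. The self-contained sketch below shows the idea with a plain dynamic-programming Levenshtein and a normalized similarity ratio; the helper names are illustrative, and the real project may rely on a fuzzy-matching library instead:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic two-row dynamic-programming edit distance.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def ratio(a: str, b: str) -> float:
    # Similarity in [0, 1]; 1.0 means the strings are identical.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def fuzzy_search(captions, query, limit=3):
    # captions: list of (text, start_seconds); rank by similarity to the query.
    ranked = sorted(captions, key=lambda c: ratio(query.lower(), c[0].lower()),
                    reverse=True)
    return ranked[:limit]

captions = [("photoshop on ipad.", 3989.0), ("ipad.", 3037.0), ("photoshop.", 4448.0)]
print(fuzzy_search(captions, "Photoshop for ipad", limit=1))
```

Because the score is purely character-based, misspelled queries like `prjct airo` in the CLI example still land near the intended captions.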
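The semantic mode compares sentences via word embeddings (a Word2Vec model trained on the Brown corpus plus a modified word mover's distance). As a much-simplified, self-contained illustration of the principle only, the sketch below averages hand-made toy vectors per sentence and ranks captions by cosine similarity; the tiny vocabulary and all names here are invented for the example and are not Scoper's actual model or API:

```python
import math

# Toy 3-d "embeddings" standing in for a trained Word2Vec model (illustration only).
VECS = {
    "introduce": [0.9, 0.1, 0.0], "tell": [0.8, 0.2, 0.1],
    "wait": [0.1, 0.9, 0.0], "excited": [0.2, 0.8, 0.1],
    "chip": [0.0, 0.1, 0.9], "bionic": [0.1, 0.0, 0.8],
}

def sentence_vec(sentence):
    # Average the vectors of known words; unknown words are skipped.
    words = [VECS[w] for w in sentence.lower().split() if w in VECS]
    if not words:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(words) for dim in zip(*words)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_search(captions, query, limit=3):
    # Rank (text, start_seconds) captions by embedding similarity to the query.
    q = sentence_vec(query)
    return sorted(captions, key=lambda c: cosine(sentence_vec(c[0]), q),
                  reverse=True)[:limit]

captions = [("i am thrilled to tell you", 2803.0), ("an all-new chip", 3086.0)]
print(semantic_search(captions, "i can't wait to introduce you", limit=1))
```

This is why "i can't wait to introduce you" can surface "i am thrilled to be able to tell" in the SEMANTIC CLI example despite sharing almost no spelling with it, and also why result quality hinges on the corpus the vectors were trained on.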