{"id":24110199,"url":"https://github.com/systemfsoftware/youtube-autocomplete-scraper","last_synced_at":"2025-06-25T12:32:13.039Z","repository":{"id":268507520,"uuid":"903336683","full_name":"systemfsoftware/youtube-autocomplete-scraper","owner":"systemfsoftware","description":"YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.","archived":false,"fork":false,"pushed_at":"2025-05-06T00:58:24.000Z","size":400,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-05-06T01:37:50.861Z","etag":null,"topics":["actor","apify","autocomplete","crawler","deduplication","pglite","scraper","search","similarity","suggestions","trigram","youtube","youtube-api"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/systemfsoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-14T10:46:12.000Z","updated_at":"2025-05-06T00:56:45.000Z","dependencies_parsed_at":"2025-01-14T01:34:15.340Z","dependency_job_id":"f2e0081b-0f6a-4114-87dd-09ed8c85c110","html_url":"https://github.com/systemfsoftware/youtube-autocomplete-scraper","commit_stats":null,"previous_names":["systemfsoftware/youtube-autocomplete-scraper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/systemfsoftware%2Fyoutube-autocomplete-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/systemfsoftware%2Fyoutube-autocomplete-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/systemfsoftware%2Fyoutube-autocomplete-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/systemfsoftware%2Fyoutube-autocomplete-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/systemfsoftware","download_url":"https://codeload.github.com/systemfsoftware/youtube-autocomplete-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253850531,"owners_count":21973662,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor","apify","autocomplete","crawler","deduplication","pglite","scraper","search","similarity","suggestions","trigram","youtube","youtube-api"],"created_at":"2025-01-11T01:12:56.805Z","updated_at":"2025-05-13T00:35:15.964Z","avatar_url":"https://github.com/systemfsoftware.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# YouTube AutoComplete Scraper\n\nA TypeScript library for scraping YouTube's autocomplete suggestions with intelligent deduplication.\n\n## Features\n\n- Scrapes YouTube's autocomplete API to get search suggestions\n- Uses pglite for efficient similarity filtering\n- Removes near-duplicate suggestions using trigram similarity\n- Configurable similarity threshold\n- TypeScript support\n- Ready to deploy on Apify 
platform\n\n## Installation\n\n```bash\ngit clone https://github.com/systemfsoftware/youtube-autocomplete-scraper.git\ncd youtube-autocomplete-scraper\npnpm install\n```\n\n## Usage\n\nThere are two ways to use this scraper:\n\n### 1. Local Development\n\nRun the scraper locally by setting the `INPUT` environment variable and running `pnpm start`:\n\n```bash\n# Set your input\nexport INPUT='{\"query\": \"how to make\"}'\n\n# Run the scraper\npnpm start\n```\n\nThe scraper will output results to the console and save them in the `apify_storage` directory.\n\n### 2. Deploy to Apify\n\nThis scraper is designed to run on the Apify platform. To deploy:\n\n1. Push this code to your Apify actor\n2. Set the input JSON in the Apify console:\n\n```json\n{\n  \"query\": \"how to make\",\n  \"similarityThreshold\": 0.7,\n  \"maxResults\": 100,\n  \"language\": \"en\",\n  \"region\": \"US\"\n}\n```\n\n## How it Works\n\nUnder the hood, the scraper works in three steps:\n\n1. **API Querying**: Makes requests to YouTube's internal autocomplete API endpoint to get raw suggestions\n\n2. **Deduplication**: Uses pglite (a lightweight Postgres implementation) to filter out near-duplicate results:\n\n   - Converts suggestions to trigrams (3-character sequences)\n   - Calculates similarity scores between suggestions using trigram matching\n   - Filters out suggestions whose similarity to another suggestion exceeds a configurable threshold\n   - For example, \"how to cook pasta\" and \"how to cook noodles\" might be considered unique, while \"how to make pancake\" and \"how to make pancakes\" would be filtered as duplicates\n\n3. 
**Result Processing**: Cleans and normalizes the suggestions before returning them\n\n## Input Schema\n\nThe scraper accepts the following input parameters:\n\n```typescript\ninterface Input {\n  query: string // The search query to get suggestions for\n  similarityThreshold?: number // How similar suggestions need to be to be considered duplicates (0-1)\n  maxResults?: number // Maximum number of suggestions to return\n  language?: string // Language code for suggestions\n  region?: string // Region code for suggestions\n}\n```\n\n## Output\n\nThe scraper outputs an array of unique autocomplete suggestions. Results are saved to the default dataset in Apify storage and can be accessed via the Apify API or console.\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsystemfsoftware%2Fyoutube-autocomplete-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsystemfsoftware%2Fyoutube-autocomplete-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsystemfsoftware%2Fyoutube-autocomplete-scraper/lists"}