Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
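The definition above describes a bot that "systematically browses" the Web. The loop behind that can be sketched as a breadth-first traversal: download a page, extract its links, and queue any same-host URL not seen before. This is a minimal sketch that uses only the Python standard library. The names `crawl`, `fetch`, and `LinkParser` are hypothetical, and it deliberately omits robots.txt handling, rate limiting, and retries, all of which a real crawler needs.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url: str) -> str:
    """Download a page as text. Sketch only: no robots.txt check, no retries."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start: str, max_pages: int = 50,
          fetch_page: Callable[[str], str] = fetch) -> list[str]:
    """Breadth-first crawl restricted to the start URL's host.

    Returns the URLs that were fetched successfully, in visit order.
    """
    host = urlparse(start).netloc
    queue = deque([start])   # frontier of URLs waiting to be fetched
    seen = {start}           # every URL ever queued, for duplicate detection
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:      # URLError, HTTPError and timeouts are OSErrors
            continue         # skip unreachable pages
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments so /b and /b#top match.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

For example, `crawl("https://example.com/", max_pages=10)` would return up to ten same-host URLs in breadth-first order. The injectable `fetch_page` parameter is a design choice that keeps the traversal testable without network access. A production crawler would also consult robots.txt (the standard library's `urllib.robotparser.RobotFileParser` can parse it) and throttle requests per host.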
https://github.com/hasdata/find-urls-from-any-domain
This repository provides practical examples of website link scraping using Python and Node.js.
ai-extraction crawler hasdata-api nodejs python sitemap-parser url-extraction web-crawling web-scraping
Last synced: 06 May 2026
https://github.com/pourmand1376/crawler
Simple Crawler, Indexer and Search Engine Web Application
crawler csharp csharp-code dotnet mvc
Last synced: 07 May 2026
https://github.com/theshefer/web-crawler-http
Basic web crawler that maps the link structure of a website
Last synced: 24 Apr 2026
https://github.com/thc1006/nycu_timtable_crawler
🎓 NYCU Course Data Crawler & Timetable System | National Yang Ming Chiao Tung University Course Crawler and Course Selection System - Python web scraper for course schedules, syllabi & educational data analysis. Crawls 18K+ courses with a 98% success rate. Features: interactive timetable, JSON API, Google Colab support, batch processing, resume capability.
academic course course-selection crawler data-analysis education educational-data google-colab json-api nycu open-data python schedule student-tools syllabus taiwan timetable university web-automation web-scraping
Last synced: 24 Apr 2026
https://github.com/tylpk1216/new-taipei-parkinfo
Find the available parking in New Taipei, Taiwan.
Last synced: 07 May 2026
https://github.com/theognis1002/nimbus-crawler
Highly concurrent web crawler written in Go
crawler docker golang message-queue postgresql redis
Last synced: 23 Jun 2026
https://github.com/zhqiang1989/youtube-graph-collector
A Python demo showing how to collect YouTube video engagement graph data
Last synced: 07 May 2026
https://github.com/ireddragonicy/booruprompt
A simple web application built with NextJS to extract tags from booru websites. Just paste the URL of a booru post, and this tool will fetch and display the associated tags, ready for you to copy.
booru cleaning-data crawler nextjs noobai tags typescript web
Last synced: 07 May 2026
https://github.com/v-bible/crawler
A collection of web crawlers for Vietnamese-language Catholic resources
catholic corpus-linguistics crawler nlp playwright
Last synced: 22 Apr 2026
https://github.com/rodrigorvsn/ace
🔥 Receive an email with the hottest promotions every day
crawler cronjob nextjs prisma puppeteer react-email resend
Last synced: 17 Apr 2026
https://github.com/illm4tic/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 21 Apr 2026
https://github.com/wcygan/crawler
web crawler
crawler crawling tokio tokio-rs web-crawler
Last synced: 08 May 2026
https://github.com/tsaohucn/crawler_fb_page
A crawler that uses Selenium to scrape Facebook pages
crawler facebook-page rails ruby selenium
Last synced: 09 May 2026
https://github.com/allotmentandy/socialmedialinkextractor
PHP Laravel package that extracts social media links from an array of links; used as part of a spider that checks links on the londinium.com website
crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube
Last synced: 09 May 2026
https://github.com/basemax/okala-product-ids
A PHP script to fetch and save product IDs from Okala's online store API across multiple categories and store branches.
crawler crawler-okala crawler-php crawlers data database ids ir iran json okala okala-crawler php php-crawler product
Last synced: 09 May 2026
https://github.com/catbraaain/search-crawl
Search the web and crawl content stealthily, with optional extraction using LLMs.
crawl crawler fastapi playwright scrape scraping searxng
Last synced: 09 May 2026
https://github.com/a-b-z-b/web-spider
A Humble Web Crawler
crawler docker-compose go mongodb web-crawler
Last synced: 09 May 2026
https://github.com/victorbaumgartner/electron-crawler-ui
Desktop app built with Electron and axios to crawl websites across multiple servers
app axios crawler desktop electronjs macos multiple-servers multithreading
Last synced: 09 May 2026
https://github.com/brianbruggeman/vax
A vaccination signup tool
covid-19 crawler signup vaccination
Last synced: 21 Apr 2026
https://github.com/ravenastar-js/ravpagelinks
🚀 RavPageLinks 🕷️ Basic tool for enumerating URLs on web pages
axios chalk crawler links playwright ravenastar scraping url-enumeration
Last synced: 20 Apr 2026
https://github.com/machinecyc/lotteryinsight
Uses a crawler to collect Taiwan Lotto data and saves it to a local MySQL server.
crawler data docker lottery mysql-database python3 taiwan
Last synced: 09 May 2026
https://github.com/kernelerr/pixivurls
An awesome tool to get Pixiv image URLs.
Last synced: 20 Apr 2026
https://github.com/nsalvacao/cli-plugins
OpenAPI for CLIs — Crawl any CLI's --help output and generate structured Claude Code plugins with expert command knowledge
ai-agent claude-code cli cli-reference crawler developer-tools help-parser llm plugin python
Last synced: 04 Mar 2026
https://github.com/gesiscss/github_traffic_crawler
Retrieves data from repositories (insights, usage, commits)
Last synced: 20 Apr 2026
https://github.com/marshallvoid/affiliate-chrome-extension
chrome-extension crawler tiktok
Last synced: 29 Apr 2026
https://github.com/igorbrizack/crawler-web
Web data collection application with ReactJS and Python (REST API)
beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper
Last synced: 16 Apr 2026
https://github.com/olostep-api/olostep-cli
CLI for the Olostep API — scrape, map, crawl, answer, batch the web from your terminal. Pure JS rewrite of olostep-cli.
ai-agents cli crawler mcp nodejs npm olostep scraping typescript web-scraping
Last synced: 03 Jun 2026
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 23 Jun 2026
https://github.com/khanof89/twitter_scraper
Scrapes tweet details from a user profile using Selenium
crawler scraper selenium twitter twitter-bot
Last synced: 11 May 2026
https://github.com/woshiluo/bilibilicomic-download
bilibili crawler downloader manga
Last synced: 11 May 2026
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 12 May 2026
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 12 May 2026
https://github.com/fredcodee/pexel.com-image-scrapper
download images from pexel.com
Last synced: 13 May 2026
https://github.com/manchittlab/TheCrawler
Open-source web scraper + LLM-powered structured extraction. PDF/DOCX, markdown, JSON-LD, microdata, commerce data, forms, 16 analytics-tracker detection. Structured errors with retryable flags. Adaptive Cheerio->Playwright. CLI, npm, REST API, and MCP server. AGPL-3.0.
agpl apify cheerio crawler llm markdown mcp mcp-server model-context-protocol nodejs playwright rag scraper typescript web-scraping
Last synced: 20 Jun 2026
https://github.com/thamindur/ir-project
Search Engine for Sri Lankan MPs
crawler elasticsearch python scraping search-engine
Last synced: 19 Apr 2026
https://github.com/capturr/json-deep-equal
Checks whether JSON objects contain the same values (ignoring array order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 19 Apr 2026
https://github.com/nextlevelshit/node-crawl
Web crawler for Node.js
crawl crawler javascript nodejs
Last synced: 14 May 2026
https://github.com/theabbie/shopcrawler
Crawler for Discovering Product URLs on E-commerce Websites (assignment)
Last synced: 18 Apr 2026
https://github.com/triekai/review-radar
An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.
crawler firebase gemini google-maps nextjs openai pwa react
Last synced: 04 Apr 2026
https://github.com/lig8t555/ecommerce
MERN Stack Ecommerce Store | Running In Production | MVP
baidu-tieba baotu bootstrap crawler douban-music ecommerce-platform fofa mongoose quanjing redux shopping-cart shopping-cart-solution stripe taobao-spider
Last synced: 04 Apr 2026
https://github.com/scrape-do/dotnet-example
Best Rotating Proxy & Scraping API Alternative. C# Example.
captcha captcha-solver crawler crawlers crawling data-mining data-science data-scraping free free-proxy free-proxy-list proxy proxy-list proxylist rotating-proxy scraper scraping scraping-api scraping-tool
Last synced: 12 Jun 2026
https://github.com/mirusu400/berryz-dl
Batch download berryz webshare files recursively!
berryz berryz-webshare crawler downloader scraper
Last synced: 22 Jun 2026
https://github.com/ryu1kn/procedural-page-crawler
Page Crawler. Tell it where to go and what to look for.
Last synced: 30 Apr 2026
https://github.com/bandie91/extip
Fetches the external IP address from known external IP providers
address cli crawler external ip ipv4-address parallel
Last synced: 08 Jun 2026
https://github.com/chunkingz/youtubelinks-scraper
A Python script that scrapes YouTube links from a predefined website.
crawler python scraper spider websitescraper youtube
Last synced: 29 Apr 2026
https://github.com/nabi-allenby/web-crawler
BFS web crawler
crawler docker k8s kubernetes reconnaissance rust rust-lang webcrawler
Last synced: 02 Mar 2026
https://github.com/metehan777/http-header-link-graph
Publish a site's link graph & heading map in HTTP response headers. Crawl 65k pages in 99 seconds without parsing one byte of HTML. Companion code for the SEO Week 2026 NYC experiment.
aeo answer-engine-optimization cloudflare-workers crawler generative-engine-optimization geo http-headers link-graph python rust seo site-architecture technical-seo
Last synced: 03 Jun 2026
https://github.com/josepedrodias/naivebot
An attempt to mimic Googlebot behaviour in Node.js with Nightmare.js
crawler googlebot nightmarejs nodejs robots
Last synced: 29 Apr 2026
https://github.com/antoniowd/crawly
A web crawler that explores the web looking for specific information (email addresses, phone numbers, etc.)
crawler got jsdom nodejs webcrawler webscraping
Last synced: 01 May 2026
https://github.com/zawlinnnaing/my-wiki-crawler
A simple program for crawling Burmese Wikipedia using the MediaWiki API.
crawler myanmar-tools python wikipedia-api
Last synced: 01 May 2026
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 13 Jun 2026
https://github.com/kkuvam/web-scrape
Web Scraping Technology Evaluation - Evaluation of different web scraping technologies in Python, with a focus on Requests, BeautifulSoup, and Scrapy. Benchmarked each technology for ease of use, performance, scalability, and maintainability
beautifulsoup crawler requests scraping scrapy
Last synced: 28 Apr 2026
https://github.com/justserpapi/web-html
JustSerpAPI Crawl Webpage HTML API Python SDK examples, with related Google Search API, Google Lens API, Google Maps API, Google News API, Google Shopping API, Google Scholar API, Google Finance API, Google Trends API, Google Jobs API, Google Patents API, Google Hotels API, and Web APIs.
crawler google-finance-api google-hotels-api google-jobs-api google-lens-api google-maps-api google-news-api google-patents-api google-scholar-api google-search-api google-shopping-api google-trends-api html-api justserpapi python serp-api web-crawling web-html-api web-scraping
Last synced: 08 Jun 2026
https://github.com/soenneker/soenneker.playwrights.crawler
A configurable Playwright crawler with rich stealth and control options.
browser chrome chromium crawl crawler csharp dotnet playwright playwrightcrawler playwrights scrape scraper stealth util
Last synced: 14 Jun 2026
https://github.com/qqxs/usda_pomological_watercolors
Crawls data from the USDA Pomological Watercolor collection
crawler koa2 nodejs watercolors
Last synced: 01 May 2026
https://github.com/vhdm/twitter-hashtag-crawler
Twitter hashtag crawler by selenium, without using the Twitter API ;)
Last synced: 14 Jun 2026
https://github.com/luciopaiva/dicio-crawler
Node.js crawler for dicio.com.br.
Last synced: 02 May 2026
https://github.com/dearvn/crawl-mortgage-broker
A script to crawl data from website https://findamortgagebroker.com/
crawler findamortgagebroker mortgage-lenders mortgage-loans nmls php7 python3 seleniumbase
Last synced: 28 Apr 2026
https://github.com/cold-bin/jwzx-mail
A cqupt-jwzx crawler application written in Go
Last synced: 09 Jun 2026
https://github.com/tri613/nespresso
A mobile version for nespresso coffee website :coffee:
Last synced: 15 Jun 2026
https://github.com/moonyfringers/ladon
crawler data-pipeline ladon ladon-framework llm python training-data web-crawler web-scraping
Last synced: 17 Apr 2026
https://github.com/abdymm/abtelegrambot-sample
Sample project using a Telegram bot
crawler football php scheduler telegram-bot webhook
Last synced: 15 Jun 2026
https://github.com/twknab/django_ajax_web_crawler
Web crawler which retrieves all links on any page. Python & Django-powered.
beautifulsoup4 crawler django-application
Last synced: 27 Apr 2026
https://github.com/alexnthnz/web-crawler
Scalable web crawler built with Python, Redis, and Cassandra, inspired by Alex Xu's design. Crawls, indexes, and stores web content with robots.txt compliance and duplicate detection.
Last synced: 03 May 2026
https://github.com/soffits/oogc-resource-index
Spreadsheet-ready OOGC resource indexing with incremental crawl, authenticated download URLs, and Seafile export.
agpl-3 automation cli crawler python uv
Last synced: 03 May 2026
https://github.com/mg98/ipfs-replicate
Replicate IPFS' distributed data structure locally, based on network traces.
crawler dag ipfs redisgraph scraper
Last synced: 02 May 2026
https://github.com/rebrowser/iaai-dataset
IAAI salvage auction data: vehicle listings with loss types, damage codes, title brands, mileage, drivetrain, condition grades, and branch locations. Updated daily.
automotive-data crawler data-collection data-science dataset iaai insurance-auto-auction open-data parquet salvage-auction salvage-vehicles scraper total-loss vehicle-auction web-scraping
Last synced: 03 May 2026