Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-19 00:07:20 UTC
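The systematic browsing described in the definition above is usually a breadth-first traversal: keep a frontier queue of URLs, visit each URL once, and enqueue any newly discovered links. This is a minimal sketch under stated assumptions, not the implementation of any project listed here. `FAKE_WEB` and `fetch_links` are hypothetical stand-ins for fetching a page over HTTP and extracting its links, so the example runs offline.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web": each URL maps to the hrefs found on that page.
# A real crawler would fetch each page over HTTP and parse its <a href> links.
FAKE_WEB = {
    "https://example.com/": ["/about", "/blog", "https://other.org/"],
    "https://example.com/about": ["/"],
    "https://example.com/blog": ["/blog/post-1", "/about"],
    "https://example.com/blog/post-1": ["/blog"],
}


def fetch_links(url: str) -> list[str]:
    """Hypothetical stand-in for HTTP fetch + link extraction."""
    return FAKE_WEB.get(url, [])


def crawl(seed: str, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`, staying on the seed's host."""
    host = urlparse(seed).netloc
    frontier = deque([seed])  # URLs waiting to be visited (FIFO gives BFS order)
    seen = {seed}             # URLs already queued, so none is visited twice
    order = []                # URLs in the order they were visited
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for href in fetch_links(url):
            absolute = urljoin(url, href)  # resolve relative links
            if urlparse(absolute).netloc != host:
                continue  # skip off-site links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return order


if __name__ == "__main__":
    for page in crawl("https://example.com/"):
        print(page)
```

Running it prints the four example.com pages in breadth-first order and skips the off-site https://other.org/ link. A production crawler would also honor robots.txt (Python's standard library provides `urllib.robotparser.RobotFileParser` for this), rate-limit its requests, and handle fetch errors.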
https://github.com/tijme/not-your-average-web-crawler
A web crawler (for bug hunting) that gathers more than you can imagine.
bug-bounty callbacks crawler custom get post python request scanner scraper security spider vulnerability
Last synced: 06 Apr 2025
https://github.com/abaykan/CrawlBox
An easy way to brute-force web directories.
admin-finder crawler python web-crawler wordlist
Last synced: 26 Mar 2025
https://github.com/Liu233w/acm-statistics
An online tool (crawler) to analyze users' performance in online judges (coding competition websites). Supported OJs: POJ, HDU, HYSBZ, CodeForces, UVA, ICPC Live Archive, FZU, SPOJ, Timus (URAL), LeetCode_CN, CSU, LibreOJ, 洛谷 (Luogu), 牛客OJ (Nowcoder OJ), Lutece (UESTC), AtCoder, AIZU, CodeChef, El Judge, BNUOJ, Codewars, UOJ, NBUT, 51Nod, DMOJ, VJudge
acm-icpc codechef-api codeforces-api crawler csharp docker javascript nodejs spoj-api vue
Last synced: 11 Apr 2025
https://github.com/karthikuj/sasori
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
automation crawler crawling dast dynamic endpoint-discovery infosec puppeteer scraping security
Last synced: 15 Aug 2025
https://github.com/luohaha/jlitespider
A lite distributed Java spider framework :-)
crawler distributed distributed-systems rabbitmq spider
Last synced: 21 Jul 2025
https://github.com/aliakhtari78/spotifyscraper
A Spotify scraper that extracts information from Spotify and downloads MP3s together with the song's cover art.
album-title crawler free infromation preview-mp3 python python3 scraper spotfiy spotify-crawler spotify-downloader spotify-scraper spotify-scraping spotify-songs spotify-web-player webscraper webscraping
Last synced: 09 Apr 2025
https://github.com/bartdag/pylinkvalidator
pylinkvalidator is a standalone, pure-Python link validator and crawler that traverses a website and reports any errors it encounters (e.g., 500 and 404 errors).
crawler link-checker networking python
Last synced: 07 Apr 2025
https://github.com/twiny/spidy
Domain name collector: crawls websites and collects domain names along with their availability status.
backlinks crawler domain expired-domain golang scraper seotools spider
Last synced: 17 Aug 2025
https://github.com/janreges/siteone-crawler
SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).
analyzer crawler crawling performance qa quality-assessment security seo seotools stress-testing swoole testing website
Last synced: 18 Mar 2026
https://github.com/moranzcw/Zhihu-Spider
A multithreaded Python crawler that fetches user profile information from Zhihu.
crawler jupyter-notebook matplotlib python requests zhihu-spider
Last synced: 28 Mar 2025
https://github.com/TGiles/auto-lighthouse
A utility package for automating Lighthouse reporting
audits auto-lighthouse crawler lighthouse-reports robots simplecrawler
Last synced: 06 Apr 2025
https://github.com/alex-on-ai/WebReaper
AI-native web scraper. Single binary with a bundled Claude Code skill. MIT-licensed alternative to Firecrawl.
ai-agents-automation claude-code crawler dotnet firecrawl-alternative llm markdown mcp parser parsing scraper scraping scraping-api scraping-web scraping-websites webcrawler webscraping
Last synced: 14 Jun 2026
https://github.com/teal33t/poopak
POOPAK - TOR Hidden Service Crawler
crawler dark-web darknet deepweb docker flask hidden-services mongo osint redis tor tor-network
Last synced: 08 May 2025
https://github.com/luckylittle/blinkist-m4a-downloader
Grabs all of the audio files from all of the Blinkist books
audiobooks blinkist books crawler data-archiving data-mining data-processing go golang scraper spider
Last synced: 29 Apr 2025
https://github.com/roys/cewler
CeWLeR - Custom Word List generator Redefined. CeWL alternative in Python, based on the Scrapy framework.
bugbounty crawler reconnaissance spider
Last synced: 05 Apr 2026
https://github.com/jakepartusch/lumberjack
An automated website accessibility scanner and CLI
a11y accessibility axe cli crawler lumberjack
Last synced: 10 Sep 2025
https://github.com/hominee/dyer
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
crawler rust rust-programming-language spider web-crawler web-framework web-scraping
Last synced: 11 Mar 2026
https://github.com/lincanbin/sina-weibo-album-downloader
Downloads all HD photos from a user's Sina Weibo album using multiple threads.
Last synced: 10 Sep 2025
https://github.com/alash3al/scraply
Scraply is a simple DOM scraper that fetches information from any HTML-based website.
crawler crawling dom golang scraper scrapers scraping-websites scrapy server
Last synced: 28 Apr 2025
https://github.com/duckduckgo/tracker-radar-collector
🕸 Modular, multithreaded, puppeteer-based crawler
crawler puppeteer tracker-radar
Last synced: 20 Aug 2025
https://github.com/nasa-jpl-memex/memex-explorer
Viewers for statistics and dashboarding of Domain Search Engine data
ache anaconda apache crawler dashboard domain-discovery memex-explorer miniconda nutch tika
Last synced: 10 Mar 2026
https://github.com/diffbot/diffbot-python
Python client library for Diffbot APIs
crawler knowledge-graph natural-language-processing web-data web-data-extraction
Last synced: 12 Jun 2026
https://github.com/ethereum/node-crawler
Attempts to crawl the Ethereum network for valid execution nodes and visualizes them in a web dashboard.
Last synced: 13 Apr 2025
https://github.com/wx-chevalier/sentinel-crawler
Xenomorph Crawler: a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for the Web, RDBs and operating systems that can also act as a monitor (with Prometheus) or as ETL for infrastructure. Multi-language executor, distributed crawler.
crawler etl koa2 monitor nodejs react wx-code
Last synced: 22 Aug 2025
https://github.com/wxyyxc1992/xe-crawler
Xenomorph Crawler: a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for the Web, RDBs and operating systems that can also act as a monitor (with Prometheus) or as ETL for infrastructure. Multi-language executor, distributed crawler.
crawler etl koa2 monitor nodejs react wx-code
Last synced: 23 Mar 2025
https://github.com/greengerong/prerender-java
Java framework for Prerender
angular1 crawler java prerender prerendered-page seo
Last synced: 10 Apr 2025
https://github.com/duyet/pricetrack
A price tracker that monitors products and alerts you when prices drop. Supports tiki.vn, Shopee, lotte.vn, ... Built with Firebase: https://pricetrack.web.app
api crawler cronjob-scheduler firebase firebase-auth firebase-functions firebase-hosting firestore redash shopee shopee-api tiki tracking
Last synced: 07 Jul 2025
https://github.com/pavlovtech/WebReaper
Web scraper, crawler and parser in C#. Designed as simple, declarative and scalable web scraping solution.
crawler datamining parser parsing scraper scraping scraping-api scraping-data scraping-tool scraping-web scraping-websites webcrawler webscraping
Last synced: 08 Apr 2025
https://github.com/r05323028/eyes
Public Opinion Mining System of Taiwanese Forums
crawler data-engineering data-mining data-science graphql javascript natural-language-processing opinion-mining public-opinion python react redux tailwindcss task-queue
Last synced: 21 Jan 2026
https://github.com/mazzzystar/baiducrawler
A sample of using proxies to crawl Baidu search results.
Last synced: 04 Oct 2025
https://github.com/wwwwwydev/crawlist
A universal solution for crawling lists on web pages.
crawl crawler crawler-python crawling-python crawlist python reptile
Last synced: 01 May 2025
https://github.com/maxvalue/terpene-profile-parser-for-cannabis-strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
analysis aromatherapy bioinformatics biological-data biological-data-analysis cannabis cannabis-strains crawler data-science database health plants python python-3 scrapy terpene-profile terpenes web-crawler web-crawler-python web-crawling
Last synced: 22 Apr 2025
https://github.com/archiveteam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd
Last synced: 04 Apr 2025
https://github.com/pinkpixel-dev/web-scout-mcp
A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI assistants to search the web and extract webpage content programmatically.
ai-assistant ai-tools cheerio content-extraction crawler duckduckgo duckduckgo-search google-search mcp mcp-server web-content web-crawler web-scraper web-scraping web-search web-search-agent
Last synced: 06 Mar 2026
https://github.com/iamatulsingh/pinscrape
A simple library to scrape Pinterest images.
crawler pinscrape pinterest-image-downloader pinterest-image-grabber pinterest-image-scraper pinterest-scraper python python3 scraper web-scraping
Last synced: 16 Jan 2026
https://github.com/SimFin/pdf-crawler
SimFin's open source PDF crawler
crawler crawling geckodriver pdf pdf-crawler puppeteer python selenium-webdriver
Last synced: 07 Apr 2025
https://github.com/hardikvasa/webb
Python: an all-in-one web crawler, web parser and web scraping library!
crawl-pages crawler python-library
Last synced: 07 Apr 2025
https://github.com/antoinevastel/bots-zoo
bot crawler crawling playwright puppeteer scraper scraping selenium user-agent useragent
Last synced: 16 Aug 2025
https://github.com/jackluson/convertible-bond-crawler
Convertible bond data and strategy analysis from 宁稳网 (formerly 富投网) and 集思录
Last synced: 18 Jan 2026
https://github.com/SeaQL/starfish-ql
✴️ An experimental graph database
crates-io crawler database graph hacktoberfest network rust sql visualization
Last synced: 27 Apr 2025
https://github.com/schollz/linkcrawler
Cross-platform persistent and distributed web crawler :link:
Last synced: 22 Apr 2025
https://github.com/zytedata/zyte-smartproxy-headless-proxy
A complementary proxy that helps you use SPM (Smart Proxy Manager) with headless browsers
Last synced: 28 Apr 2025
https://github.com/ducdev/aliexscrape
Get AliExpress product details in JSON
aliexpress aliexpress-api aliexpress-crawler aliexpress-scraper aliexpress-spider crawler dropship dropshipping hacktoberfest hacktoberfest19 hacktoberfest2019 json scraper spider
Last synced: 29 Jun 2025
https://github.com/kamiyomu/kamiyomu
A self-hosted, extensible manga reader and download tool with plug-in support.
crawler crawler-agents csharp dotnet downloader kamiyomu kavita konga manga manga-downloader manga-scraper
Last synced: 15 Apr 2026
https://github.com/aminehorseman/images-web-crawler
This package is a complete tool for creating large image datasets (designed especially, but not only, for machine learning enthusiasts). It can crawl the web, download images, rename / resize / convert the images and merge folders.
crawler dataset dataset-creation flickr-api google-images-crawler google-images-downloader image-classification image-dataset image-processing images machine-learning
Last synced: 07 Oct 2025
https://github.com/wuchunfu/ipproxypool
An IP proxy pool implemented in Golang. Technologies involved: go, gorm, proxy, proxypool, ip, crawler, mysql, viper, cobra
crawler go ip proxy proxy-server proxypool
Last synced: 21 Aug 2025
https://github.com/patrickschur/pappet
A command-line tool to crawl websites using puppeteer.
cli crawler pdf puppeteer screenshot
Last synced: 25 Aug 2025
https://github.com/kostas-pa/LFITester
LFITester is a Python3 program that automates the detection and exploitation of Local File Inclusion (LFI) vulnerabilities on a server.
bugbounty crawler cybersecurity enumeration exploitation fuzzing hacking lfi lfi-detection lfi-exploitation lfi-vulnerability penetration-testing penetration-testing-tools pentest-tool pentesting python web-hacking webhacking
Last synced: 12 Jul 2025
https://github.com/hueristiq/xcrawl3r
A command-line interface (CLI) based utility to recursively crawl webpages. It is designed to systematically browse webpages' URLs and follow links to discover linked webpages' URLs.
bug-bounty bug-bounty-tools contentdiscovery crawler ethical-hacking ethical-hacking-tools go golang penetration-testing penetration-testing-tools reconnaissance red-teaming red-teaming-tools web-security
Last synced: 06 Apr 2025
https://github.com/foo-git/rewe-discounts
Grabs current REWE discounts and saves them in a Markdown file
Last synced: 04 Sep 2025
https://github.com/kurogai/deepweb-scappering
Discover hidden deepweb pages
crawler deepweb hacking hacking-tool internet kali python3 scappering scapre tor tor-network
Last synced: 01 Sep 2025
https://github.com/medcl/gopa-abandoned
GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)
crawler golang lightweight spider
Last synced: 14 Jan 2026
https://github.com/creekorful/bathyscaphe
Fast, highly configurable, cloud native dark web crawler.
architecture crawler crawling elasticsearch golang hidden-services kibana tor web-crawler
Last synced: 17 Mar 2025
https://github.com/samber/the-great-gpt-firewall
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
agent anthropic blocklist censorship crawler firewall genai generative-ai gpt gpt-4 llm openai robots-txt user-agent
Last synced: 17 Aug 2025
https://github.com/jefferyhus/es6-crawler-detect
This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots/crawlers/spiders via the user agent.
bots crawler detection es6-javascript spider
Last synced: 16 May 2025
https://github.com/nietaki/crawlie
A simple Elixir library for writing decently-performing crawlers with minimum effort.
crawler elixir elixir-library genstage
Last synced: 24 Aug 2025
https://github.com/krau/manyacg
Collect, Download, Organize and Share your Favorite Anime Artworks.
anime crawler danbooru image-viewer kawaii nhentai picture pixiv telegram telegram-bot waifu
Last synced: 17 Apr 2026
https://github.com/tobecrazy/seleniumdemo
Selenium automation test framework
container crawler docker docker-compose jenkins maven pip python selenium selenium-grid selenium-webdriver snapshot
Last synced: 21 Aug 2025
https://github.com/Randark-JMT/Bilibili_manga_download
A Bilibili Manga download tool with a graphical interface
bilibili crawler downloader pyside6 python python3 qt spider
Last synced: 16 Mar 2025
https://github.com/yuanxu-li/html-table-extractor
Extract data from HTML tables
beautifulsoup crawler extract-data html html-table scraping table
Last synced: 10 Apr 2025
https://github.com/boris-code/feaplat
A crawler management system with cluster support and elastic scaling. Runs feapder, scrapy, selenium, playwright and various other frameworks and scripts.
crawler feapder feaplat spider
Last synced: 13 Apr 2025
https://github.com/ondrejsojka/instastories-backup
Backup your friends' Instagram Stories forever and get to keep them even after 24 hours.
backup crawler instagram instagram-stories python python-3-6 python3
Last synced: 14 Sep 2025
https://github.com/crawlab-team/webspot
An intelligent web service to automatically detect web content and extract information from it.
Last synced: 11 May 2025
https://github.com/fedebotu/iclr2023-openreviewdata
Crawl & Visualize ICLR 2023 Data from OpenReview
crawler dataset iclr iclr2023 openreview peer-review review scraper
Last synced: 05 Oct 2025
https://github.com/LexiestLeszek/scrapeGPT
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.
crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper
Last synced: 07 Apr 2025
https://github.com/get-set-fetch/extension
web scraping extension
browser crawler extension indexeddb javascript npm scraper
Last synced: 05 Apr 2025
https://github.com/kcubeterm/achoz
Search through all your personal data efficiently like web search.
crawler document-search filesearch search-engine websearch
Last synced: 21 Aug 2025
https://github.com/da2vin/fetchman
fetchman is a simple, easy-to-use crawler framework
Last synced: 12 Mar 2026
https://github.com/feiskyer/scrapy-examples
Some Scrapy and web.py examples
Last synced: 09 Oct 2025
https://github.com/crawlzone/crawlzone
Crawlzone is a fast asynchronous internet crawling framework for PHP.
automated-testing crawler crawling-framework middleware php web-scraping web-search
Last synced: 11 Jan 2026