Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-15 00:06:49 UTC
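The definition above says a crawler systematically browses the Web. As an illustration only, here is the core loop of a breadth-first crawler using just the Python standard library. It shows a FIFO frontier, a seen-set for deduplication, and link extraction from each fetched page.

It is not taken from any project in this list. `crawl`, `LinkExtractor`, and the injected `fetch` callable are hypothetical names. A production crawler would also need robots.txt handling, rate limiting, and timeouts.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], str],
          max_pages: int = 50) -> Dict[str, List[str]]:
    """Breadth-first crawl from `seed`, following links on the seed's host only.

    `fetch` (hypothetical) maps a URL to its HTML; injecting it keeps the
    traversal testable offline. Returns {visited URL: absolute links found}.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])  # FIFO queue of URLs to visit -> breadth-first order
    seen = {seed}             # every URL ever queued, so nothing is visited twice
    graph: Dict[str, List[str]] = {}

    while frontier and len(graph) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # sketch-level error handling: skip pages that fail to load
        parser = LinkExtractor()
        parser.feed(html)
        links: List[str] = []
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute = urldefrag(urljoin(url, href))[0]
            if urlparse(absolute).scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            links.append(absolute)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
        graph[url] = links
    return graph


if __name__ == "__main__":
    # Tiny in-memory "site" standing in for real HTTP fetches.
    pages = {
        "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">x</a>',
        "https://example.com/a": '<a href="/">home</a> <a href="b#top">B</a>',
        "https://example.com/b": "<p>no links</p>",
    }
    print(list(crawl("https://example.com/", lambda u: pages[u])))
    # ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Links to other hosts are recorded but not followed, which keeps the sketch from wandering off the seed site. Swapping the deque's `popleft` for `pop` would turn it into a depth-first crawl.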
https://github.com/rebrowser/rebrowser-patches
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
automation bot bot-detection chrome chromedriver cloudflare crawler crawling datadome headless headless-chrome playwright puppeteer puppeteer-extra rebrowser scraping selenium stealth web-scraping webdriver
Last synced: 14 May 2025
https://github.com/fffonion/xehentai
Doujinshi downloader (adult manga downloader)
crawler json-rpc python xehentai
Last synced: 16 May 2025
https://github.com/xuxueli/xxl-crawler
A lightweight web crawler framework for Java.
crawler distributed flexible java object-oriented spider web xxl-crawler
Last synced: 15 May 2025
https://github.com/polyrabbit/hacker-news-digest
📰 Let ChatGPT summarize Hacker News for you
chatgpt chatgpt-api crawler data-extraction extract-summaries hacker-news hacker-news-digest hacker-news-reader machine-learning news-aggregator openai openai-api python rss spider
Last synced: 15 May 2025
https://github.com/lixi5338619/lxbook
Code repository for the book 《爬虫逆向进阶实战》 ("Advanced Crawler Reverse Engineering in Practice").
android-resever crawler frida java javascript python smali spiders unidbg xposed
Last synced: 13 Apr 2025
https://github.com/jsrei/js-cookie-monitor-debugger-hook
A JS cookie reverse-engineering tool: a visual monitor for JS cookie changes, plus a JS cookie hook for setting conditional breakpoints.
crawler js-reverse red-team reverse-engineering userscript web-security-research
Last synced: 15 May 2025
https://github.com/python3webspider/douyin
A human-friendly DouYin API for crawling popular videos and music.
Last synced: 04 Apr 2025
https://github.com/StanGirard/seo-audits-toolkit
SEO and security audits for websites: Lighthouse and security-headers crawler, sitemap/keyword/image extractor, summarizer, and more.
analysis audits crawler dashboard extractor headers internal-links lighthouse link-extractor python securityheader seo seo-tools serp summarizer
Last synced: 26 Mar 2025
https://github.com/Kharacternyk/dotcommon
What do people have in their dotfiles?
Last synced: 29 Mar 2025
https://github.com/fengzhizi715/NetDiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.
coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3
Last synced: 03 May 2025
https://github.com/3nock/spidersuite
Advanced web security spider/crawler
bugbounty cplusplus crawler gui information-gathering osint-tool pentest qt5 recon security-tools spider web-spider webcrawler
Last synced: 29 Oct 2025
https://github.com/rndinfosecguy/Scavenger
Crawler (Bot) searching for credential leaks on paste sites.
bot crawler credentials leaks osint paste pastebin python
Last synced: 20 Mar 2025
https://github.com/josephlimtech/linkedin-profile-scraper-api
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.
crawler crawling expressjs json linkedin linkedin-bot linkedin-crawler linkedin-profile linkedin-profile-scraper linkedin-scraper linkedin-scraping nodejs profile-data puppeteer scraper scrapers scraping scraping-websites spider website-scraper
Last synced: 04 Apr 2025
https://github.com/linkedtales/scrapedin
LinkedIn scraper (working as of 2020)
crawler linkedin linkedin-scraper scraper
Last synced: 14 May 2025
https://github.com/speed/newcrawler
Free Web Scraping Tool with Java
crawler docker scraping spider
Last synced: 02 Apr 2025
https://github.com/yhy0/Jie
Jie is a comprehensive security assessment and exploitation tool for web applications, aimed at security professionals and penetration testers. It assists with bug hunting through vulnerability scanning, information gathering, and exploitation.
apollo-exp bugcrowd crawler hackerone jie scan scanner security-copilot shiro-exp src vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners
Last synced: 07 Sep 2025
https://github.com/ChenZixinn/spider_reverse
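Crawler reverse-engineering case studies. Completed: TLS fingerprinting | Ruishu (瑞数) | ZKH (震坤行) | NetEase Yidun (网易易盾) | WeChat mini-program decompilation and reverse engineering (百达星系) | Tonghuashun (同花顺) | RPC decryption | Jiasule (加速乐) | Geetest slider CAPTCHA (极验滑块验证码) | Juliang Suanshu (巨量算数) | Boss Zhipin (Boss直聘) | Qichacha (企查查) | China Minmetals (中国五矿) | QQ Music | Industrial Policy Big Data Platform (产业政策大数据平台) | Qizhidao (企知道) | Xueqiu (雪球网, acw_sc__v2) | 1688 | Qimai Data (七麦数据) | whggzy | Qiming Technology (企名科技) | mohurd | EntGroup Data (艺恩数据) | OKLink (欧科云链)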
crawler python requests spider
Last synced: 28 Mar 2025
https://github.com/setvisible/ArrowDL
ArrowDL (Arrow Downloader) is a download manager for Windows, macOS, and Linux.
batch-download crawler download download-manager libtorrent magnet-link mass-downloader mozilla-firefox nativeclient picture-download qt stream-downloader streaming torrent-client torrent-downloader video-downloader web-engine webextensions youtube-dl youtube-downloader
Last synced: 14 Mar 2025
https://github.com/TumblThreeApp/TumblThree
A Tumblr and Twitter Blog Backup Application
backup blog-backup c-sharp crawler csharp dotnet downloader mvvm tumblr tumblr-backup tumblr-backup-application tumblr-blog tumblr-like tumblr-search twitter twitter-backup twitter-backup-application twitter-blog windows wpf
Last synced: 22 Mar 2025
https://github.com/rajatomar788/pywebcopy
Saves webpages locally to your hard disk, keeping images, CSS, JS, and links as is.
archive-tool crawler html html-parser mirror python web webpage
Last synced: 08 Jul 2025
https://github.com/avidlearnerinprogress/python-automation-scripts
Simple yet powerful automation scripts.
beautifulsoup codetopdf comic-downloader crawler cricinfo cricket-api crime-data-scraper images imdb-webscrapping instagram instagram-scraper medium-downloader news-scraper pdf pdf-converter quora quora-crawler scraping-websites selenium-webdriver word-of-the-day
Last synced: 05 Apr 2025
https://github.com/iudicium/pryingdeep
Prying Deep - An OSINT tool to collect intelligence on the dark web.
crawler darkweb go gocolly golang-osint onion osint osint-tools pryingdeep security-tools
Last synced: 14 Jan 2026
https://github.com/c0d3d3v/moodle-dl
Moodle-DL quickly downloads course content from Moodle (e.g., lecture PDFs).
crawler downloader hacktoberfest moode-crawler moodle moodle-dl moodle-downloader scraper sync
Last synced: 21 Oct 2025
https://github.com/zhuyingda/webster
A reliable, high-level web crawling and scraping framework for Node.js.
automation-test automation-ui chromium crawler crawling headless-chrome javascript javascript-framework nodejs nodejs-framework puppeteer scraping-framework spider
Last synced: 15 May 2025
https://github.com/crawljax/crawljax
Crawljax
crawler crawling dom dynamic event-driven-crawling javascript test-generation web-analysis web-testing
Last synced: 16 May 2025
https://github.com/abhisharma404/vault
Swiss Army knife for hackers
crawler fuzzing hacking hacking-tool information-gathering lfi networking offensive-security osint pentesting port-scanner python rfi scanner scrapy security sqlite ssl-inspection vault xss-vulnerability
Last synced: 02 Apr 2025
https://github.com/nanshihui/Scan-T
A new Python-based crawler with additional features, including network fingerprint search.
crawler netfingerprint python sybersecurity
Last synced: 04 May 2025
https://github.com/jaeksoft/opensearchserver
Open-source Enterprise Grade Search Engine Software
crawler custom-search enterprise indexing java lucene ocr opensearchserver search search-engine synonyms webcrawler webcrawling
Last synced: 04 Apr 2025
https://github.com/chushuai/wscan
Wscan is a web security scanner dedicated to making web security accessible to everyone.
cel-go chromedp crawler headless martian passive-vulnerability-scanner poc sql-injection subdomains testwaf vulnerability-scanner waf webscan wscan xss
Last synced: 11 Jul 2025
https://github.com/dirtyfilthy/freshonions-torscraper
Fresh Onions is an open-source Tor spider / hidden-service onion crawler hosted at zlal32teyptf4tvi.onion.
crawler darknet hidden-services onion scraper spider tor
Last synced: 07 Apr 2025
https://github.com/AlexMathew/scrapple
A framework for creating semi-automatic web content extractors
beautifulsoup crawler css-selector extractor lxml python scrapers scraping scrapy selector selector-expression tutorial web-scraper web-scraping xpath-expression
Last synced: 29 Mar 2025
https://github.com/scrapfly/scrapfly-scrapers
Scalable Python web scraping scripts for 40+ popular domains
antibot automation captcha-bypass crawler crawling crawling-python datascraping proxies python python-scraper scraper scraping scraping-python spider twitter-scraper web-crawler web-scraping web-scraping-python webscraper webscraping
Last synced: 11 Apr 2025
https://github.com/cyubuchen/free_proxy_website
A collection of websites that offer free SOCKS/HTTPS/HTTP proxies.
crawler free-proxy-list ip proxy proxy-checker spider
Last synced: 11 May 2025
https://github.com/shaohua0116/ICLR2020-OpenReviewData
Script that crawls metadata from the ICLR OpenReview webpage. Includes tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
conference crawler data-analysis iclr iclr2020 machine-learning visualization
Last synced: 19 Jul 2025
https://github.com/TikHub/TikHub-API-Python-SDK
High-performance asynchronous API for Douyin, TikTok, Xiaohongshu, Kuaishou, Weibo, Instagram, YouTube, and Twitter (X), plus a CAPTCHA solver and temporary email.
api captcha-solver crawler data-api douyin douyin-tiktok-api instagram kuaishou netease-cloud-music private-api scrapy tiktok twitter weibo xiaohongshu xiguashipin
Last synced: 11 May 2025
https://github.com/AndyTheFactory/newspaper4k
📰 Newspaper4k, a fork of the beloved Newspaper3k: extraction of articles, titles, and metadata from news websites.
articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping
Last synced: 14 Mar 2025
https://github.com/tasos-py/Search-Engines-Scraper
Search Google, Bing, Yahoo, and other search engines with Python
bing crawler google python scraper search-engine yahoo
Last synced: 09 Jul 2025
https://github.com/lgraubner/sitemap-generator
Easily create XML sitemaps for your website.
crawler google seo sitemap sitemap-generator xml-sitemap
Last synced: 15 May 2025
https://github.com/roniemartinez/dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath
Last synced: 16 Mar 2025
https://github.com/gadfly0x/signature_algorithm
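Request-signing and encryption algorithms for various apps, mini-programs, and websites. Currently included: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), Luckin Coffee (瑞幸咖啡), and Bangkok Airways (bangkokair).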
crawler reverse-engineering spider
Last synced: 27 Apr 2025
https://github.com/0xMassi/webclaw
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping
Last synced: 04 Apr 2026
https://github.com/howie6879/magic_google
Google search results crawler: get the Google search results you need
crawler google google-search spider
Last synced: 14 Dec 2025
https://github.com/0x676e67/wreq
An ergonomic Rust HTTP client with TLS fingerprinting
akamai boringssl crawler fingerprint http http-client http2 https impersonate ja3 ja4 requests rust scraper tls tls-client tls-fingerprint web-scraper web-scraping websocket
Last synced: 02 Aug 2025
https://github.com/smuyyh/crawlerforreader
Android local web-novel crawler based on jsoup and XPath
android bookreader crawler jsoup xpath
Last synced: 06 Apr 2025
https://github.com/shaohua0116/ICLR2019-OpenReviewData
Script that crawls metadata from the ICLR OpenReview webpage. Includes tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
crawler crawling-python openreview tutorial
Last synced: 19 Jul 2025
https://github.com/mhmdiaa/second-order
Second-order subdomain takeover scanner
crawler crawling infosec mapping penetration-testing penetration-testing-tools pentesting recon reconnaissance security security-tools web-application-security wordlist wordlist-generator
Last synced: 05 Apr 2025
https://github.com/Josue87/EmailFinder
Find a domain's email addresses through search engines
Last synced: 05 May 2025
https://github.com/flairnlp/fundus
A very simple news crawler with a funny name
cc-news commoncrawl corpus corpus-tools crawler datasets image-classification image-extraction news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping
Last synced: 08 Jan 2026
https://github.com/brendonboshell/supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
crawler distributed-crawler robot sitemap web-crawler
Last synced: 12 Jan 2026
https://github.com/microsoft/ghcrawler
Crawl GitHub APIs and store the discovered orgs, repos, commits, ...
crawler data github github-api github-webhooks ospo
Last synced: 27 Sep 2025
https://github.com/dennis-tra/nebula
🌌 An agnostic network crawler exposing comprehensive peer and network topology information.
cid crawler filecoin golang hacktoberfest ipfs libp2p
Last synced: 09 Jun 2026
https://github.com/xorbit01/webpalm
🕸️ Crawl the web network
crawler crawling data data-science datamining go golang hack mining osint redteam spider tool
Last synced: 15 Dec 2025
https://github.com/chishui/jssoup
JavaScript + BeautifulSoup = JSSoup
beautifulsoup crawler html javascript nodejs parser react-native spider
Last synced: 16 May 2025
https://github.com/scrapy-plugins/scrapy-zyte-smartproxy
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
crawler crawler-detection plugin proxy scraping scrapy
Last synced: 16 May 2025
https://github.com/duzun/hquery.php
An extremely fast web scraper that parses megabytes of invalid HTML in the blink of an eye. PHP 5.3+, no dependencies.
broken-html crawler css-selectors domcrawler fast hquery html html-parser invalid-html jquery-like jquery-selectors parser php psr-0 psr-4 scraper selectors xml xml-parser
Last synced: 14 May 2025
https://github.com/crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
crawler crawling hacktoberfest php scraper scraping scraping-websites web-crawler web-crawling web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/salimk/rcrawler
An R web crawler and scraper
crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping
Last synced: 12 Apr 2025
https://github.com/Evil0ctal/Fast-Powerful-Whisper-AI-Services-API
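⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. There is no need to buy the Whisper API: inference runs on a locally hosted Whisper model, supports multi-GPU concurrency, and is designed for distributed deployment. It also has built-in crawlers for social media platforms including TikTok and Douyin. These enable seamless media processing across multiple social platforms and provide a powerful, scalable solution for automated processing of media content.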
asr crawler douyin-api fastapi faster-whisper openai-whisper speech-recognition speech-to-text speech-to-text-api tiktok-analytics tiktok-api tiktok-crawler video-analysis whisper-ai whisper-api whisperbot
Last synced: 05 Apr 2025
https://github.com/rivermont/spidy
The simple, easy-to-use command-line web crawler.
crawler crawling python python3 web-crawler web-spider
Last synced: 16 Jan 2026
https://github.com/misaka10843/copymanga-downloader
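Uses Python and the copymanga API to download manga from copymanga (拷贝漫画) without rate limits. Supports batch downloads, downloading selected chapters, fetching and downloading your favorited manga, and semi-automatic subscription downloads! (Cross-platform support via PyPI.) For the NAS version, see copymanga-nasdownloader.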
comic copymanga crawler downloader python python3
Last synced: 14 Jan 2026
https://github.com/commoncrawl/news-crawl
News crawling with StormCrawler - stores content as WARC
apache-storm common-crawl commoncrawl crawler news storm-crawler warc web-crawler
Last synced: 12 Jun 2025
https://github.com/xiyuan-fengyu/ppspider
Web spider framework built on Puppeteer. It offers flexible task-queue management and task scheduling via decorators, convenient data storage (nedb/MongoDB), and built-in data visualization and user interaction.
angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript
Last synced: 05 Apr 2025
https://github.com/yangjianxin1/qqmusicspider
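A Scrapy-based QQ Music spider that crawls song information, lyrics, top comments, and more. It also shares a music corpus of 490,000+ entries covering the top 6,400 mainland Chinese, Hong Kong, and Taiwanese artists on QQ Music.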
crawler music musicspider qqmusic scrapy
Last synced: 27 Oct 2025
https://github.com/jsrei/crawler-js-hook-framework-public
A collection of JS reverse-engineering hook tools; some of the tools are open-sourced here.
Last synced: 26 Jan 2026
https://github.com/dmi3kno/polite
Be nice on the web
crawler memoise r r-package rate-limiter robotstxt rstats rvest scraper webscraping
Last synced: 22 Oct 2025
https://github.com/lgraubner/sitemap-generator-cli
Creates an XML sitemap by crawling a given site.
cli crawler google seo sitemap xml-sitemap
Last synced: 13 Apr 2025
https://github.com/krypton-byte/tiktok-downloader
TikTok downloader/scraper using requests and bs4
asynchronous asyncio beautifulsoup bs4 crawler downloader flask krypton-byte lightweight nowm python python3 requests tiktok watermark web without
Last synced: 06 Apr 2025
https://github.com/jeffersonqin/lightnovel_epub
🍭 EPUB generator for (light) novels. Supported sites: 轻之国度 (lk) and 轻小说文库 (wenku8).
cli crawler ebook epub lightnovel lk novel opencv python scraper uiautomator wenku8
Last synced: 26 Oct 2025
https://github.com/infinilabs/crawler
🕷️ An easy-to-use spider written in Golang (previously named GOPA).
crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider
Last synced: 11 Apr 2026
https://github.com/marshalx/telegram-crawler
🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
crawler crawling crawling-python parser telegram telegram-org telegram-updates
Last synced: 16 May 2025
https://github.com/yaroslaff/nudecrawler
Crawl telegra.ph searching for nudes!
crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping
Last synced: 04 Apr 2025
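Several entries above, such as supercrawler and polite, advertise obeying robots.txt and rate limits. The sketch below illustrates that idea with Python's standard `urllib.robotparser`; it is not either project's API.

Some assumptions apply:
- `make_robot_rules`, `polite_fetch_all`, and the `ExampleBot` user agent are hypothetical names.
- The robots.txt text is assumed to have been downloaded already.
- The `fetch` and `sleep` callables are injected so the logic can run without network access.

```python
import time
from typing import Callable, Iterable, List
from urllib.robotparser import RobotFileParser


def make_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_fetch_all(urls: Iterable[str], rules: RobotFileParser,
                     fetch: Callable[[str], str],
                     user_agent: str = "ExampleBot",
                     default_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch only the URLs robots.txt allows, pausing between requests.

    Waits the site's Crawl-delay for `user_agent` if robots.txt sets one,
    otherwise `default_delay` seconds. Returns the URLs actually fetched.
    """
    delay = rules.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay
    fetched: List[str] = []
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # disallowed: skip without sending a request
        if fetched:
            sleep(delay)  # rate limit between consecutive requests
        fetch(url)        # hypothetical fetcher; its result is ignored here
        fetched.append(url)
    return fetched


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n"
    urls = ["https://example.com/", "https://example.com/private/x",
            "https://example.com/about"]
    print(polite_fetch_all(urls, make_robot_rules(robots),
                           fetch=lambda u: "", sleep=lambda s: None))
    # ['https://example.com/', 'https://example.com/about']
```

A single delay per site keeps the sketch simple. A crawler that spans many hosts would typically keep one robots.txt parser and one last-request timestamp per host instead.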