Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-19 00:07:20 UTC
https://github.com/melroy89/metacritic_api
PHP Metacritic API - Mirror from my GitLab
api crawler data metacritic parser php scores scraper webscraping
Last synced: 13 May 2025
https://github.com/tzw0745/tumblr-crawler-cli
Tumblr Download Tool with High Speed and Customization.
cli-app crawler python tumblr tumblr-downloader
Last synced: 13 Jul 2025
https://github.com/aufzayed/HydraRecon
All In One, Fast, Easy Recon Tool
bugbounty bugbounty-tool bugbountytips crawler hacking hacking-tools information-gathering open-source-intelligence osnit pentest pentest-tools pentesting python recon recon-tools
Last synced: 10 May 2025
https://github.com/flickz/newspaperjs
News extraction and scraping. Article Parsing
crawler news news-aggregator nodejs scraper webcrawling webscraping
Last synced: 02 Jun 2026
https://github.com/minicloudsky/eastmoney
Python requests + Django + Node.js Koa + MySQL to crawl Eastmoney fund and stock data for data analysis and visualization.
crawler database django eastmoney financial-analysis financial-data metabase mysql nodejs python vue vuejs
Last synced: 10 Jul 2025
https://github.com/lucasayres/python-tools
A collection of Python tools, scripts and utilities to make your life easier.
automation codes collection crawler functions geolocation helper libs pdf python qrcode recipes scripts speech sqlalchemy tips tools tricks unzip utilities
Last synced: 16 May 2025
https://github.com/drkostas/jobapplicationbot
A bot that automatically sends emails to new ads posted in any desired xe.gr search url.
bot crawler email-sender python scraper
Last synced: 23 Sep 2025
https://github.com/zhang2333/light-crawler
a simplified directed customizable website crawler
Last synced: 06 Sep 2025
https://github.com/usernam3/shopify-app-store-scraper
Crawler behind the Shopify App Marketplace dataset
crawler dataset-creation shopify
Last synced: 07 Apr 2025
https://github.com/liameno/librengine
Privacy Web Search Engine (not meta, own crawler)
cpp crawler encryption frontend privacy robots-txt rsa search-engine self-hosted spider websearch websearchengine
Last synced: 15 Jan 2026
https://github.com/spider-rs/spider-py
Spider ported to Python
crawler headless-chrome python scraper spider web-crawler
Last synced: 05 Apr 2025
https://github.com/mzollin/qr-pirate
crawl QR-codes from search engines and look for bitcoin private keys
bitcoin bitcoin-wallet crawler cryptocurrency private-key python qr-code qrcode qrcode-reader
Last synced: 28 Oct 2025
https://github.com/us/crw
Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v1/scrape, /v1/crawl, /v1/search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.
ai ai-agents crawler data-extraction docker firecrawl firecrawl-alternative html-to-markdown llm markdown mcp mcp-server rust scraping-api self-hosted tavily-alternative web-crawler web-scraper web-scraping web-search-api
Last synced: 09 May 2026
https://github.com/nekolr/slime
🍰 A visual crawler management platform
crawler spider visual-crawler websocket
Last synced: 16 May 2025
https://github.com/shurco/goClone
🌱 goClone - clone websites in seconds
cloner cloning crawler crawling go goclone golang hacktoberfest scraping scraping-websites scrapper website-cloner website-scraper wp2static
Last synced: 05 May 2025
https://github.com/trudi-group/ipfs-crawler
A crawler for the IPFS network, code for our paper (https://arxiv.org/abs/2002.07747). Also holds scripts to evaluate the obtained data and make similar plots as in the paper.
crawler ipfs ipfs-network kademlia-dht libp2p
Last synced: 12 Jun 2025
https://github.com/pablouser1/tikscraperphp
Wrapper for TikTok API
crawler php scraper scraping tiktok tiktok-api wrapper
Last synced: 07 May 2025
https://github.com/muhac/chinese-holidays-calendar
Calendar of Public Holidays in China. Subscribable calendar of mainland China public holidays, with automatic holiday alarms.
automation calendar chinese-holidays crawler events ics-files
Last synced: 09 May 2025
https://github.com/saltyshiomix/nest-crawler
An easy-to-use crawling and scraping module for NestJS
crawler nestjs nodejs scraper typescript
Last synced: 16 Mar 2025
https://github.com/nightmarcher/zhihu-crawler
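A from-scratch implementation of scheduled Zhihu crawling that mines valuable information from the results and visualizes the crawled data on a web page.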
crawler developing mongodb pipenv python3 redis selenium spider zhihu
Last synced: 26 Jun 2025
https://github.com/howie6879/hproxy
hproxy - Asynchronous IP proxy pool that aims to make getting a proxy as convenient as possible (an asynchronous crawler proxy pool).
asyncio crawler crawlers hproxy proxy proxy-pool proxy-spider sanic schedule
Last synced: 16 May 2025
https://github.com/wenyalintw/google-patents-scraper
Automatically download all PDF files of search results and their patent families found on Google Patents.
crawler google-patents patent patents pdf scraper scraping scrapy web-scraping
Last synced: 03 Mar 2026
https://github.com/webcoding/js_block
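Research and study of various blocking techniques: anti-crawler measures, ad blocking, preventing ad injection, fighting scalpers, and more.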
block-ad block-res block-spider crawler nodejs spider
Last synced: 23 Jan 2026
https://github.com/vifreefly/rubium
Antidetect Headless Chrome Browser for Ruby Web Scraping and Automation
antidetect-browser automation capybara chromium crawler headless playwright puppeteer ruby scraping web-scraping
Last synced: 12 Feb 2026
https://github.com/aziz0x48/xsmtp
xSMTP 🦟 Lightning fast, multithreaded smtp scanner targeting open-relay and unsecured servers in multiple network ranges.
bot crawler exploit exploit-scanner multithreading networking pentest-tool pentesting pentesting-tools portscan portscanner python python-exploits scanner-web security security-tools smtp smtp-cracker
Last synced: 16 Aug 2025
https://github.com/mirusu400/pinterest-infinite-crawler
An infinite Pinterest crawler/scraper. Crawl images with infinite scroll!
crawler hacktoberfest pinterest pinterest-downloader python scraper scraping selenium
Last synced: 11 May 2026
https://github.com/absingh31/tor_spider
Python project to crawl and scrape the lesser-known deep web, or one can say the dark web. Just provide the onion link and get started.
crawler file-manager ioc python3 scraper scraping socks stem tor tor-config tor-spider
Last synced: 11 May 2025
https://github.com/schollz/crawdad
Cross-platform persistent and distributed web crawler 🦀
Last synced: 22 Apr 2025
https://github.com/cho45/chemrtron
A document viewer; fuzzy match incremental search.
crawler document-viewer electron increment javascript
Last synced: 30 Dec 2025
https://github.com/koshort/koshort
(deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.
crawler korean nlp python streaming text-mining
Last synced: 19 Nov 2025
https://github.com/dannyben/snapcrawl
Crawl a website and take screenshots
capture crawler gem ruby screenshot
Last synced: 04 Apr 2025
https://github.com/x-way/crawlerdetect
Golang module to detect bots and crawlers via the user agent
bot-detection crawler crawler-detection detect go spider user-agent
Last synced: 09 Apr 2025
https://github.com/johanneszab/tumbltwo
TumblTwo, an Improved Fork of TumblOne, a Tumblr Downloader.
crawler downloader photos ripper tumblr tumblr-blog tumblr-downloader videos
Last synced: 09 Mar 2026
https://github.com/harborzeng/crawler_jd_what_worthy_buying
Crawls all reviews of a JD.com product and uses sentiment analysis to judge whether the product is worth buying.
Last synced: 24 Apr 2025
https://github.com/niespodd/webrtc-local-ip-leak
Oh no, stop this. You can see my local IP address 😲! Use the `foundation` attribute against a CRC32 lookup table to reveal the local IP address of a Chrome/Chromium visitor.
automation bot bot-detection crawler spider stealth webrtc
Last synced: 27 Aug 2025
https://github.com/fernandod1/instagram-downloader
Instagram user's photos and videos downloader. Download all media files from any username. Working 2022!
crawler crawling-python instagram instagram-downloader instagram-feed instagram-photos instagram-scraper python scrap scraper scraping scraping-python scraping-tool scraping-websites
Last synced: 03 May 2025
https://github.com/bajins/tool-gin
A project built on the go-gin framework to reduce redundant actions, such as downloading some tools.
crawler gin gin-gonic golang key keygen mobaxterm-keygen navicat nginx-conf nginx-configuration python3 registry-workshop scraper shell spider xftp xmanager xshell
Last synced: 08 Apr 2025
https://github.com/findopendata/findopendata
A search engine for Open Data
crawler dataset-search opendata
Last synced: 14 Jan 2026
https://github.com/fengzhizi715/piccrawler
An image crawler developed using RxJava2 and Java 8 features.
crawler java-8 parallel rxjava2
Last synced: 30 Oct 2025
https://github.com/sshwy/pku3b
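🎓 A better Blackboard for PKUers. Command-line tool for the Peking University course website (🖥️Win/🐧Linux/🍏Mac); supports viewing and submitting assignments and downloading lecture recordings.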
blackboard-learn cli command-line-tool crawler m3u8 peking-university pku rust
Last synced: 30 Jan 2026
https://github.com/beomi/simple_bank_korea
simple crawler for Korean banks with Transactions
Last synced: 07 May 2025
https://github.com/hfreire/browser-as-a-service
A web browser 🌎 hosted as a service, to render your JavaScript web pages as HTML
browser browser-as-a-service crawler docker github-actions javascript puppeteer rest-api scraper server webcrawler
Last synced: 11 Sep 2025
https://github.com/lobehub/chat-plugin-web-crawler
🧩 / 🕸 WebsiteCrawler - This plugin automatically crawls the main content of a specified URL webpage and uses it as context input.
ai chatgpt crawler function-calling lobe-chat lobe-chat-plugin openai
Last synced: 29 Mar 2025
https://github.com/howie6879/talospider
talospider - A simple, lightweight scraping micro-framework
crawler crawling python spider web-spider
Last synced: 25 Oct 2025
https://github.com/roccomuso/price-monitoring
Node.js price monitoring library, leveraging the power of x-ray and nightmare.
alert comparison crawler javascript monitoring nodejs price-tracker
Last synced: 14 Sep 2025
https://github.com/kabegame/kabegame
Kabegame — An anime image crawler client with pluggable crawlers (from a GitHub plugin repo), wallpaper rotation by custom rules, and Wallpaper Engine export. Supports Windows 10/11, macOS Big Sur+, and Ubuntu 24.04+.
android anime crawler linux macos no-electron open-source otaku tauri vue wallpaper windows
Last synced: 24 Apr 2026
https://github.com/forsti0506/a11y-sitechecker
Automatic accessibility checker with website crawling + screenshots for easy use
accessibility accessibility-criteria accessibility-testing axe crawler hacktoberfest open-source puppeteer typescript typescript-library
Last synced: 13 Jul 2025
https://github.com/jaymon/wishlist
Read an Amazon wishlist programmatically with Python
amazon amazon-wishlist api crawler python scraper
Last synced: 27 Oct 2025
https://github.com/eliashaeussler/cache-warmup
🔥 PHP library to warm up caches of URLs located in XML sitemaps
cache-warmup crawler php xml-sitemap
Last synced: 04 Apr 2025
https://github.com/a11ywatch/crawler
gRPC web crawler turbo charged for performance
a11ywatch crawler grpc scraper
Last synced: 16 Oct 2025
https://github.com/lanyeeee/bilibili-manga-download-script
A download script for Bilibili Manga.
bilibili bilibili-comic bilibili-manga crawler downloader manga tampermonkey tampermonkey-script tampermonkey-userscript
Last synced: 13 Jul 2025
https://github.com/twtrubiks/facebook-messenger-bot-tutorial
facebook-messenger-bot-tutorial using Python Django
bot crawler django facebook-messenger-bot ngrok ptt python tutorial webhooks
Last synced: 15 Apr 2025
https://github.com/he426100/alipay-crawler
Alipay bill crawler
alipay crawler selenium selenium-ide selenium-php selenium-webdriver
Last synced: 08 Apr 2025
https://github.com/sachaarbonel/scrapy.dart
Scrapy, a fast high-level web crawling & scraping framework for Dart and Flutter
Last synced: 12 Mar 2026
https://github.com/mariot/chan-downloader
CLI to download all images/webms in a 4chan thread
4chan 4chan-downloader crawler scraper
Last synced: 13 Aug 2025
https://github.com/h12w/html-query
A fluent and functional approach to querying HTML
crawler dom go golang golang-package html parser
Last synced: 26 Jan 2026
https://github.com/farishijazi/rarbgcli
RARBG command line interface for scraping the rarbg.to torrent search engine
crawler rarbg rarbg-torrentapi torrent torrents torrents-crawler
Last synced: 17 Mar 2025
https://github.com/ReddyyZ/URLBrute-Py
Tool to brute-force website subdomains and directories.
brute-force bruteforcer crawler dir-scanner dirscanner dirsearch sub-domain-enumeration sub-domain-scanner
Last synced: 11 Jul 2025
https://github.com/evil0ctal/wechat-channels-video-file-decryption
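A decryption tool and API service for encrypted WeChat Channels videos that can run online, based on reverse-engineering analysis. The project uses WeChat's official WebAssembly (WASM) module to generate the Isaac64 PRNG keystream and decrypts videos with an XOR operation.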
crawler reverse-engineering wechat wechat-api wechat-channel wechat-crawler wechat-hack wechat-hook wechat-video wechat-video-download
Last synced: 04 Apr 2026
https://github.com/valerebron/usetube
Search and get data from YouTube; no Google account needed
crawler typescript video youtube youtube-api
Last synced: 13 Apr 2025
https://github.com/duoan/codes-scratch-crawler
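Reading notes on the book 《自己动手写网络爬虫》 (Write Your Own Web Crawler), with code typed out by hand. Mainly covers basic web crawler implementation, web page deduplication algorithms, web page fingerprinting algorithms, and text information mining.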
Last synced: 18 Jun 2026
https://github.com/pzaino/thecrowler
A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.
automation blue-team-tool content-detection content-discovery crawler crawling cyber-security cybersecurity cybersecurity-tools data-collection data-science distributed-systems golang indexer indexing reconnaissance red-team-tools scraping search-engine vulnerability-detection
Last synced: 06 Feb 2026
https://github.com/goldarowana/douyin-crawler
Douyin crawler. Crawls users' posts and likes through a mobile phone proxy.
crawler douyin douyin-download java vertx
Last synced: 23 Oct 2025
https://github.com/eredotpkfr/subscan
⚡ A subdomain enumeration tool leveraging diverse techniques, designed for advanced pentesting operations
brute-force bruteforce crawler pentest pentest-tool pentesting pentesting-tool rust rust-crate rust-lang scanner searchengines subdomain subdomain-bruteforcing subdomain-enumeration subdomain-finder subdomain-scanner zonetransfer
Last synced: 23 Aug 2025
https://github.com/murat/tors
⏬ Yet another torrent searching application for your command line
crawler ruby-gem torrent-downloader torrent-search-engine
Last synced: 10 Mar 2026
https://github.com/mawrkus/jason-the-miner
⛏ A versatile Web scraper for Node.js
crawler crawling javascript scraper scraping web-scraper
Last synced: 08 Apr 2025
https://github.com/Conso1eCowb0y/Deepminer
Deep web crawler and search engine
crawler crawling dark-web data-mining deepminer deepweb github hacking onion osint python-web-scraper python3 search-engine security security-tools spider the-onion-router tor tor-network webcrawler
Last synced: 20 Apr 2025
https://github.com/mike442144/seenreq
Generate an object for testing whether a request has been sent; request here is Mikeal's request.
crawler duplicates-removed post request spider url
Last synced: 31 Aug 2025
https://github.com/spk/maman
Rust Web Crawler saving pages on Redis
crawler http spider web web-crawler
Last synced: 07 Oct 2025
https://github.com/soruly/anilist-crawler
Crawl data from anilist API and store in MariaDB.
Last synced: 19 Jun 2025
https://github.com/riquellopes/fii
API for retrieving information about FII
crawler investiment mongodb nodejs
Last synced: 16 Jan 2026
https://github.com/joaopauloaramuni/python
Python repo
crawler python scraping scrapy
Last synced: 10 Apr 2025
https://github.com/golang-collection/go-crawler-distributed
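A distributed crawler project that supports custom development of page parsers. The project uses a microservice architecture overall and sends messages asynchronously through a message queue. Frameworks used include redigo, gorm, goquery, easyjson, viper, amqp, zap, and go-micro. Deployment is containerized with Docker, and the intermediate crawler nodes support horizontal scaling.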
crawler docker elasticsearch go go-micro gocrawler microservice rabbitmq
Last synced: 09 Jul 2025
https://github.com/liangWenPeng/scrapy-admin
A django admin site for scrapy
Last synced: 06 Aug 2025
https://github.com/houssemcharf/funutils
Some code I wrote to help me with my daily errands ;)
bots crawler errand fun minify-javascript mp3converter plugins proxy puzzle-solver python python-script sorting-algorithms utilities
Last synced: 05 Apr 2025
https://github.com/lewoudar/scalpel
A fast and powerful web scraping library
anyio asyncio crawler gevent python scalpel trio webscraping
Last synced: 19 Jan 2026
https://github.com/forceh91/retro-env-can-weather-chan
Retro Environment Canada Weather Channel for your browser
cable cable-tv canada crawler environment-canada javascript nodejs react recreation retro simulator typescript weather weather-app weather-channel weather-conditions weather-forecast winnipeg
Last synced: 01 Mar 2026
https://github.com/jsrei/page-redirect-code-location-hook
JS reverse-engineering technique: a universal approach to locating the JS code behind page redirects
Last synced: 19 Apr 2025
https://github.com/0xhjk/x12306
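12306 ticket-search assistant: query all stations along the route with one click, board first and buy the ticket afterward, for worry-free travel.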
12306 12306buyticket 12306helper 12306qiang-piao crawler fk12306 helper reqeusts spider ticket train x12306
Last synced: 26 Feb 2026
https://github.com/healeycodes/broken-link-crawler
🤖 Python bot that crawls your website looking for dead stuff
Last synced: 30 Apr 2025
https://github.com/healeycodes/Broken-Link-Crawler
🤖 Python bot that crawls your website looking for dead stuff
Last synced: 27 Sep 2025