Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-20 00:06:46 UTC
https://github.com/shin202/flixhq-core
Node.js library that provides an API for obtaining movie information from the FlixHQ website.
api apis core crawler library movies movies-api node-js nodejs scraper typescript
Last synced: 11 Apr 2025
https://github.com/dachcom-digital/pimcore-lucene-search
Pimcore Website Indexer (powered by Zend Search Lucene)
crawler lucene lucenesearch pimcore
Last synced: 07 Mar 2026
https://github.com/novemberde/serverless-crawler-demo
Serverless Architecture Crawler demo
aws crawler demo handson serverless
Last synced: 21 Jul 2025
https://github.com/bartozzz/crawlerr
A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.
crawler jsdom nodejs scraper spider web-crawler
Last synced: 23 Apr 2025
https://github.com/drogbadvc/crawlit
A web crawler based on Scrapy, with 2D visualization and PageRank.
Last synced: 24 Oct 2025
https://github.com/weihanli/proxycrawler
Proxy crawler service that crawls proxy IPs and saves them to Redis. Built with topshelf+Quartz.Net+redis.
Last synced: 19 Jun 2025
https://github.com/Ryaang/gpt-web-crawler
A web crawler for GPTs to build knowledge bases
chatgpt crawler gpt-crawler knowledge-base
Last synced: 11 Jul 2025
https://github.com/nstapelbroek/estate-crawler
Scraping the real estate agencies for up-to-date house listings as soon as they arrive!
appartments crawler huurwoningen nederland python real-estate-agencies scrapy scrapy-crawler
Last synced: 18 Jan 2026
https://github.com/generals-space/site-mirror-go
From [Gitee](https://gitee.com/generals-space/site-mirror-go): a general-purpose crawler, site-cloning tool, and whole-site downloader
commoncrawl crawler mirror spider
Last synced: 14 Jan 2026
https://github.com/paceaux/selector-finder
Find a CSS selector on a public site
crawler css javascript nodejs screenshots selector-finder
Last synced: 12 Jan 2026
https://github.com/jimouchen/bing-chat-fxxk
New Bing API via Playwright
bing-api crawler crawler-python gpt
Last synced: 16 Jun 2025
https://github.com/flashnuke/webrecon
A collection of pentesting web scanners
crawler cyber directory-serach dns-enumeration kali-linux pentest pentesting port-scanning python recon reconnaissance scanner security
Last synced: 17 Mar 2025
https://github.com/feng19/spider_man
SpiderMan, a fast high-level web crawling & scraping framework for Elixir, based on Broadway.
crawler data-mining elixir erlang framework spider
Last synced: 03 Apr 2025
https://github.com/mechazawa/redbetter-wm2
Better.php crawler for Redacted that uses WhatManager
crawler flac redacted seedbox transcoding whatcd whatmanager
Last synced: 15 Mar 2026
https://github.com/bitxx/pholcus
Fixes and improvements to the Go-based henrylee2cn/pholcusl crawler framework, made to meet the maintainer's own needs
Last synced: 12 Jul 2025
https://github.com/qibinlou/faceplusplus-stars-library-images-crawler
Crawler and image collection for the Face++ starlib annotated celebrity portrait set, for face recognition training
crawler faceplusplus image-recognition images traning
Last synced: 11 Jul 2025
https://github.com/omkarcloud/botasaurus-starter
🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 23 Apr 2025
https://github.com/kagami/tistore
:camera: Tistory photo grabber
crawler cross-platform electron tistory
Last synced: 05 May 2025
https://github.com/rajat19/torrent-crawler
Crawls and stores a list of torrent links
bs4 crawler hacktoberfest python3 torrent
Last synced: 08 Apr 2026
https://github.com/tokahuke/lopez
Crawling and scraping the Web for fun and profit
crawler rust scraper seo web-scraping
Last synced: 01 Mar 2026
https://github.com/HHN/crawler4j
Open Source Web Crawler for Java - A fork of yasserg/crawler4j
crawler crawler4j java spider web-crawler web-spider
Last synced: 05 Oct 2025
https://github.com/alessandrodd/googleplay_api
Google Play Unofficial Python 3 API Library
android crawler googleplay googleplay-api playstore
Last synced: 19 Mar 2025
https://github.com/spider-rs/spider-clients
Python, Javascript, and Rust libraries for the Spider Cloud API.
ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider supabase web-scraping
Last synced: 09 Apr 2026
https://github.com/thd3r/godork
Advanced & Fast Google Dorking Tool
crawler dork-scanner dorking godork google-dorking google-dorks osint-tool python
Last synced: 09 Apr 2026
https://github.com/jroakes/crawlnchat
A modular web crawling and chat system that allows for ingesting website content through XML sitemaps, converting to vector embeddings, and providing AI-powered chat interfaces through multiple frontend options.
crawler langgraph-python openai pinecone rag
Last synced: 11 Apr 2025
https://github.com/ph-7/emails-scraper
:ram: Simple PHP Email Grabber to get emails from a txt file containing the list of urls (add one url per line).
crawler email email-grabber email-scraper grabber php php-scraper scrape scrape-email scraper scraping script
Last synced: 09 Apr 2025
https://github.com/pyladies-brazil/crawler-tutorial
Data scraping tutorial produced in partnership with JusBrasil
beautifulsoup brazil crawl crawler crawler-python crawling-python pyladies pyladies-brasil pyladies-workshop python python-tutorial raspagem-de-dados requests-python
Last synced: 07 Feb 2026
https://github.com/yokawasa/scrapy-azuresearch-crawler-samples
Scrapy as a Web Crawler for Azure Search Samples
azure azure-search crawler python python3 scrapy search
Last synced: 26 Mar 2025
https://github.com/jianboy/crawl_xuexi
Batch downloader for machine learning course and MOOC videos from the Xuexi Qiangguo app
crawler machine-learning python xuexi
Last synced: 17 Jul 2025
https://github.com/norconex/collector-filesystem
Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.
crawler filesystem-crawler java norconex-filesystem-collector search-engine
Last synced: 05 Nov 2025
https://github.com/racinmat/premium-downloader
crawler pornhub pornhub-downloader python
Last synced: 07 Apr 2025
https://github.com/postman-open-technologies/openapi-web-search
OpenAPI Web Search: Revolutionizing the Way Developers find API Definitions 🚀
crawler dataset gsoc gsoc-2023 openapi search-engine swagger
Last synced: 10 Apr 2025
https://github.com/petehouston/udemy-crawler
Crawls Udemy course info and saves it in JSON format.
crawler crawling node node-cli udemy udemy-api udemy-crawl
Last synced: 06 Jul 2025
https://github.com/capjamesg/indieweb-search
Source code for the IndieWeb search engine.
crawler indieweb search search-engine
Last synced: 10 May 2025
https://github.com/HengXin666/BiLiBiLi_DanMu_Crawling
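Crawls Bilibili historical and full danmaku, with support for advanced danmaku and BAS danmaku. Working as of 2025. Includes an algorithm that reduces the number of requests while losing almost no danmaku, which speeds up crawling. Has a GUI and supports resuming a crawl. Uses binary search to find the earliest date that has danmaku before crawling. Built-in danmaku file deduplication and merging.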
bilibili-danmaku crawler danmaku python
Last synced: 27 Mar 2025
https://github.com/Actomaton/ActoCrawler
🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.
Last synced: 22 Jul 2025
https://github.com/chainski/proxyscraper
cplusplus crawler http https proxies proxy proxy-api proxygrabber proxyscraper proxytool scraper socks4 socks5
Last synced: 01 Apr 2026
https://github.com/RuedigerVoigt/exoskeleton
A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend
crawler crawling-framework database machine-learning mariadb network python python-3 scraping
Last synced: 17 Apr 2025
https://github.com/nvk681/gumo
A crawler that extracts data from a dynamic webpage. Written in Node.js.
crawler elasticsearch neo4j nodejs
Last synced: 16 Mar 2026
https://github.com/asing1001/movierater
A useful website for finding movie ratings in Chinese and English by crawling Yahoo, Ptt, and IMDB.
apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript
Last synced: 12 Jan 2026
https://github.com/kangfend/ig-scraper
Instagram hashtag scraper
crawler hashtag-scraper ig-scraper instagram instagram-hashtag-scraper scraper
Last synced: 14 Dec 2025
https://github.com/fanhuaandluomu/qqspider
Crawls QQ user information (basic details such as QQ number, nickname, birthday, and address) and performs a brief analysis.
Last synced: 01 May 2025
https://github.com/ruedigervoigt/exoskeleton
A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend
crawler crawling-framework database machine-learning mariadb network python python-3 scraping
Last synced: 13 Apr 2025
https://github.com/thaoshibe/crawl-original-google-images
Python scripts for crawling original images from Google Images
chrome-extension crawler crawling crawling-python google google-images pafy scraper youtube youtube-dl youtube-search
Last synced: 28 Oct 2025
https://github.com/waynechang65/ptt-crawler
ptt-crawler is a web crawler module designed to scrape data from Ptt.
api crawl crawler javascript nodejs ptt scrape scraper scraping spider typescript web-crawler webcrawler
Last synced: 08 Oct 2025
https://github.com/woojubb/html-article-extractor
A web page content extractor
article-extracting article-extractor crawler crawling extraction extractor
Last synced: 24 Dec 2025
https://github.com/capturr/scraper
All In One API to easily scrape data from any website, without worrying about captchas and bot detection mechanisms.
captcha cheerio crawler crawling data declarative extract growth-hacking html javascript json jsonld nodejs recaptcha scraper scraping spider typescript web web-scraping
Last synced: 02 Aug 2025
https://github.com/ArchiveTeam/WebArchiver
Decentralized web archiving
archiver archiving crawler decentralized python warc web webarchiving
Last synced: 07 Apr 2025
https://github.com/aman-codes/webevaluator
A web crawling tool which tests websites for SSL, Cookies and ADA compliance and also suggests ways to fix them.
ada collaborate compliance cookie crawler learn ssl
Last synced: 01 Jul 2025
https://github.com/archiveteam/webarchiver
Decentralized web archiving
archiver archiving crawler decentralized python warc web webarchiving
Last synced: 15 May 2025
https://github.com/neuralegion/bright-cli
Command Line Interface (CLI) tool for BrightSec's solutions.
api cli crawler cyber-security devops har nexploit oas secops security typescript
Last synced: 01 Apr 2026
https://github.com/gplumb/netcrawlerdetect
A .net standard port of JayBizzle's CrawlerDetect project (https://github.com/JayBizzle/Crawler-Detect).
bots c-sharp crawler detect dotnet-core dotnet-standard spider user-agent
Last synced: 11 Apr 2025
https://github.com/raintree-technology/docpull
Crawl any website and convert it to clean, AI-ready Markdown — async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output
ai-training-data cli crawler developer-tools documentation llm markdown mcp pypi python rag web-scraping
Last synced: 24 Apr 2026
https://github.com/tokenmill/crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
crawler crawling crawling-framework elasticsearch java scraping storm storm-crawler vaadin
Last synced: 22 Apr 2025
https://github.com/p0dalirius/crawlersuseragents
Python script to check whether there are any differences in an application's responses when the request comes from a search engine's crawler.
bugbounty crawler crawlers pentest request tool user-agent web
Last synced: 03 Sep 2025
https://github.com/s045pd/sharingan
We will try to find as much of your visible basic footprint on social media as possible - 😤 more sites are coming soon
asyncio crawler httpx python38 social-network
Last synced: 11 Apr 2025
https://github.com/spider-rs/web-crawling-guides
How-to guides on web crawling or scraping
agents ai-agents ai-scraping clean-markdown crawler fast-webcrawler html-to-markdown llm-webcrawler scraper web-scraping
Last synced: 30 Jun 2025
https://github.com/zyszys/zhengfang_system_spider
:bug: A small crawler that logs in to the Zhengfang academic administration system and scrapes data
crawler python spider zhengfang
Last synced: 16 May 2025
https://github.com/smolijar/offensive-fortune
A script for generating fortune cookies from the funniest and most offensive stuff collected off the Internet.
crawler fortune fortune-cookie vilejoke
Last synced: 06 Sep 2025
https://github.com/axel-dev/anime-tracker
:spider_web: All in one place to track your favorite animes
angular anime anime-scraper crawler scraper web-extension
Last synced: 08 Oct 2025
https://github.com/xbynet/crawler
A simple and flexible web crawler framework for Java.
crawler httpclient java jsoup spider
Last synced: 14 Jan 2026
https://github.com/sigoden/rag-crawler
Crawl a website to generate a knowledge file for RAG
Last synced: 24 Jul 2025
https://github.com/mediamonks/crawler
Crawl your own website with various clients for SEO and indexing purposes.
browserkit crawler crawling php prerender prerenderio seo spider
Last synced: 28 Jul 2025
https://github.com/pourmand1376/persiancrawler
Open source crawler for Persian websites.
crawler machine-learning news python scrapy tasnim text-classification
Last synced: 28 Oct 2025
https://github.com/wx-chevalier/sentinel-cendertron
Cendertron = Crawler + cendertron, Crawl AJAX-heavy client-side Single Page Applications (SPAs), deploying with docker, focusing on scraping requests(page urls, apis, etc.), followed by pentest tools(Sqlmap, etc.). Cendertron can be used for extracting requests(page urls, apis, etc.) from your Web 2.0 page.
cendertron crawler crawler-cendertron wx-be wx-code wx-pentest
Last synced: 25 Feb 2026
https://github.com/chairco/2017_pycontw_talk
crawler django django-q pycontw scheduled-tasks task
Last synced: 30 Jul 2025
https://github.com/fanyong920/crawlitem
A Google Chrome extension for crawling Taobao and Tmall web pages
crawler javascript taobao tmall
Last synced: 21 Jul 2025
https://github.com/crackcomm/crawl
Lightweight library for scalable crawlers in Go.
Last synced: 16 Feb 2026
https://github.com/discovai/discovai-crawl
🕷️ DiscovAI Crawl API(🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.
ai api crawler embedding vector-database web-scraping
Last synced: 01 May 2025
https://github.com/bkeepers/spiderman
your friendly neighborhood web crawler
crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping
Last synced: 14 Oct 2025
https://github.com/PadishahIII/SecretScraper
SecretScraper is a web scraper that crawls through target websites, scrapes HTTP responses, and extracts secret information via regular expressions.
crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper
Last synced: 29 Jul 2025
https://github.com/inspirehep/hepcrawl
Scrapy project for feeds into INSPIRE-HEP
crawler harvest-data publishing python
Last synced: 11 Apr 2025
https://github.com/henr1ko/pixthief
Stealthy .NET 8 console tool that crawls pages or whole domains and downloads images with optional format conversion.
console-application crawler csharp dotnet http-client image-downloader web-scraper web-scraping webscraper win-x64 windows
Last synced: 15 May 2026
https://github.com/twtrubiks/youtube-trends-spider
Crawls YouTube trends using Selenium with Python
crawler python selenium tutorial youtube-trends-spider
Last synced: 07 Mar 2026
https://github.com/paambaati/websight
🕷A simple but *really* fast crawler built with Node.js & TypeScript
coding-challenge crawler interview-questions javascript monzo nodejs typescript
Last synced: 13 Apr 2025
https://github.com/douglasdcm/search-jobs
Project to get jobs from public career websites
ai crawler docker docker-compose jobs jobsearch jobseeker python recruitment
Last synced: 12 Feb 2026
https://github.com/chainski/chino-proxy-scraper
A Python script that scrapes proxies from frequently updated proxy sources.
crawler http https proxies proxy proxy-api proxygrabber proxyscrape-api proxyscraper proxytool python python3 scraper socks4 socks5
Last synced: 24 Apr 2025
https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel
A toolkit for Spatie's Crawler and Laravel.
crawler laravel laravel-crawler php-crawler php-scraper spatie-crawler
Last synced: 01 May 2025
https://github.com/wahengchang/node-dcard-scraper
An example of implementing a cheerio scraper that extracts images from Dcard
cheerio crawler dcard example javascript nodejs npm scraper tutorial
Last synced: 11 Apr 2025
https://github.com/lucassmacedo/mercadolivreproductscrawler
PHP Console Crawler to Download Products from a Store on MercadoLivre.com.br
crawler eloquent illuminate laravel laravel-zero php
Last synced: 10 Oct 2025
https://github.com/abhineetraj1/phonenumber-scraper
Tells you which carrier your SIM belongs to. Make sure you have an internet connection before running this!
crawler phone-number-information phone-number-validation python3 scraper
Last synced: 10 Oct 2025
https://github.com/lupino/grapy
Grapy, a fast high-level web crawling framework for Python 3.3 or later, based on asyncio.
crawler python-library python3 spider
Last synced: 29 Jun 2025
https://github.com/enijkamp/supermonkey
A crawler for automated Android UI testing.
Last synced: 26 Jun 2025
https://github.com/alinebastos/crawler
Web Crawler created with Node.js and Puppeteer
crawler fs javascript nodejs puppeteer scraping
Last synced: 05 Apr 2025
https://github.com/ptsochantaris/bloo
Your search engine on your device
crawler ios ios-app macos macos-app productivity search-engine spotlight spotlight-search swift testflight
Last synced: 13 Apr 2025
https://github.com/ElyaConrad/XML-Parser
A Node.js XML DOM, Parser & Stringifier.
crawler crawling dom html html-parser html-parsing xml xml-parser xml-parsing xml-schema
Last synced: 21 Mar 2025