Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-20 00:06:46 UTC
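The "systematic browsing" in the definition above is a graph traversal: start from seed URLs, fetch each page, extract its links, and queue any link not seen before. Below is a minimal breadth-first sketch. It assumes a hypothetical in-memory `LINKS` dict that stands in for real HTTP fetching and HTML parsing, so it runs offline. It is not taken from any project listed here.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page URL -> outgoing links.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

def fetch_links(url):
    """Stand-in for downloading a page and extracting its hrefs."""
    return LINKS.get(url, [])

def crawl(seeds, max_pages=100):
    """Breadth-first crawl: visit each reachable URL once, in discovery order."""
    frontier = deque(seeds)
    seen = set(seeds)
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:  # dedupe so link cycles cannot loop forever
                seen.add(link)
                frontier.append(link)
    return order

print(crawl(["https://example.com/"]))  # visits /, /a, /b, /c in that order
```

A real crawler would add politeness delays, robots.txt checks, and URL normalization around the same loop.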
https://github.com/ronin-rb/ronin-web
ronin-web is a collection of useful web helper methods and commands.
cli crawler hacktoberfest helpers html proxy-server ronin-rb ruby server spider web xml
Last synced: 03 Oct 2025
https://github.com/spider-rs/spider-nodejs
Spider ported to Node.js
crawler distributed-systems headless-chrome indexer nodejs scraper spider typescript
Last synced: 31 Mar 2025
https://github.com/zenrows/scaling-to-distributed-crawling
Repository for the "Mastering Web Scraping in Python: Scaling to Distributed Crawling" blog post, with the final code.
crawler crawling distributed python python3 scraping spider
Last synced: 18 Mar 2026
https://github.com/moskrc/crawlerdetect
🕷CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.
bot crawler detect python spider user-agent
Last synced: 16 Jan 2026
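User-agent analysis usually comes down to matching the header against a list of known bot tokens. The sketch below illustrates that idea with a deliberately tiny, hypothetical pattern list. It is not CrawlerDetect's API or its pattern set, and it treats an empty user agent as suspicious, which is a design assumption.

```python
import re

# Hypothetical, deliberately tiny token list; real detectors ship far larger ones.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests", re.IGNORECASE
)

def looks_like_crawler(user_agent):
    """Heuristic: flag missing/blank user agents or ones containing a bot token."""
    if not user_agent or not user_agent.strip():
        return True
    return BOT_PATTERN.search(user_agent) is not None

print(looks_like_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

Substring matching like this can produce false positives (any user agent that happens to contain "bot"), which is one reason maintained pattern lists matter.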
https://github.com/Maicius/UniversityRecruitment-sSurvey
Using hard data to answer the question: "What kinds of companies recruit at what kinds of universities?"
analysis beautifulsoup crawler data redis university
Last synced: 06 Mar 2025
https://github.com/wael-sudo2/facebook-page-info-scraper
Free Facebook page metadata scraping library with unlimited calls
crawler crawling-python crm data-analysis data-mining facebook facebook-apis facebook-page-information facebook-page-scraper facebook-scraper facebook-scraping leadgeneration marketing metadata python scraping scraping-python selenium
Last synced: 04 Feb 2026
https://github.com/mrxujiang/crawel
A rather interesting crawler platform built with Apify + Node + React
apify crawler node puppeteer react react-hooks umi umi3
Last synced: 13 Apr 2025
https://github.com/jonaslejon/lolcrawler
Headless web crawler for bug bounty, penetration testing, and red teaming
bugbounty crawler docker penetration-testing penetration-testing-tools redteam redteam-tools redteaming
Last synced: 12 Jul 2025
https://github.com/elboletaire/php-crawler
🕷 A simple crawler (spider) written in PHP just for fun, with zero dependencies
Last synced: 10 Jan 2026
https://github.com/p0dalirius/robotstester
This Python script enumerates all URLs listed in robots.txt files and tests whether each one can be accessed.
bugbounty crawler pentesting python robots tool
Last synced: 21 Aug 2025
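Enumerating the URLs in a robots.txt file means collecting the paths named in its `Allow` and `Disallow` rules and resolving them against the site's base URL. A minimal sketch of that step follows; it is an illustration, not robotstester's code. It assumes `*` and `$` wildcards are left unexpanded and ignores other fields such as `Sitemap`. Testing accessibility would then mean requesting each URL, which is omitted here so the example runs offline.

```python
from urllib.parse import urljoin

def robots_urls(base_url, robots_txt):
    """Collect the paths named in Allow/Disallow rules as absolute URLs."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        if field.strip().lower() in ("allow", "disallow") and value:
            url = urljoin(base_url, value)
            if url not in urls:  # keep only the first occurrence
                urls.append(url)
    return urls

sample = """User-agent: *
Disallow: /admin/
Disallow: /private  # internal
Allow: /public/
Disallow:
"""
print(robots_urls("https://example.com/", sample))
```

An empty `Disallow:` value means "nothing is disallowed", so it contributes no URL.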
https://github.com/axetroy/crawler
Crawler framework for Node.js
Last synced: 18 Jun 2025
https://github.com/scrapfly/python-scrapfly
Scrapfly Python SDK for headless browsers and proxy rotation
crawler headless-browser python scraper scraping scraping-api sdk web-scraper web-scraping
Last synced: 14 Apr 2025
https://github.com/botcity-dev/botcity-framework-web-python
BotCity Framework Web - Python
automation automation-framework crawler python robotic-process-automation rpa selenium testing web webdriver webscraping
Last synced: 05 Apr 2025
https://github.com/rix4uni/uforall
uforall is a fast URL crawler that collects URLs from a number of different sources: AlienVault, the Wayback Machine, urlscan, and Common Crawl.
alienvault bugbounty commoncrawl crawler osint recon reconnaissance urlscan wayback
Last synced: 15 Apr 2025
https://github.com/veliovgroup/spiderable-middleware
Pre-rendering for JavaScript websites that delivers SSR-level SEO, enhanced link previews, and performance via effortless middleware integration — ideal for PWAs, SPAs, and modern JS-driven apps, websites, and webpages
crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable
Last synced: 12 Apr 2025
https://github.com/charlespikachu/seleniumlogin
Log in to some websites using Selenium.
crawler selenium selenium-webdriver spider taobao
Last synced: 23 Oct 2025
https://github.com/taseikyo/crawler
🐍 A collection of simple Python crawlers.
baidu-tieba bilibili bing crawler douban pixiv python-crawler python3 youku
Last synced: 19 Oct 2025
https://github.com/kkomelin/insecres
A console tool that finds insecure resources on HTTPS sites
Last synced: 22 Jun 2025
https://github.com/VAllens/CrawlerSamples
Sample crawler console apps using Puppeteer and AngleSharp, written in C# 7.1 and built with .NET Core.
anglesharp chsarp crawler dotnetcore headless headless-browsers headless-chrome headless-chromium puppeteer
Last synced: 04 May 2025
https://github.com/ryuchen/deadpool
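This project is a crawler application built on Celery as its core framework. It can flexibly add crawler tasks and run crawls across multiple sites at the same time. All components natively support scalable concurrency and distribution, and combined with Celery's native distributed calls, this enables large-scale concurrency.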
celery crawler deadpool python3 spider taobao taobao-spider tmall tmall-spider
Last synced: 21 Mar 2025
https://github.com/armand1m/papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
cache crawler jsdom nodejs scraper scraping typescript web-scraping
Last synced: 28 Jan 2026
https://github.com/cyb3rmx/d00r
Simple directory brute-force tool written in Python.
brute-force bruteforce crawler directory-lister hunt hunter linux login pentesting python3 security security-tools termux-hacking
Last synced: 11 Jul 2025
https://github.com/kylemocode/medium-stat-box
Practical pinned gist that shows your latest Medium stats 📌
awesome-pinned-gists crawler github-action github-gists medium-stats
Last synced: 17 Apr 2025
https://github.com/NatsuFox/Tapestry
Tapestry - a lightweight bookmark knowledge base built on an Agent Skill Bundle https://natsufox.github.io/Tapestry
agent-skills claude-code codex crawler knowledge-base openclaw workflow
Last synced: 27 Apr 2026
https://github.com/m-ahmadi/tse-client
A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.
browser caching cli cli-app compression crawler data dataset downloader iran node-module stock stock-data stock-market stock-prices tehran ticker tsetmc universal
Last synced: 18 Feb 2026
https://github.com/alwalxed/wayurls
CLI tool for fetching URLs from Wayback Machine, Common Crawl, and VirusTotal.
bugbounty bugcrowd crawler cyber-security cybersecurity golang golang-tools hackerone infosec intigriti osint osint-tool projectdiscovery tomnomnom tools virustotal wayback-machine web web-security
Last synced: 05 Sep 2025
https://github.com/m-haisham/novelsave_sources
A collection of webnovel sources offering varying amounts of scraping capability.
Last synced: 22 Jan 2026
https://github.com/migalabs/armiarma
Armiarma is a libp2p open-network crawler with a current focus on Ethereum's consensus layer (CL) network.
crawler ethereum libp2p monitoring
Last synced: 21 Aug 2025
https://github.com/hengxin666/bilibili_danmu_crawling
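Crawls historical/full danmaku (bullet comments) from Bilibili, including advanced danmaku and BAS danmaku. Working as of 2025. Includes an algorithm that reduces the number of requests, and thus speeds up crawling, while losing almost no danmaku. Has a GUI and supports resuming a crawl. It uses binary search to find the earliest date that has danmaku, then crawls from there. Built-in deduplication and merging of danmaku files.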
bilibili-danmaku crawler danmaku python
Last synced: 24 Jul 2025
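Finding the earliest date with comments by binary search works when the check is monotone: once "comments exist on or before day d" is true, it stays true for every later day. The sketch below shows the idea under that assumption. The predicate `has_danmaku_by` and the `FIRST` date are hypothetical stand-ins for whatever request the tool actually makes; this is not the repository's code.

```python
from datetime import date, timedelta

def earliest_true_day(lo, hi, has_danmaku_by):
    """Return the first day in [lo, hi] for which the predicate holds, or None.

    Assumes monotonicity: once has_danmaku_by(d) is True, it stays True for later d.
    """
    if not has_danmaku_by(hi):
        return None
    left, right = 0, (hi - lo).days  # day offsets from lo; answer lies in [left, right]
    while left < right:
        mid = (left + right) // 2
        if has_danmaku_by(lo + timedelta(days=mid)):
            right = mid      # mid satisfies the check; earliest is at or before mid
        else:
            left = mid + 1   # mid fails; earliest is strictly after mid
    return lo + timedelta(days=left)

# Hypothetical usage: pretend the first comments appeared on FIRST.
FIRST = date(2021, 3, 14)
calls = 0

def fake_check(day):
    global calls
    calls += 1
    return day >= FIRST

result = earliest_true_day(date(2020, 1, 1), date(2024, 12, 31), fake_check)
print(result, calls)
```

Over a five-year window this needs about a dozen checks instead of one request per day, which is the saving the description refers to.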
https://github.com/iljan/narr
Download audio tracks from Netflix to sample your favorite shows
chrome-devtools-protocol cli crawler downloader music
Last synced: 27 Jul 2025
https://github.com/basemax/googleplaywebserviceapi
Tiny PHP script to crawl information about a specific application on the Google Play Store.
api crawler crawler-php crawlers google-play google-play-api google-play-games google-play-service google-play-services google-play-store google-playstore hacktoberfest hacktoberfest2020 php php-crawler
Last synced: 05 May 2025
https://github.com/crawlerclub/crawler
Crawler4U, a general-purpose focused crawler
crawler information-extraction spider
Last synced: 17 Jan 2026
https://github.com/hackfengJam/ArticleSpider
Crawls Zhihu, Jobbole, and Lagou with Scrapy, and uses Elasticsearch + Django to build a search engine website. See README_zh.md for the implementation roadmap, the distributed crawler, and strategies for coping with anti-crawling measures.
crawler distributed-systems django elasticsearch scrapy
Last synced: 28 Mar 2025
https://github.com/flulemon/sneakpeek
Sneakpeek is a framework that helps you develop scrapers quickly and conveniently. It's the best choice for scrapers that have specific, complex scraping logic that needs to run on a constant basis.
crawler crawler-python crawlers crawling crawling-engine crawling-framework python python3 scraper scraper-api scraper-engine scrapers scraping scraping-framework vue website-crawler
Last synced: 14 Jan 2026
https://github.com/jfreegman/toxcrawler
A Tox DHT network crawler
crawler dht dht-network tox toxcore
Last synced: 14 Apr 2025
https://github.com/scrapingant/scrapingant-client-python
ScrapingAnt API client for Python.
crawler scraper scraping scrapingant scrapy webscraping
Last synced: 23 Sep 2025
https://github.com/safonovpro/node-html-crawler
Simple-to-use Node.js HTML crawler (spider) for website pages
Last synced: 12 Mar 2026
https://github.com/heyingcai/cetty
A crawler framework based on event dispatching
crawler event-dispatcher gather spider
Last synced: 04 May 2025
https://github.com/andreaskoch/gargantua
The fast website crawler
command-line crawler golang xml-sitemap
Last synced: 14 Apr 2025
https://github.com/haxzie-xx/instagram-downloader
Node.js/Express app to retrieve Instagram video/image download URLs
crawler downloader express instagram instagram-scraper nodejs
Last synced: 18 Mar 2025
https://github.com/proxzima/darkspider
Anatomy and visualization of the network structure of the dark web using a multi-threaded crawler
collaborate crawler dark-web extractor github github-pages hacktoberfest networkx onion osint python scraper tor
Last synced: 14 Mar 2026
https://github.com/wolverinn/igxe-c5-buff-csgo-skins-sale-data-catch
Automatically collects CS:GO skin sale data from igxe.cn, Buff, and c5game.com. You can choose which skins to collect data for.
Last synced: 25 Mar 2025
https://github.com/apocelipes/schannel-qt5
A GUI client for schannel powered by therecipe/qt and Go
client-side crawler go golang goqt linux qcharts qt5
Last synced: 07 May 2025
https://github.com/ph-7/crawling-emails
Very simple bash script to crawl email addresses from a specific website.
bash crawler email email-scraper scrape scrape-email scraper scraping shell wget
Last synced: 22 Aug 2025
https://github.com/helviojunior/filecrawler
File Crawler indexes files and searches them for hard-coded credentials.
crawler crawling-python elasticsearch leaks leaks-scanner
Last synced: 08 Apr 2025
https://github.com/harismuneer/android-apps-downloader
📱 A utility for downloading Android apps from the Google Play Store and Xiaomi App Store (the Chinese App Store).
android-application-downloader android-apps-crawler android-market-scraper android-research android-scraper android-tool app-downloader crawler crawling-tool google-play-application-downloader google-play-store-scraper gplaycli open-source-project python-scraper research-tool scraper scraping-tool wget-utility xiaomi-apps xiaomi-store-scraper
Last synced: 30 Apr 2025
https://github.com/minhhungit/github-action-rss-crawler
Auto crawl RSS feeds using Github Action
crawler csharp github-actions litedb netcore rss rss-crawler rss-items
Last synced: 15 Jul 2025
https://github.com/pykong/pypergrabber
Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.
crawler email-inbox google-scholar pdf pmid pubmed python sci-hub scraper
Last synced: 15 Apr 2025
https://github.com/kshru9/web-crawler
A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures
concurrency concurrent-data-structure cpp crawler data-structures html-parser lock multithreading openssl pagerank pthread reader-writer-lock search-engine socket threading threadsafe webcrawler website-downloader
Last synced: 23 Mar 2025
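The "single lock" mechanism means every piece of shared crawler state (frontier, visited set, count of pages in progress) is guarded by one mutex. The repository itself is C++ with pthreads, per its tags; the Python sketch below only illustrates the single-lock idea and is not its code. `GRAPH` and the `fetch_links` callback are hypothetical stand-ins for real fetching. Because of CPython's GIL, threads like these mainly help when fetching is I/O-bound.

```python
import threading
from collections import deque

def crawl_parallel(seeds, fetch_links, num_workers=4):
    """Crawl with several threads; one lock guards all shared state."""
    lock = threading.Lock()                  # the single lock
    work_ready = threading.Condition(lock)   # waits/notifies on that same lock
    frontier = deque(seeds)
    seen = set(seeds)
    in_flight = 0                            # pages currently being fetched

    def worker():
        nonlocal in_flight
        while True:
            with work_ready:
                # Sleep while the queue is empty but another worker may add links.
                while not frontier and in_flight > 0:
                    work_ready.wait()
                if not frontier:             # nothing queued, nobody fetching: done
                    work_ready.notify_all()
                    return
                url = frontier.popleft()
                in_flight += 1
            links = fetch_links(url)         # slow I/O happens outside the lock
            with work_ready:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
                in_flight -= 1
                work_ready.notify_all()      # wake workers waiting for new work

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

GRAPH = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/", "/c"], "/c": [], "/orphan": ["/"]}
print(sorted(crawl_parallel(["/"], lambda url: GRAPH.get(url, []))))  # ['/', '/a', '/b', '/c']
```

Tracking `in_flight` is what lets idle workers tell "queue temporarily empty" apart from "crawl finished".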
https://github.com/code4everything/visual-spider
Try our brand-new desktop productivity tool RunFlow: https://myrest.top/myflow
crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization
Last synced: 04 Oct 2025
https://github.com/aliosm/kontests
Schedule of competitive programming contests
a2oj atcoder codeforces codeforces-gym codeshef competitive-programming crawler csacademy hackerearth hackerrank kickstart leetcode topcoder
Last synced: 23 Oct 2025
https://github.com/dept/octopus
Recursive and multi-threaded broken link checker
Last synced: 04 Mar 2026
https://github.com/mamal72/iranian-calendar-events
Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website
crawler events iranian jalali jalali-calendar persian
Last synced: 07 May 2025
https://github.com/debugtalk/webcrawler
A web crawler based on requests-html, mainly aimed at URL validation testing.
crawler requests-html web-crawler weblink
Last synced: 15 Apr 2025
https://github.com/deptagency/octopus
Recursive and multi-threaded broken link checker
Last synced: 08 Jul 2025
https://github.com/gomjellie/pysaint
[deprecated] u-SAINT Python client
crawler sap soongsil unofficial
Last synced: 30 Apr 2025
https://github.com/howie6879/php-google
Google search results crawler in PHP: get the Google search results you need
crawler google-search php-google
Last synced: 16 May 2025
https://github.com/gimnathperera/abans-lk-webscraping
🌐 Web scraping script written in Python using the Scrapy library to scrape product data from popular Sri Lankan websites
Last synced: 30 Jun 2025
https://github.com/mjavadhpour/telegram-member-inviter
Crawls the client's groups and channels and invites their members to a target group.
crawler python python3 robot telegram telegram-client telethon
Last synced: 19 Apr 2025
https://github.com/Decodo/Python-scraper-tutorial
A short introduction to scraping with Python with given steps and an example scraper script.
beautifulsoup crawler data-mining data-science github-python json-database-python learning python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping
Last synced: 02 May 2025
https://github.com/koallen/google-image-downloader
A script to download images from images.google.com
crawler google-images selenium
Last synced: 18 Jan 2026
https://github.com/ivan-sincek/chad
Search Google Dorks like Chad. / Broken link hijacking tool.
broken-link-takeover bug-bounty crawler ethical-hacking google google-dorking google-dorks offensive-security penetration-testing playwright python red-team-engagement scraper search-engine security social-media social-media-takeover threat-hunting threat-intelligence web-penetration-testing
Last synced: 10 Mar 2026
https://github.com/k1low/utsusemi
A tool to generate a static website by crawling the original site.
api aws aws-lambda crawler s3-website serverless serverless-framework
Last synced: 16 Apr 2025
https://github.com/mattwang44/uspto-patft-web-crawler
Crawler for fetching US patent information and bulk-downloading patent PDFs
crawler patent patent-crawler pyqt5 python3 uspto
Last synced: 11 Oct 2025
https://github.com/ndgigliotti/shopify-spy
Extract structured data from Shopify websites.
crawler data data-acquisition data-science dropshipping ecommerce scrape scraper scraping scrapy shopify spider
Last synced: 26 Jan 2026
https://github.com/jurooravec/crawlee-one
Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.
actor apify crawlee crawler framework scraper scraping web
Last synced: 09 Feb 2026
https://github.com/simionrobert/bitinsight
🌍 BitTorrent network overview through infohash indexing, metadata, and IP visualisations of the DHT network
bep51 bittorrent crawler dht elasticsearch infohash javascript nodejs torrent
Last synced: 13 Apr 2025
https://github.com/endermanch/ddom
A simple, open-source, easy to use, and free download manager for malware samples.
crawler downloader malware manager samples
Last synced: 06 Sep 2025
https://github.com/italia/publiccode-crawler
publiccode.yml crawler for the Open Source software catalog of Developers Italia
crawler developers-italia hacktoberfest publiccode publiccodeyml
Last synced: 10 Feb 2026
https://github.com/bigsk1/supa-crawl-chat
Integrates Supabase with Crawl4AI and AI Chat to create a powerful web crawling and semantic search solution. Includes Streamlit data visualization for Supabase, runs entirely in Docker, and offers an API and more!
crawl4ai crawler docker embeddings fastapi gpt-4o openai-api pgvector postgresql scraping streamlit supabase
Last synced: 15 May 2026
https://github.com/riptl/ytpriv
YT metadata exporter
big-data crawler csv datascience json video youtube
Last synced: 10 May 2025
https://github.com/alehkot/job-funnel-ts
Automated tool, inspired by Job Funnel, for scraping job postings into .xlsx files.
crawler hacktoberfest jobs typescript
Last synced: 03 Aug 2025
https://github.com/o8e/soccer-scrape
📃 Scrape football data from Bet365
bet365 betting crawler es6 football javascript puppeteer scraper soccer
Last synced: 10 Mar 2026
https://github.com/matheusfelipeog/froxy
Hide your IP with free proxies using Froxy 🔄
crawler free-proxy froxy hide-ip proxies proxies-scraper proxy python requests requests-module scraping
Last synced: 07 May 2025
https://github.com/alex-page/get-site-urls
🔗 Get all of the URLs from a website.
crawler sitemap-generator urls
Last synced: 16 Mar 2025
https://github.com/marcel0024/cococrawler
A declarative, easy-to-use web crawler and scraper in C#
cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper
Last synced: 10 Apr 2025
https://github.com/ERap320/CrowLeer
Powerful C++ web crawler based on libcurl
Last synced: 10 May 2025
https://github.com/fernandod1/producthunt-scraper
Scraper script for the well-known website Producthunt.com. Scrapes all offers and saves them to an Excel spreadsheet.
crawler crawling crawling-sites data-mining datamining producthunt producthunt-api producthunt-users python python-script python3 scrape scraped-data scraper scraper-engine scraping scraping-bot scraping-python scraping-tool scraping-websites
Last synced: 16 Jun 2025