Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/jonesrussell/north-cloud
A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.
Last synced: 25 Jan 2026
https://github.com/kettou/silentscraper
SilentScraper is a web scraping solution built with advanced stealth protocols. It operates undetectably in the background, bypassing anti-scraping mechanisms to collect structured data at scale. Its lightweight architecture mimics human browsing patterns, rotating IP addresses, spoofing user agents, and more.
beautifulsoup beautifulsoup4 crawler datastructures datastructures-algorithms python webautomation webscraper webscraping
Last synced: 23 Jul 2025
https://github.com/huyduc1602/uniapp-crawler
Crawl and translate the Uni-app documentation
Last synced: 25 Jan 2026
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 03 Feb 2026
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 14 Sep 2025
https://github.com/allancapistrano/steam.py
An API wrapper for Steam written in Python.
Last synced: 16 Mar 2025
https://github.com/zhima-mochi/wordpress-articles-list-generator
Auxiliary tool
Last synced: 14 Oct 2025
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 07 Jul 2025
https://github.com/tisfeng/bing-dict
A Bing command-line dictionary that obtains query results from Bing Dictionary using a crawler.
bing-dictionary command-line crawler nodejs
Last synced: 13 May 2026
https://github.com/daviddavo/blogspot-crawler
Crawler for blogspot and blogger with beautifulsoup
Last synced: 19 Apr 2026
https://github.com/ekojs/web-crawler
Web crawler for retrieving research titles from Google Scholar
Last synced: 12 Apr 2026
https://github.com/bradsec/gomine
A Go CLI tool to quickly crawl and mine (download) specific file types from websites.
cli crawler golang terminal-based
Last synced: 09 Apr 2025
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 22 Jun 2025
https://github.com/ryoii/hook
A declarative Java crawler framework
crawler declarative java java-crawler-framework jdk11
Last synced: 18 Mar 2025
https://github.com/suconghou/sitemap
a simple sitemap generator and page crawler
crawler html-parser nim-lang scraper sitemap spiders
Last synced: 15 May 2026
https://github.com/truongdd03/searchengine
A search engine written in C++.
cpp crawler search search-engine
Last synced: 06 Apr 2025
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 23 Mar 2025
https://github.com/patrik-fredon/python_wallpaper_crawler
Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.
crawler crawling-python image image-recognition images python scraping-websites scrapper selenium-python uv
Last synced: 14 Sep 2025
https://github.com/peterbencze/silene
Silene is an open source web crawler framework built upon Pyppeteer.
crawler framework pypp python scraper webcrawler
Last synced: 12 Jan 2026
https://github.com/kenanbek/tutorial-python-crawler
Crawling website data using Python with requests and Beautiful Soup libraries
beautifulsoup crawler crawling miner parser python python-requests requests
Last synced: 30 Mar 2025
https://github.com/gxjansen/website-to-pdf
Creates a PDF based on the content of a website/subdomain
claude-3-sonnet crawler python3
Last synced: 30 Mar 2025
https://github.com/kestarumper/imagecrawler
Downloads images from given URL
Last synced: 28 Jun 2025
https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance
Crawler robot that follows a black line while avoiding obstacles in its path. Assignment for Mechatronics
arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics
Last synced: 28 Apr 2026
https://github.com/montenegrodr/letmecrawl
Curated free proxies
crawler proxy proxy-server proxypool scraper
Last synced: 18 Jan 2026
https://github.com/eklem/vinmonopolet-crawler
Crawling Vinmonopolet-data and indexing it to a norch search index
crawler dataset javascript norch search-engine
Last synced: 26 Mar 2025
https://github.com/sgeisler/fishbones2epub
fetches the fishbones novel and outputs an epub
Last synced: 22 Mar 2025
https://github.com/surister/scrupy
Python library to create web Crawlers which aims to be powerful yet simple.
crawler crawling-framework crawling-python http library python scraping
Last synced: 15 May 2026
https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler
StackOverFlow Tag Generator Using a WebCrawler.
Last synced: 08 Apr 2025
https://github.com/instagram-automations/apify-instagram-scraper
apify instagram scraper data extraction tool
api apify apify-instagram-scraper automation bot crawler data-mining docker instagram nodejs playwright proxy python scraper social-media
Last synced: 14 Oct 2025
https://github.com/wilmsn/simple_deye_crawler
A simple crawler to get data from the Deye Inverter using the status webpage
crawler deye fhem inverter shell-script
Last synced: 27 May 2026
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 30 Mar 2025
https://github.com/dpbm/opendatasus-crawler
A simple crawler using puppeteer
brazil chrome crawler csv datasus nodejs opendatasus pdf puppeteer screenshot sus
Last synced: 14 Apr 2026
https://github.com/davelongdev/link-report-crawler
A web crawler using Node.js that crawls a site and returns a report showing all internal links.
crawler crawling javascript seo seo-tools
Last synced: 16 Jun 2025
https://github.com/martincastroalvarez/web-to-pdf
Web crawlers using Python & Beautiful Soup
Last synced: 08 Apr 2025
https://github.com/boatraceventureproject/boatracescraper
The BVP Crawler package for Boatrace.
boatrace crawler php php-library php8
Last synced: 17 Mar 2025
https://github.com/ahsouza/iquizz-api
RESTful API developed in Node.js with MongoDB
animations cluster crawler docker docker-compose ejs-templates es8 font-awesome grunt-task helmet-detection heroku javascript jquery material-design mongodb nodejs passport-strategy passportjs pusher token-authetication
Last synced: 12 Apr 2026
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the Website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 07 Jun 2026
https://github.com/mizcausevic-dev/procurement-pulse-engine
The crawl + aggregate engine behind the AI Procurement Pulse. Probes a universe of vendor domains for the 11 Kinetic Gain Protocol Suite documents and produces the quarterly issue dataset. Issue #1: the zero baseline.
ai-governance ai-procurement-pulse crawler data-journalism javascript kinetic-gain-protocol-suite procurement research well-known
Last synced: 01 Jun 2026
https://github.com/bruce-lee-ly/crawler
Several fun crawler cases implemented in Python.
Last synced: 27 Jun 2025
https://github.com/licoy/win4000-images-crawler
Crawls and downloads image wallpapers from win4000.com using Scrapy
Last synced: 28 Mar 2025
https://github.com/mstephen19/apify-click-events
Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to
apify apify-sdk crawler scraper web-automation
Last synced: 23 Aug 2025
https://github.com/humbertodias/go-nie-crawler
Simple crawler that extracts some useful information from sede.administracionespublicas.gob.es.
Last synced: 03 Mar 2025
https://github.com/mustafadalga/website-crawler
A web crawler script that lists links by scanning the target website.
crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling
Last synced: 20 Apr 2026
https://github.com/nextlevelshit/adonis-crawler
A free web crawler on top of the incredible AdonisJS Framework
adonisjs crawler javascript nodejs regex spider websocket
Last synced: 22 May 2026
https://github.com/limdongjin/bill-scraper
Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: crawler for bills of the 20th National Assembly (South Korea)
Last synced: 15 Oct 2025
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 06 Jan 2026
https://github.com/shamsher31/crawler
Simple site crawler that extracts all the URL links from the given website
Last synced: 15 Oct 2025
https://github.com/mizcausevic-dev/aeo-crawler
BFS crawler for AEO Protocol v0.1 declaration graphs. Seed an origin, follow primary_source URIs, emit JSON Lines records of every fetch. Built on aeo-sdk-go. Concurrent, depth-limited, budget-capped, stdlib-only HTTP.
aeo aeo-protocol ai-governance answer-engine-optimization crawler entity-graph go-cli golang kinetic-gain-protocol-suite protocol-implementation well-known
Last synced: 01 Jun 2026
https://github.com/iyowei/fs-deep-walk
Focused on deep scanning of a specified disk location.
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 20 May 2026
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 11 Nov 2025
https://github.com/filipsedivy/tachometer-check
🚘 MDČR - odometer check
Last synced: 11 Jan 2026
https://github.com/stephanebruckert/gocrawl
Crawl every page and asset of a web domain
Last synced: 16 Oct 2025
https://github.com/kiranjisonawane143/blockchain-data-crawler
🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 06 May 2026
https://github.com/claudio-code/nap-web-crawler
A crawler for finding broken links in the docs of frameworks and languages
Last synced: 07 Jul 2025
https://github.com/foolishway/blog-crawler
blog-crawler crawls blogs according to your configuration file.
Last synced: 22 Jan 2026
https://github.com/zfael/scrape-it-all
Modular web scraper for Node.JS
crawler scraper scraping scraping-websites web-scraping
Last synced: 04 Feb 2026
https://github.com/terminaldweller/crawley
A creepy crawler that runs as a sleepy daemon.
Last synced: 04 Jul 2025
https://github.com/zenixls2/2chpreprocess
Dump messages from 2ch with some preprocessing for ML analysis
Last synced: 26 Mar 2025
https://github.com/asmrcodez-yt/google-extensions-scraper
🚀 Download free and open-source Chrome extensions for web scraping! Extract data from various websites effortlessly with our latest .crx releases.
chrom codez crawler extension free linkedin omid opensource scraper thecodez web-scraper
Last synced: 17 Oct 2025
https://github.com/leshniak/robotstxt-debug
A tool for debugging robots.txt
crawler debugger indexing robots-txt seo seo-optimization seo-tools tester
Last synced: 25 Jun 2025
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 06 May 2026
https://github.com/thesurlydev/surly-spider
A command line interface for the spider library
crawl crawler rust spider surly surly-spider
Last synced: 16 Feb 2026
https://github.com/berecat/selenium_facebook_scraper
A simple Python 3 script used to download a user's friend list from Facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 24 Jul 2025
https://github.com/tca166/ck2-history-extractor
A tool for creating an encyclopedia from your CK2 savefile
Last synced: 02 Apr 2025
https://github.com/billy0402/python-application
A learning project from the book 'Python 技術者們'.
course crawler matplotlib opencv pandas python requests selenium sklearn
Last synced: 12 Apr 2026
https://github.com/amazingcoderpro/pythonup
Have fun with Python! For improving Python skills
Last synced: 19 May 2026
https://github.com/manu-sh/http_normalizer
http url normalization for web crawlers
crawler http spider url-normalization
Last synced: 12 Jun 2025
https://github.com/mrrefactoring/types-supercrawler
Types for supercrawler nodejs lib
crawler crawlerjs nodejs supercrawler types typescript typescript-definitions
Last synced: 18 Apr 2026
https://github.com/alphadev3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 26 Dec 2025
https://github.com/ilovebacteria/digikala-api
This Python package queries the Digikala API and retrieves a product's details.
Last synced: 11 Feb 2026
https://github.com/tigercosmos/web-crawler
Web Crawler in Java Maven Project
Last synced: 12 Jun 2025
https://github.com/webdevcave/directory-crawler-php
Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.
crawler crawling directory path php php-library
Last synced: 12 Feb 2026
https://github.com/tormol/zenphoto-dl
A script for recursively downloading all pictures from zenphoto-based photo albums.
Last synced: 30 Aug 2025
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 22 May 2026
https://github.com/itechbear/robotstxt
A Java clone of Google's robots.txt parser: https://github.com/google/robotstxt
crawler google-robotst-parser java robotstxt
Last synced: 14 Jan 2026
https://github.com/microlinkhq/cloudflare-bot-directory
Cloudflare Radar verified bots directory – 500+ web crawlers and user agents as JSON.
bot-detection bots cloudflare cloudflare-radar crawler crawlers dataset datasets googlebot user-agent user-agents user-agents- verified-bots web-crawler web-scraping
Last synced: 20 Apr 2026
https://github.com/vuchkov/forbes-billionairs-list
Forbes Billionaires List Crawler - PHP, MySQL, headless browser, etc.
crawler headless-chrome php scraper website
Last synced: 29 Apr 2026
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 06 May 2026
https://github.com/basemax/okala-database-crawler
A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.
crawler crawler-php curl data json okala okala-com okalacom php php-crawler scraper
Last synced: 01 May 2026
https://github.com/engageintellect/scrapers
A repository of web scrapers using Python & Scrapy
Last synced: 31 Mar 2025