Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-07-02 00:06:49 UTC
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 01 Apr 2025
https://github.com/keizerzilla/ssh-hunter
Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).
Last synced: 10 Apr 2025
https://github.com/keizerzilla/search4dwango9
My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 10 Apr 2025
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MCQs Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 13 Apr 2026
https://github.com/mehdieidi/offliner
Offliner is a tool that makes a website viewable offline. It's a concurrent web crawler that saves all the pages and static files in a directory.
concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread
Last synced: 14 Jan 2026
https://github.com/heitor57/astronomy-news
🔭📰 Astronomy News
crawler data-science news text-mining
Last synced: 06 Oct 2025
https://github.com/b3j4y/unidisk
A Crawler to search for keywords and compare the score
comparison crawler nlp solr-client
Last synced: 17 Jan 2026
https://github.com/semoal/pythoncrawler
Python crawler with XMLRPC & BeautifulSoup
beautifulsoup crawler python wordpress xmlrpc
Last synced: 15 Apr 2026
https://github.com/heyihuang826/ncku_course
Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be automatically uploaded to OneDrive or saved locally (to your personal computer or GitHub repo).
Last synced: 01 Mar 2026
https://github.com/nyarla/net-paranoid-go
(WIP) Paranoid helpers for a crawler of untrusted web content
crawler filtering golang helper
Last synced: 14 Jan 2026
https://github.com/viko16/hatcher
🐣 [WIP] Provides APIs through simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 08 Oct 2025
https://github.com/romangw/lukki
Completely free code for a webcrawling bot.
crawler python web-scraping web-scraping-python
Last synced: 08 Oct 2025
https://github.com/killianmeersman/wander
Convenient scraping library for Gophers
crawler data-mining golang scraper spider
Last synced: 14 Jan 2026
https://github.com/bernieyangmh/check-link
Checks through a whole website, identifying broken links.
Last synced: 14 Jan 2026
https://github.com/kyungw00k/stealth-wright
Silent browser automation CLI with stealth capabilities
crawler go playwright stealth-automation
Last synced: 31 May 2026
https://github.com/daitangio/find
Python + SQLite search engine
crawler indexer python search-engine
Last synced: 18 Jan 2026
https://github.com/panagiotisptr/codeforces-companion
A codeforces parser, code tester and testcase generator in Go
codeforces-parser competitions crawler go golang parser test-automation testing
Last synced: 14 Jan 2026
https://github.com/namchee/hackerbits
Web crawler and clustering for the HackerNews website.
Last synced: 09 Oct 2025
https://github.com/dappsar/ethglobal-crawler
A web crawler that scrapes and aggregates projects from ETHGlobal hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.
Last synced: 09 Oct 2025
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 09 Oct 2025
https://github.com/zrquan/gatherer
Gatherer is a simple crawler tool
crawler infosec pentest security
Last synced: 14 Jan 2026
https://github.com/ninja-yubaraj/lootbin
A tool to hunt, scan, and loot public pastes from Termbin for interesting keywords.
crawler monitoring osint osint-python osint-tool pastebin python python3 scanner scraper termbin
Last synced: 11 Oct 2025
https://github.com/andreposman/magic-number
A CLI Tool/API to calculate the passive income in FII's
Last synced: 14 Jan 2026
https://github.com/katronquillo/grimm
Simple search engine for the Brothers Grimm Fairy Tales
Last synced: 24 Apr 2026
https://github.com/ignmaro/new
The "new" project introduces a streamlined approach to task management, focusing on simplicity and efficiency. It allows users to create, organize, and track their tasks with minimal setup and maximum clarity.
bandcamp brook crawler ios jobs newgrad news rss rss-reader soundcloud v2ray video vmess vuejs3
Last synced: 13 Oct 2025
https://github.com/zhima-mochi/wordpress-articles-list-generator
Auxiliary tool
Last synced: 14 Oct 2025
https://github.com/instagram-automations/apify-instagram-scraper
apify instagram scraper data extraction tool
api apify apify-instagram-scraper automation bot crawler data-mining docker instagram nodejs playwright proxy python scraper social-media
Last synced: 14 Oct 2025
https://github.com/dpbm/opendatasus-crawler
A simple crawler using puppeteer
brazil chrome crawler csv datasus nodejs opendatasus pdf puppeteer screenshot sus
Last synced: 14 Apr 2026
https://github.com/mizcausevic-dev/procurement-pulse-engine
The crawl + aggregate engine behind the AI Procurement Pulse. Probes a universe of vendor domains for the 11 Kinetic Gain Protocol Suite documents and produces the quarterly issue dataset. Issue #1: the zero baseline.
ai-governance ai-procurement-pulse crawler data-journalism javascript kinetic-gain-protocol-suite procurement research well-known
Last synced: 01 Jun 2026
https://github.com/limdongjin/bill-scraper
Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: Crawler for bills of the 20th National Assembly of Korea
Last synced: 15 Oct 2025
https://github.com/shamsher31/crawler
Simple site crawler that extracts all the URL links from the given website
Last synced: 15 Oct 2025
https://github.com/mizcausevic-dev/aeo-crawler
BFS crawler for AEO Protocol v0.1 declaration graphs. Seed an origin, follow primary_source URIs, emit JSON Lines records of every fetch. Built on aeo-sdk-go. Concurrent, depth-limited, budget-capped, stdlib-only HTTP.
aeo aeo-protocol ai-governance answer-engine-optimization crawler entity-graph go-cli golang kinetic-gain-protocol-suite protocol-implementation well-known
Last synced: 01 Jun 2026
https://github.com/stephanebruckert/gocrawl
Crawl every page and asset of a web domain
Last synced: 16 Oct 2025
https://github.com/foolishway/blog-crawler
blog-crawler crawls blogs according to your configuration file.
Last synced: 22 Jan 2026
https://github.com/asmrcodez-yt/google-extensions-scraper
🚀 Download free and open-source Chrome extensions for web scraping! Extract data from various websites effortlessly with our latest .crx releases.
chrom codez crawler extension free linkedin omid opensource scraper thecodez web-scraper
Last synced: 17 Oct 2025
https://github.com/danielemoraschi/sitemap-app
Sitemap generator command line application using dmoraschi/sitemap-common library
crawler php php-library sitemap sitemap-generator
Last synced: 19 Oct 2025
https://github.com/bersegosx/exparic
Web parser via yaml config
crawler parser yaml-configuration
Last synced: 21 Oct 2025
https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper
Web scraper that scrapes the whole Codeforces problem set into a CSV or JSON file.
codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider
Last synced: 01 Jun 2026
https://github.com/guillempuche/news_crawler
Scrape news from Olot town hall (https://www.olot.cat) with TypeScript and Crawlee. Collects summaries and full articles, stored in separate datasets.
biomejs crawlee crawler news-crawler olot townhall yarn-berry
Last synced: 23 Oct 2025
https://github.com/obsidianplusplus/tensorrt-python-api-crawler
Python crawler for scraping NVIDIA TensorRT Python API documentation and converting it to Markdown format.
api base converter crawler deep docs documentation gpt knowledge learning llm markdown nvidia offline python scraper scraping tensorrt web
Last synced: 14 May 2026
https://github.com/rutopio/crawler-2020-taiwanese-election-results
Crawler for the 2020 Taiwanese election results, using the at-large party-list vote as an example
Last synced: 24 Oct 2025
https://github.com/xatier/metart-streamlit
Metart network viewer with streamlit 💦🍑💡
crawler streamlit streamlit-webapp
Last synced: 23 Jan 2026
https://github.com/ashwantmanikoth/IntellilSearch
This is an AI-powered crawler that can search the web for information based on your input.
crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation
Last synced: 25 Oct 2025
https://github.com/0xh3xa/benign-crawler
Crawler for downloading benign files from FileHippo and other sources
benign crawler datasets downloader malware-research
Last synced: 26 Oct 2025
https://github.com/recepkizilarslan/console-tourist
Tourist is a simple tool that collects console messages, errors, and unsuccessful requests from all your pages after the DOM loads, with authentication support.
console-log crawler crawling crawling-tool error-monitoring error-reporting qa qa-automation qatools
Last synced: 24 Feb 2026
https://github.com/chamzzzzzz/supersimplesoup
A Go package that implements a super simple soup-like DOM API
beatifulsoup crawler crawler-go dom go golang html-parser
Last synced: 28 Jan 2026
https://github.com/gn00678465/crawler
A Python CLI tool that uses the Firecrawl API for web crawling, with support for multiple output formats.
Last synced: 06 Feb 2026
https://github.com/danielfillol/ab2l_crawler
Crawler for AB2L radar
brazil crawler lawtech legaltech
Last synced: 28 Jan 2026
https://github.com/russellsteadman/netscrape
A Node.js framework for creating good bots
bot crawler crawling exclusion rfc9309 scraper scraping web-scraping
Last synced: 20 Jun 2026
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Apr 2026
https://github.com/miiraak/scrapc
C# WinForms - Web content crawler & scraper
crawler csharp html scraper url web windows-forms
Last synced: 29 Jan 2026
https://github.com/atasoglu/websense
A modular AI-powered web scraper for data pipelines.
ai automation crawler data-extraction llm parsing scraper structured-output web-scraping
Last synced: 31 Jan 2026
https://github.com/intina47/ee_error
Implementation of a web crawler using C++
cpp crawler curl gumbo libcurl stanford-nlp web
Last synced: 31 Jan 2026
https://github.com/xiangronglin/novel2go
Android app to create a PDF from a website and send it to your Kindle
android crawler jetpack kotlin pdf-generation readability
Last synced: 31 Jan 2026
https://github.com/ashwantmanikoth/intellilsearch
This is an AI-powered crawler that can search the web for information based on your input.
crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation
Last synced: 15 Apr 2026
https://github.com/lucasromualdo/glassdoorcrawler
Python crawler that collects job listings from Glassdoor and exports them to Excel
cli crawler glassdoor openpyxl pandas python web-scraping
Last synced: 25 Feb 2026
https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb
Crawl Google Scholar profiles with Selenium, store the extracted data in MongoDB, and serve queries with FastAPI.
crawler fastapi google-scholar mongodb python selenium
Last synced: 16 Apr 2026
https://github.com/ilovebacteria/digikala-api
This Python package queries the Digikala API and retrieves product details.
Last synced: 11 Feb 2026
https://github.com/basemax/github-repos-report-generator
A Python CLI tool to fetch all public repositories of a GitHub user, extracting repository details such as name, URL, description, top language, and tags. Outputs data in CSV, JSON, and HTML formats.
api api-github crawler csv export extract github github-api github-export github-exporter github-info html json py python
Last synced: 16 Apr 2026
https://github.com/webdevcave/directory-crawler-php
Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.
crawler crawling directory path php php-library
Last synced: 12 Feb 2026
https://github.com/mt4110/postal_converter_ja
High-performance Japanese Postal Code Converter & API. Auto-updating, DB-agnostic (MySQL/PostgreSQL), written in Rust & Next.js. An asynchronous crawling system in Rust with automatic updates of Japan Post data; the latest postal code data is refreshed as fast as possible.
api crawler docker mysql nextjs nix postgresql react rust
Last synced: 13 Feb 2026
https://github.com/yggverse/pulsarss
RSS Aggregator for Gemini Protocol
aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust
Last synced: 13 Feb 2026
https://github.com/shivamsaraswat/webxcrawler
WebXCrawler is a fast static crawler to crawl a website and get all the links.
crawler crawling python scraping webcrawler webxcrawler
Last synced: 13 Feb 2026
https://github.com/solracsf/perplexitybot-ips
Collected PerplexityBot IPs
bots crawler ip ipset perplexity
Last synced: 15 Feb 2026
https://github.com/faridfr/dribbble-crawler-php
Dribbble crawler with PHP
crawler dribbble dribbble-crawler php php-crawler user-interface
Last synced: 17 Mar 2025
https://github.com/abdymm/abtelegrambot-sample
sample using Telegram Bot
crawler football php scheduler telegram-bot webhook
Last synced: 15 Jun 2026
https://github.com/nabi-allenby/web-crawler
BFS web crawler
crawler docker k8s kubernetes reconnaissance rust rust-lang webcrawler
Last synced: 02 Mar 2026
https://github.com/igorbrizack/crawler-web
Web data collection application with ReactJS and Python - REST API
beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper
Last synced: 16 Apr 2026
https://github.com/nsalvacao/cli-plugins
OpenAPI for CLIs — Crawl any CLI's --help output and generate structured Claude Code plugins with expert command knowledge
ai-agent claude-code cli cli-reference crawler developer-tools help-parser llm plugin python
Last synced: 04 Mar 2026
https://github.com/metehan777/http-header-link-graph
Publish a site's link graph & heading map in HTTP response headers. Crawl 65k pages in 99 seconds without parsing one byte of HTML. Companion code for the SEO Week 2026 NYC experiment.
aeo answer-engine-optimization cloudflare-workers crawler generative-engine-optimization geo http-headers link-graph python rust seo site-architecture technical-seo
Last synced: 03 Jun 2026
https://github.com/olostep-api/olostep-cli
CLI for the Olostep API — scrape, map, crawl, answer, batch the web from your terminal. Pure JS rewrite of olostep-cli.
ai-agents cli crawler mcp nodejs npm olostep scraping typescript web-scraping
Last synced: 03 Jun 2026
https://github.com/marshallvoid/affiliate-chrome-extension
chrome-extension crawler tiktok
Last synced: 29 Apr 2026
https://github.com/rodrigorvsn/ace
🔥 Receive an email with the hottest promotions every day
crawler cronjob nextjs prisma puppeteer react-email resend
Last synced: 17 Apr 2026
https://github.com/bennettdams/vace-it-crawler
Python (Scrapy) crawler to access data from FACEIT.com
Last synced: 03 Jun 2026
https://github.com/moonyfringers/ladon
crawler data-pipeline ladon ladon-framework llm python training-data web-crawler web-scraping
Last synced: 17 Apr 2026
https://github.com/lig8t555/ecommerce
MERN Stack Ecommerce Store | Running In Production | MVP
baidu-tieba baotu bootstrap crawler douban-music ecommerce-platform fofa mongoose quanjing redux shopping-cart shopping-cart-solution stripe taobao-spider
Last synced: 04 Apr 2026
https://github.com/triekai/review-radar
An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.
crawler firebase gemini google-maps nextjs openai pwa react
Last synced: 04 Apr 2026
https://github.com/theabbie/shopcrawler
Crawler for Discovering Product URLs on E-commerce Websites (assignment)
Last synced: 18 Apr 2026
https://github.com/capturr/json-deep-equal
Check whether JSON objects contain the same values (ignoring array order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 19 Apr 2026
https://github.com/thamindur/ir-project
Search Engine for Sri Lankan MPs
crawler elasticsearch python scraping search-engine
Last synced: 19 Apr 2026
https://github.com/gesiscss/github_traffic_crawler
Retrieve data from repositories (insights, usage, commits)
Last synced: 20 Apr 2026
https://github.com/kernelerr/pixivurls
An awesome tool to get Pixiv image URLs.
Last synced: 20 Apr 2026
https://github.com/ravenastar-js/ravpagelinks
🚀 RavPageLinks 🕷️ Basic tool for enumerating URLs in web pages
axios chalk crawler links playwright ravenastar scraping url-enumeration
Last synced: 20 Apr 2026
https://github.com/brianbruggeman/vax
A vaccination signup tool
covid-19 crawler signup vaccination
Last synced: 21 Apr 2026