An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
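As a rough illustration of "systematically browses", the sketch below does a breadth-first crawl: fetch a page, extract its links, and queue the ones not seen before. It is a minimal sketch, not the method of any project listed here. The `crawl` function, the `LinkExtractor` class, and the in-memory `PAGES` site at `example.test` are all hypothetical stand-ins for real HTTP requests, so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch` maps a URL to HTML text, or None on failure."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # FIFO frontier gives breadth-first order
    order = []                  # pages successfully fetched, in visit order
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# Hypothetical in-memory "site" standing in for real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "no links here",
}

visited = crawl("http://example.test/", PAGES.get)
```

A real crawler would replace `PAGES.get` with an HTTP client and would normally also honor robots.txt and rate limits, which this sketch leaves out.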

https://github.com/mascanho/rustyseo

SEO/GEO toolkit to analyse, crawl, parse and optimise websites & logs (Nginx & Apache)

ai crawler cwv geo google marketing rust seo spider tauri

Last synced: 15 Feb 2026

https://github.com/go-crawler/douban-movie

A Golang crawler that scrapes the Douban Movie Top 250

crawler douban go golang movie spider

Last synced: 16 Jan 2026

https://github.com/tijme/not-your-average-web-crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

bug-bounty callbacks crawler custom get post python request scanner scraper security spider vulnerability

Last synced: 06 Apr 2025

https://github.com/abaykan/CrawlBox

An easy way to brute-force web directories.

admin-finder crawler python web-crawler wordlist

Last synced: 26 Mar 2025

https://github.com/jin10086/pachong

Some crawler code

crawler python2

Last synced: 30 Oct 2025

https://github.com/Liu233w/acm-statistics

An online tool (crawler) to analyze users' performance in online judges (coding competition websites). Supported OJ: POJ, HDU, HYSBZ, CodeForces, UVA, ICPC Live Archive, FZU, SPOJ, Timus (URAL), LeetCode_CN, CSU, LibreOJ, 洛谷, 牛客OJ, Lutece (UESTC), AtCoder, AIZU, CodeChef, El Judge, BNUOJ, Codewars, UOJ, NBUT, 51Nod, DMOJ, VJudge

acm-icpc codechef-api codeforces-api crawler csharp docker javascript nodejs spoj-api vue

Last synced: 11 Apr 2025

https://github.com/karthikuj/sasori

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

automation crawler crawling dast dynamic endpoint-discovery infosec puppeteer scraping security

Last synced: 15 Aug 2025

https://github.com/luohaha/jlitespider

A lite distributed Java spider framework :-)

crawler distributed distributed-systems rabbitmq spider

Last synced: 21 Jul 2025

https://github.com/bartdag/pylinkvalidator

pylinkvalidator is a standalone and pure python link validator and crawler that traverses a web site and reports errors (e.g., 500 and 404 errors) encountered.

crawler link-checker networking python

Last synced: 07 Apr 2025

https://github.com/egoist/taki

Take a snapshot of any website.

crawler prerender snapshot

Last synced: 09 Apr 2025

https://github.com/twiny/spidy

Domain names collector - Crawl websites and collect domain names along with their availability status.

backlinks crawler domain expired-domain golang scraper seotools spider

Last synced: 17 Aug 2025

https://github.com/janreges/siteone-crawler

SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).

analyzer crawler crawling performance qa quality-assessment security seo seotools stress-testing swoole testing website

Last synced: 18 Mar 2026

https://github.com/moranzcw/Zhihu-Spider

A multithreaded Python crawler that fetches profile-page information for Zhihu users.

crawler jupyter-notebook matplotlib python requests zhihu-spider

Last synced: 28 Mar 2025

https://github.com/algolia/npm-search

🗿 npm ↔️ Algolia replication tool ⛷ 🐌 🛰

algolia couchdb crawler npm search sync yarn

Last synced: 19 Jun 2025

https://github.com/TGiles/auto-lighthouse

A utility package for automating lighthouse reporting

audits auto-lighthouse crawler lighthouse-reports robots simplecrawler

Last synced: 06 Apr 2025

https://github.com/alex-on-ai/WebReaper

AI-native web scraper. Single binary with a bundled Claude Code skill. MIT-licensed alternative to Firecrawl.

ai-agents-automation claude-code crawler dotnet firecrawl-alternative llm markdown mcp parser parsing scraper scraping scraping-api scraping-web scraping-websites webcrawler webscraping

Last synced: 14 Jun 2026

https://github.com/scraperai/scraperai

ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.

crawler langchain linkedin openai parser parsing python requests scraper scraping selenium

Last synced: 10 Apr 2025

https://github.com/roys/cewler

CeWLeR - Custom Word List generator Redefined. CeWL alternative in Python, based on the Scrapy framework.

bugbounty crawler reconnaissance spider

Last synced: 05 Apr 2026

https://github.com/jakepartusch/lumberjack

An automated website accessibility scanner and cli

a11y accessibility axe cli crawler lumberjack

Last synced: 10 Sep 2025

https://github.com/hominee/dyer

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

crawler rust rust-programming-language spider web-crawler web-framework web-scraping

Last synced: 11 Mar 2026

https://github.com/lincanbin/sina-weibo-album-downloader

Multithreaded download of all HD photos / pictures from someone's Sina Weibo album.

crawler python weibo

Last synced: 10 Sep 2025

https://github.com/alash3al/scraply

Scraply is a simple DOM scraper to fetch information from any HTML-based website

crawler crawling dom golang scraper scrapers scraping-websites scrapy server

Last synced: 28 Apr 2025

https://github.com/duckduckgo/tracker-radar-collector

🕸 Modular, multithreaded, puppeteer-based crawler

crawler puppeteer tracker-radar

Last synced: 20 Aug 2025

https://github.com/nasa-jpl-memex/memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data

ache anaconda apache crawler dashboard domain-discovery memex-explorer miniconda nutch tika

Last synced: 10 Mar 2026

https://github.com/storyicon/graphquery

GraphQuery is a query language and execution engine tied to any backend service.

crawler css graph html jsonpath query regexp sql xml xpath

Last synced: 06 Apr 2025

https://github.com/WuLC/GoogleImagesDownloader

Enlarge a training dataset by searching Google for images with specified keywords and downloading the results

crawler google image keyword selenium

Last synced: 12 Apr 2025

https://github.com/lin-jun-xiang/agent-line-bot

🤖Free Agent Line Bot with Google Image Search, Image Generator, Video Generator...

agent chatbot chatgpt crawler gpt linebot llm vlm zhipuai

Last synced: 28 Jul 2025

https://github.com/ethereum/node-crawler

Attempts to crawl the Ethereum network of valid Ethereum execution nodes and visualizes them in a nice web dashboard.

crawler ethereum

Last synced: 13 Apr 2025

https://github.com/wx-chevalier/sentinel-crawler

Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, OS; it can also act as a monitor (with Prometheus) or ETL for infrastructure 💫 Multi-language executor, distributed crawler

crawler etl koa2 monitor nodejs react wx-code

Last synced: 22 Aug 2025

https://github.com/wxyyxc1992/xe-crawler

Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, OS; it can also act as a monitor (with Prometheus) or ETL for infrastructure 💫 Multi-language executor, distributed crawler

crawler etl koa2 monitor nodejs react wx-code

Last synced: 23 Mar 2025

https://github.com/glouw/andvaranaut

A dungeon crawler

crawl crawler dungeon

Last synced: 25 Apr 2025

https://github.com/duyet/pricetrack

Price tracker that monitors products and alerts you when prices drop. Supports tiki.vn, shopee, lotte.vn, ... Built with Firebase: https://pricetrack.web.app

api crawler cronjob-scheduler firebase firebase-auth firebase-functions firebase-hosting firestore redash shopee shopee-api tiki tracking

Last synced: 07 Jul 2025

https://github.com/pavlovtech/WebReaper

Web scraper, crawler and parser in C#. Designed as a simple, declarative and scalable web scraping solution.

crawler datamining parser parsing scraper scraping scraping-api scraping-data scraping-tool scraping-web scraping-websites webcrawler webscraping

Last synced: 08 Apr 2025

https://github.com/jxpro/damai-tickets

Example script for grabbing tickets on Damai

crawler python selenium

Last synced: 15 Mar 2025

https://github.com/mazzzystar/baiducrawler

Sample of using proxies to crawl baidu search results.

baidu crawler proxies proxy

Last synced: 04 Oct 2025

https://github.com/wwwwwydev/crawlist

A universal solution for crawling lists on web pages.

crawl crawler crawler-python crawling-python crawlist python reptile

Last synced: 01 May 2025

https://github.com/archiveteam/wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd

Last synced: 04 Apr 2025

https://github.com/pinkpixel-dev/web-scout-mcp

A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI assistants to search the web and extract webpage content programmatically.

ai-assistant ai-tools cheerio content-extraction crawler duckduckgo duckduckgo-search google-search mcp mcp-server web-content web-crawler web-scraper web-scraping web-search web-search-agent

Last synced: 06 Mar 2026

https://github.com/hardikvasa/webb

Python: An all-in-one Web Crawler, Web Parser and Web Scraping library!

crawl-pages crawler python-library

Last synced: 07 Apr 2025

https://github.com/jackluson/convertible-bond-crawler

Convertible bond data and strategy analysis from 宁稳网 (formerly 富投网) and 集思录

convertible-bond crawler

Last synced: 18 Jan 2026

https://github.com/SeaQL/starfish-ql

✴️ An experimental graph database

crates-io crawler database graph hacktoberfest network rust sql visualization

Last synced: 27 Apr 2025

https://github.com/schollz/linkcrawler

Cross-platform persistent and distributed web crawler 🔗

crawler hyperlinks web

Last synced: 22 Apr 2025

https://github.com/zytedata/zyte-smartproxy-headless-proxy

A complimentary proxy to help to use SPM with headless browsers

crawler proxy scraping

Last synced: 28 Apr 2025

https://github.com/lixi5338619/asyncpy

A lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp

aiohttp asyncio asyncpy crawler python scrapy

Last synced: 07 May 2025

https://github.com/brantou/crawler

Crawlers, HTTP proxies, and simulated login!

crawler python scrapy

Last synced: 06 May 2025

https://github.com/kamiyomu/kamiyomu

A self-hosted, extensible manga reader and download tool with plug-in support.

crawler crawler-agents csharp dotnet downloader kamiyomu kavita konga manga manga-downloader manga-scraper

Last synced: 15 Apr 2026

https://github.com/aminehorseman/images-web-crawler

This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web, download images, rename / resize / convert the images and merge folders.

crawler dataset dataset-creation flickr-api google-images-crawler google-images-downloader image-classification image-dataset image-processing images machine-learning

Last synced: 07 Oct 2025

https://github.com/wuchunfu/ipproxypool

An IP proxy pool implemented in Golang. Technologies involved: go, gorm, proxy, proxypool, ip, crawler, mysql, viper, cobra

crawler go ip proxy proxy-server proxypool

Last synced: 21 Aug 2025

https://github.com/patrickschur/pappet

A command-line tool to crawl websites using puppeteer.

cli crawler pdf puppeteer screenshot

Last synced: 25 Aug 2025

https://github.com/foolin/pagser

Pagser is a simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Golang crawlers

colly crawler deserialization go golang goquery html page parser scrapy

Last synced: 22 Apr 2025

https://github.com/kostas-pa/LFITester

LFITester is a Python3 program that automates the detection and exploitation of Local File Inclusion (LFI) vulnerabilities on a server.

bugbounty crawler cybersecurity enumeration exploitation fuzzing hacking lfi lfi-detection lfi-exploitation lfi-vulnerability penetration-testing penetration-testing-tools pentest-tool pentesting python web-hacking webhacking

Last synced: 12 Jul 2025

https://github.com/hueristiq/xcrawl3r

A command-line interface (CLI) based utility to recursively crawl webpages. It is designed to systematically browse webpages' URLs and follow links to discover linked webpages' URLs.

bug-bounty bug-bounty-tools contentdiscovery crawler ethical-hacking ethical-hacking-tools go golang penetration-testing penetration-testing-tools reconnaissance red-teaming red-teaming-tools web-security

Last synced: 06 Apr 2025

https://github.com/foo-git/rewe-discounts

Grabs current REWE discounts and saves them in a Markdown file

api crawler python rewe

Last synced: 04 Sep 2025

https://github.com/medcl/gopa-abandoned

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

crawler golang lightweight spider

Last synced: 14 Jan 2026

https://github.com/creekorful/bathyscaphe

Fast, highly configurable, cloud native dark web crawler.

architecture crawler crawling elasticsearch golang hidden-services kibana tor web-crawler

Last synced: 17 Mar 2025

https://github.com/samber/the-great-gpt-firewall

🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs

agent anthropic blocklist censorship crawler firewall genai generative-ai gpt gpt-4 llm openai robots-txt user-agent

Last synced: 17 Aug 2025

https://github.com/jefferyhus/es6-crawler-detect

🕷 This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots/crawlers/spiders via the user agent.

bots crawler detection es6-javascript spider

Last synced: 16 May 2025

https://github.com/nietaki/crawlie

A simple Elixir library for writing decently-performing crawlers with minimum effort.

crawler elixir elixir-library genstage

Last synced: 24 Aug 2025

https://github.com/alexfazio/devdocs-to-llm

Turn any developer documentation into a GPT

crawler crawling firecrawl scraper scraping

Last synced: 08 Mar 2026

https://github.com/krau/manyacg

Collect, Download, Organize and Share your Favorite Anime Artworks.

anime crawler danbooru image-viewer kawaii nhentai picture pixiv telegram telegram-bot waifu

Last synced: 17 Apr 2026

https://github.com/Randark-JMT/Bilibili_manga_download

A Bilibili Manga download tool with a graphical interface

bilibili crawler downloader pyside6 python python3 qt spider

Last synced: 16 Mar 2025

https://github.com/boris-code/feaplat

A crawler management system that supports clustering and elastic scaling. It can run feapder, scrapy, selenium, playwright, and other frameworks and scripts

crawler feapder feaplat spider

Last synced: 13 Apr 2025

https://github.com/ondrejsojka/instastories-backup

Backup your friends' Instagram Stories forever and get to keep them even after 24 hours.

backup crawler instagram instagram-stories python python-3-6 python3

Last synced: 14 Sep 2025

https://github.com/crawlab-team/webspot

An intelligent web service to automatically detect web content and extract information from it.

crawlab crawler spider web

Last synced: 11 May 2025

https://github.com/fedebotu/iclr2023-openreviewdata

Crawl & Visualize ICLR 2023 Data from OpenReview

crawler dataset iclr iclr2023 openreview peer-review review scraper

Last synced: 05 Oct 2025

https://github.com/roccomuso/is-google

Verify that a request is from Google crawlers using Google's DNS verification steps

bot check crawler dns google ip js nodejs verify

Last synced: 14 Sep 2025

https://github.com/LexiestLeszek/scrapeGPT

ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.

crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper

Last synced: 07 Apr 2025

https://github.com/zongdeiqianxing/WebSecurityArticles

Crawls and organizes high-quality "web security" articles from sites such as Freebuf, Anquanke, Xianzhi, and Knownsec

anquanke articles crawl crawler freebuf leavesongs security seebug vulhub web xianzhi

Last synced: 28 Sep 2025

https://github.com/xiaomingx/proxy-pool

Python ProxyPool for web spider

crawler poc proxy rce spider tool

Last synced: 09 Apr 2025

https://github.com/kcubeterm/achoz

Search through all your personal data efficiently like web search.

crawler document-search filesearch search-engine websearch

Last synced: 21 Aug 2025

https://github.com/da2vin/fetchman

fetchman is a simple, easy-to-use crawler framework

crawler framework python

Last synced: 12 Mar 2026

https://github.com/feiskyer/scrapy-examples

Some scrapy and web.py examples

crawler python scrapy

Last synced: 09 Oct 2025

https://github.com/lrlna/puppeteer-walker

a puppeteer walker 🕷 🕸

chrome crawler headless puppeteer spider walker

Last synced: 11 Sep 2025

https://github.com/crawlzone/crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

automated-testing crawler crawling-framework middleware php web-scraping web-search

Last synced: 11 Jan 2026