An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
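As a rough illustration of the "systematically browses" part of that definition, the sketch below does a breadth-first crawl using only the Python standard library. It is a minimal sketch, not how any project listed here works. The `crawl` function, the `LinkExtractor` class, the `fetch` callback, and the in-memory `PAGES` dictionary (standing in for real HTTP requests) are all hypothetical names introduced for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns the URLs in the order they were visited.

    `fetch` is an assumed callback: it takes a URL and returns the page's
    HTML as a string, or None if the page cannot be retrieved.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" so the example runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real crawler would replace `PAGES.get` with an HTTP client and would normally also honor robots.txt and rate limits, which this sketch leaves out.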

https://github.com/jannchie/simpyder

Ultra-fast asynchronous coroutine-based Python crawler

crawler python spider

Last synced: 16 Jun 2025

https://github.com/melroy89/metacritic_api

PHP Metacritic API - Mirror from my GitLab

api crawler data metacritic parser php scores scraper webscraping

Last synced: 13 May 2025

https://github.com/tzw0745/tumblr-crawler-cli

High-speed, highly customizable Tumblr download tool.

cli-app crawler python tumblr tumblr-downloader

Last synced: 13 Jul 2025

https://github.com/flickz/newspaperjs

News extraction and scraping. Article Parsing

crawler news news-aggregator nodejs scraper webcrawling webscraping

Last synced: 02 Jun 2026

https://github.com/go-crawler/car-prices

Golang crawler that scrapes the used-car product catalog of Autohome (汽车之家)

crawler go golang spider

Last synced: 14 Jan 2026

https://github.com/minicloudsky/eastmoney

Python requests + Django + Node.js Koa + MySQL to crawl eastmoney fund and stock data for data analysis and visualization.

crawler database django eastmoney financial-analysis financial-data metabase mysql nodejs python vue vuejs

Last synced: 10 Jul 2025

https://github.com/lucasayres/python-tools

A collection of Python tools, scripts and utilities to make your life easier.

automation codes collection crawler functions geolocation helper libs pdf python qrcode recipes scripts speech sqlalchemy tips tools tricks unzip utilities

Last synced: 16 May 2025

https://github.com/drkostas/jobapplicationbot

A bot that automatically sends emails in response to new ads posted under any desired xe.gr search URL.

bot crawler email-sender python scraper

Last synced: 23 Sep 2025

https://github.com/zhang2333/light-crawler

A simplified, directed, customizable website crawler

crawler node-js

Last synced: 06 Sep 2025

https://github.com/usernam3/shopify-app-store-scraper

Crawler behind the Shopify App Marketplace dataset

crawler dataset-creation shopify

Last synced: 07 Apr 2025

https://github.com/python-testing-crawler/python-testing-crawler

A crawler for automated functional testing of a web application

crawler django flask python testing

Last synced: 05 Aug 2025

https://github.com/jhao104/spider

python crawler spider

crawler python spider

Last synced: 22 Mar 2025

https://github.com/mzollin/qr-pirate

crawl QR-codes from search engines and look for bitcoin private keys

bitcoin bitcoin-wallet crawler cryptocurrency private-key python qr-code qrcode qrcode-reader

Last synced: 28 Oct 2025

https://github.com/us/crw

Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v1/scrape, /v1/crawl, /v1/search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.

ai ai-agents crawler data-extraction docker firecrawl firecrawl-alternative html-to-markdown llm markdown mcp mcp-server rust scraping-api self-hosted tavily-alternative web-crawler web-scraper web-scraping web-search-api

Last synced: 09 May 2026

https://github.com/nekolr/slime

🍰 A visual crawler management platform

crawler spider visual-crawler websocket

Last synced: 16 May 2025

https://github.com/trudi-group/ipfs-crawler

A crawler for the IPFS network, code for our paper (https://arxiv.org/abs/2002.07747). Also holds scripts to evaluate the obtained data and make similar plots as in the paper.

crawler ipfs ipfs-network kademlia-dht libp2p

Last synced: 12 Jun 2025

https://github.com/muhac/chinese-holidays-calendar

Calendar of public holidays in China: a subscribable holiday calendar for mainland China with automatic holiday alarms

automation calendar chinese-holidays crawler events ics-files

Last synced: 09 May 2025

https://github.com/saltyshiomix/nest-crawler

An easy-to-use crawling and scraping module for NestJS

crawler nestjs nodejs scraper typescript

Last synced: 16 Mar 2025

https://github.com/nightmarcher/zhihu-crawler

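A from-scratch implementation that crawls Zhihu on a schedule, mines valuable information from it, and visualizes the crawled data on a web page.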

crawler developing mongodb pipenv python3 redis selenium spider zhihu

Last synced: 26 Jun 2025

https://github.com/howie6879/hproxy

hproxy - Asynchronous IP proxy pool for crawlers that aims to make getting a proxy as convenient as possible.

asyncio crawler crawlers hproxy proxy proxy-pool proxy-spider sanic schedule

Last synced: 16 May 2025

https://github.com/hijkzzz/dht-crawler

A DHT Crawler based on Goroutine

crawler dht golang

Last synced: 22 Jun 2025

https://github.com/wenyalintw/google-patents-scraper

Automatically download all PDF files of search results and their patent families found on Google Patents.

crawler google-patents patent patents pdf scraper scraping scrapy web-scraping

Last synced: 03 Mar 2026

https://github.com/webcoding/js_block

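Research and study of various blocking techniques: anti-crawler measures, ad blocking, preventing ad injection, fighting scalpers, and more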

block-ad block-res block-spider crawler nodejs spider

Last synced: 23 Jan 2026

https://github.com/Lin-jun-xiang/agent-line-bot

🤖Free ChatGPT Line Bot with Horoscope, Music Broadcast, Google Image Search...

chatbot chatgpt craw crawler cron gpt gpt-3 gpt4free linebot replit scraper

Last synced: 21 Aug 2025

https://github.com/vifreefly/rubium

Antidetect Headless Chrome Browser for Ruby Web Scraping and Automation

antidetect-browser automation capybara chromium crawler headless playwright puppeteer ruby scraping web-scraping

Last synced: 12 Feb 2026

https://github.com/aziz0x48/xsmtp

xSMTP 🦟 Lightning fast, multithreaded smtp scanner targeting open-relay and unsecured servers in multiple network ranges.

bot crawler exploit exploit-scanner multithreading networking pentest-tool pentesting pentesting-tools portscan portscanner python python-exploits scanner-web security security-tools smtp smtp-cracker

Last synced: 16 Aug 2025

https://github.com/mirusu400/pinterest-infinite-crawler

An infinite Pinterest crawler/scraper. Crawl images with infinite scroll!

crawler hacktoberfest pinterest pinterest-downloader python scraper scraping selenium

Last synced: 11 May 2026

https://github.com/absingh31/tor_spider

Python project to crawl and scrape the lesser-known deep web, or, as one might say, the dark web. Just provide the onion link and get started.

crawler file-manager ioc python3 scraper scraping socks stem tor tor-config tor-spider

Last synced: 11 May 2025

https://github.com/schollz/crawdad

Cross-platform persistent and distributed web crawler :crab:

crawler golang redis web

Last synced: 22 Apr 2025

https://github.com/cho45/chemrtron

A document viewer; fuzzy match incremental search.

crawler document-viewer electron increment javascript

Last synced: 30 Dec 2025

https://github.com/cheezone/zhihuvapi

Play with Zhihu elegantly

crawler python zhihu

Last synced: 29 Jul 2025

https://github.com/koshort/koshort

(deprecated) :cat: koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.

crawler korean nlp python streaming text-mining

Last synced: 19 Nov 2025

https://github.com/dannyben/snapcrawl

Crawl a website and take screenshots

capture crawler gem ruby screenshot

Last synced: 04 Apr 2025

https://github.com/x-way/crawlerdetect

Golang module to detect bots and crawlers via the user agent

bot-detection crawler crawler-detection detect go spider user-agent

Last synced: 09 Apr 2025

https://github.com/johanneszab/tumbltwo

TumblTwo, an Improved Fork of TumblOne, a Tumblr Downloader.

crawler downloader photos ripper tumblr tumblr-blog tumblr-downloader videos

Last synced: 09 Mar 2026

https://github.com/harborzeng/crawler_jd_what_worthy_buying

Crawls all reviews of a JD.com (京东) product and uses sentiment analysis to judge whether the product is worth buying

crawler jingdong-cart

Last synced: 24 Apr 2025

https://github.com/niespodd/webrtc-local-ip-leak

Oh no, stop this. You can see my local IP address 😲! Use `foundation` attribute against CRC32 lookup table to reveal local IP address of a Chrome/Chromium visitor.

automation bot bot-detection crawler spider stealth webrtc

Last synced: 27 Aug 2025

https://github.com/findopendata/findopendata

A search engine for Open Data

crawler dataset-search opendata

Last synced: 14 Jan 2026

https://github.com/fengzhizi715/piccrawler

An image crawler built with RxJava2 and Java 8 features

crawler java-8 parallel rxjava2

Last synced: 30 Oct 2025

https://github.com/sshwy/pku3b

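🎓 A Better BlackBoard for PKUers. Command-line tool for the Peking University course website (🖥️Win/🐧Linux/🍏Mac) that supports viewing and submitting assignments and downloading lecture recordings.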

blackboard-learn cli command-line-tool crawler m3u8 peking-university pku rust

Last synced: 30 Jan 2026

https://github.com/beomi/simple_bank_korea

simple crawler for Korean banks with Transactions

bank crawler

Last synced: 07 May 2025

https://github.com/hfreire/browser-as-a-service

A web browser :earth_americas: hosted as a service, to render your JavaScript web pages as HTML

browser browser-as-a-service crawler docker github-actions javascript puppeteer rest-api scraper server webcrawler

Last synced: 11 Sep 2025

https://github.com/lobehub/chat-plugin-web-crawler

🧩 / 🕸 WebsiteCrawler - This plugin automatically crawls the main content of a specified URL webpage and uses it as context input.

ai chatgpt crawler function-calling lobe-chat lobe-chat-plugin openai

Last synced: 29 Mar 2025

https://github.com/howie6879/talospider

talospider - A simple, lightweight scraping micro-framework

crawler crawling python spider web-spider

Last synced: 25 Oct 2025

https://github.com/nicholaskajoh/devsearch

A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

crawler flask mongodb pagerank python scrapy search search-engine spider tf-idf

Last synced: 16 Jan 2026

https://github.com/roccomuso/price-monitoring

Node.js price monitoring library, leveraging the power of x-ray and nightmare.

alert comparison crawler javascript monitoring nodejs price-tracker

Last synced: 14 Sep 2025

https://github.com/kabegame/kabegame

Kabegame — An anime image crawler client with pluggable crawlers (from a GitHub plugin repo), wallpaper rotation by custom rules, and Wallpaper Engine export. Supports Windows 10/11, macOS Big Sur+, and Ubuntu 24.04+.

android anime crawler linux macos no-electron open-source otaku tauri vue wallpaper windows

Last synced: 24 Apr 2026

https://github.com/forsti0506/a11y-sitechecker

Automatic accessibility checker with website crawling + screenshots for easy use

accessibility accessibility-criteria accessibility-testing axe crawler hacktoberfest open-source puppeteer typescript typescript-library

Last synced: 13 Jul 2025

https://github.com/jaymon/wishlist

Read an Amazon wishlist programmatically with Python

amazon amazon-wishlist api crawler python scraper

Last synced: 27 Oct 2025

https://github.com/eliashaeussler/cache-warmup

🔥 PHP library to warm up caches of URLs located in XML sitemaps

cache-warmup crawler php xml-sitemap

Last synced: 04 Apr 2025

https://github.com/a11ywatch/crawler

gRPC web crawler turbo charged for performance

a11ywatch crawler grpc scraper

Last synced: 16 Oct 2025

https://github.com/sachaarbonel/scrapy.dart

Scrapy, a fast high-level web crawling & scraping framework for Dart and Flutter

crawler dart scrapy

Last synced: 12 Mar 2026

https://github.com/mariot/chan-downloader

CLI to download all images/webms in a 4chan thread

4chan 4chan-downloader crawler scraper

Last synced: 13 Aug 2025

https://github.com/h12w/html-query

A fluent and functional approach to querying HTML

crawler dom go golang golang-package html parser

Last synced: 26 Jan 2026

https://github.com/farishijazi/rarbgcli

RARBG command line interface for scraping the rarbg.to torrent search engine

crawler rarbg rarbg-torrentapi torrent torrents torrents-crawler

Last synced: 17 Mar 2025

https://github.com/mouday/pageparser

A web page parser for crawlers; it lets you write a crawler even if you do not know how to parse web pages

crawler parser python spider

Last synced: 13 Apr 2025

https://github.com/lschmelzeisen/nasty

NASTY Advanced Search Tweet Yielder

crawler python twitter

Last synced: 10 Mar 2026

https://github.com/fritzh321/logo-scrape

🕷🚀 Scrapes/Crawls the logo from a provided url(s)/website for your Node.js applications.

crawler fetch logo nodejs scrape website

Last synced: 02 Feb 2026

https://github.com/zhangyunhao116/mini-spider

A simple, practical crawler tool: create your own crawler in just four steps!

crawler python spider

Last synced: 13 Apr 2025

https://github.com/ReedD/crawler

Chromium / Puppeteer site crawler

bot chromium crawler puppeteer redis scraper

Last synced: 13 Mar 2025

https://github.com/evil0ctal/wechat-channels-video-file-decryption

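A tool and API service, runnable online, for decrypting encrypted WeChat Channels videos, based on reverse-engineering analysis. The project uses WeChat's official WebAssembly (WASM) module to generate an Isaac64 PRNG keystream and completes the video decryption with an XOR operation.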

crawler reverse-engineering wechat wechat-api wechat-channel wechat-crawler wechat-hack wechat-hook wechat-video wechat-video-download

Last synced: 04 Apr 2026

https://github.com/valerebron/usetube

Search and get data from YouTube; no Google account needed

crawler typescript video youtube youtube-api

Last synced: 13 Apr 2025

https://github.com/duoan/codes-scratch-crawler

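Reading notes on the book 《自己动手写网络爬虫》 (Write Your Own Web Crawler), with code I typed out myself. Mainly covers basic web crawler implementation, web page deduplication algorithms, web page fingerprinting algorithms, and text information mining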

crawler scratch

Last synced: 18 Jun 2026

https://github.com/mendableai/firecrawl-py

Crawl and convert any website into clean markdown

ai crawler llm python scraper

Last synced: 14 Apr 2025

https://github.com/pzaino/thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.

automation blue-team-tool content-detection content-discovery crawler crawling cyber-security cybersecurity cybersecurity-tools data-collection data-science distributed-systems golang indexer indexing reconnaissance red-team-tools scraping search-engine vulnerability-detection

Last synced: 06 Feb 2026

https://github.com/goldarowana/douyin-crawler

Douyin crawler. Crawls a user's posts and likes through a mobile phone proxy

crawler douyin douyin-download java vertx

Last synced: 23 Oct 2025

https://github.com/murat/tors

⏬ Yet another torrent searching application for your command line

crawler ruby-gem torrent-downloader torrent-search-engine

Last synced: 10 Mar 2026

https://github.com/injectrl/needfree

Crawl 100%-discount games on Steam

crawler discount python steam

Last synced: 26 Jul 2025

https://github.com/mawrkus/jason-the-miner

⛏ A versatile Web scraper for Node.js

crawler crawling javascript scraper scraping web-scraper

Last synced: 08 Apr 2025

https://github.com/mike442144/seenreq

Generates an object for testing whether a request has already been sent; the request is Mikeal's `request`.

crawler duplicates-removed post request spider url

Last synced: 31 Aug 2025

https://github.com/spk/maman

Rust Web Crawler saving pages on Redis

crawler http spider web web-crawler

Last synced: 07 Oct 2025

https://github.com/jin10086/copyheaders

Conveniently copy browser headers from the browser

crawler python tools

Last synced: 26 Sep 2025

https://github.com/soruly/anilist-crawler

Crawl data from anilist API and store in MariaDB.

anilist anime crawler

Last synced: 19 Jun 2025

https://github.com/riquellopes/fii

API for retrieving information about FII

crawler investiment mongodb nodejs

Last synced: 16 Jan 2026

https://github.com/ireoo/spider.npm

A web crawler library that can handle most websites through custom rules

crawler npm spider superagent

Last synced: 12 Oct 2025

https://github.com/golang-collection/go-crawler-distributed

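A distributed crawler project. It supports custom page parsers for secondary development, uses a microservice architecture overall, and sends messages asynchronously through a message queue. Frameworks used include redigo, gorm, goquery, easyjson, viper, amqp, zap, and go-micro. Deployment is containerized with Docker, and the intermediate crawler nodes support horizontal scaling.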

crawler docker elasticsearch go go-micro gocrawler microservice rabbitmq

Last synced: 09 Jul 2025

https://github.com/liangWenPeng/scrapy-admin

A django admin site for scrapy

crawler scrapy scrapyd spider

Last synced: 06 Aug 2025

https://github.com/lewoudar/scalpel

A fast and powerful web scraping library

anyio asyncio crawler gevent python scalpel trio webscraping

Last synced: 19 Jan 2026

https://github.com/jsrei/page-redirect-code-location-hook

JS reverse-engineering technique: a universal approach for locating the JS code that triggers a page redirect

crawler js-revers userscript

Last synced: 19 Apr 2025

https://github.com/0xhjk/x12306

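12306 ticket-query helper: query every station along a route with one click, board first and pay the fare difference later, for more worry-free travel.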

12306 12306buyticket 12306helper 12306qiang-piao crawler fk12306 helper reqeusts spider ticket train x12306

Last synced: 26 Feb 2026

https://github.com/healeycodes/broken-link-crawler

:robot: Python bot that crawls your website looking for dead stuff

bot crawler python

Last synced: 30 Apr 2025

https://github.com/healeycodes/Broken-Link-Crawler

:robot: Python bot that crawls your website looking for dead stuff

bot crawler python

Last synced: 27 Sep 2025