An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
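
The "systematically browses" part of the definition above can be made concrete with a minimal breadth-first crawler sketch in Python. It is not taken from any project listed here. The `fetch` callback, the `max_pages` limit, and the same-host restriction are illustrative assumptions. A production crawler would also need politeness delays, robots.txt checks, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl from `seed`, staying on the seed's host.

    `fetch` is an assumed callback returning a page's HTML, or None if the
    page could not be retrieved. Returns URLs in the order they were visited.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])  # FIFO queue gives breadth-first order
    seen = {seed}             # every URL ever queued, to avoid revisits
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking. For example, a dictionary's `get` method can serve as an in-memory fake web for testing, while a real deployment would wrap an HTTP client.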

https://github.com/rebrowser/rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

automation bot bot-detection chrome chromedriver cloudflare crawler crawling datadome headless headless-chrome playwright puppeteer puppeteer-extra rebrowser scraping selenium stealth web-scraping webdriver

Last synced: 14 May 2025

https://github.com/ma6254/FictionDown

Novel downloading | novel crawling | Qidian | Biquge | export to Markdown | export to txt | convert to EPUB | ad filtering | automatic proofreading

biquge crawler fiction golang novels qidian spider

Last synced: 04 May 2025

https://github.com/fffonion/xehentai

Doujinshi downloader (adult manga downloader)

crawler json-rpc python xehentai

Last synced: 16 May 2025

https://github.com/xuxueli/xxl-crawler

A lightweight web crawler framework for Java.

crawler distributed flexible java object-oriented spider web xxl-crawler

Last synced: 15 May 2025

https://github.com/lixi5338619/lxbook

Code repository for the book 《爬虫逆向进阶实战》 (Advanced Hands-On Crawler Reverse Engineering)

android-resever crawler frida java javascript python smali spiders unidbg xposed

Last synced: 13 Apr 2025

https://github.com/jsrei/js-cookie-monitor-debugger-hook

A JS cookie reverse-engineering tool: visual monitoring of JS cookie changes, plus JS cookie hooks for setting conditional breakpoints

crawler js-reverse red-team reverse-engineering userscript web-security-research

Last synced: 15 May 2025

https://github.com/xiaoxiunique/x-kit

A tool for scraping and analyzing X (Twitter) user data and tweets.

crawler kols twitter x

Last synced: 15 May 2025

https://github.com/python3webspider/douyin

A DouYin API for humans, used to crawl popular videos and music

crawler douyin spider videos

Last synced: 04 Apr 2025

https://github.com/StanGirard/seo-audits-toolkit

SEO & Security Audit for Websites. Lighthouse & Security Headers crawler, Sitemap/Keywords/Images Extractor, Summarizer, etc.

analysis audits crawler dashboard extractor headers internal-links lighthouse link-extractor python securityheader seo seo-tools serp summarizer

Last synced: 26 Mar 2025

https://github.com/Kharacternyk/dotcommon

What do people have in their dotfiles?

crawler dotfiles

Last synced: 29 Mar 2025

https://github.com/fengzhizi715/NetDiscovery

NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.

coroutines crawler disruptor dsl htmlunit kafka kotlin lettuce middleware redis rxjava2 selenium spider vertx3

Last synced: 03 May 2025

https://github.com/go-crawler/go_jobs

A look at the Golang job market

crawler go golang lagou spider

Last synced: 16 Jan 2026

https://github.com/rndinfosecguy/Scavenger

Crawler (Bot) searching for credential leaks on paste sites.

bot crawler credentials leaks osint paste pastebin python

Last synced: 20 Mar 2025

https://github.com/linkedtales/scrapedin

LinkedIn Scraper (working as of 2020)

crawler linkedin linkedin-scraper scraper

Last synced: 14 May 2025

https://github.com/speed/newcrawler

Free Web Scraping Tool with Java

crawler docker scraping spider

Last synced: 02 Apr 2025

https://github.com/yhy0/Jie

Jie is a comprehensive security assessment and exploitation tool for web applications, built as a bug-hunting assistant. Its features cover vulnerability scanning, information gathering, and exploitation, aimed at security professionals and penetration testers.

apollo-exp bugcrowd crawler hackerone jie scan scanner security-copilot shiro-exp src vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners

Last synced: 07 Sep 2025

https://github.com/ChenZixinn/spider_reverse

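Crawler reverse-engineering case studies. Completed so far: TLS fingerprinting | Ruishu (瑞数) | ZKH (震坤行) | NetEase Yidun (网易易盾) | WeChat Mini Program decompilation and reverse engineering (百达星系) | Tonghuashun (同花顺) | RPC decryption | Jiasule (加速乐) | GeeTest slider captcha (极验) | Juliang Suanshu (巨量算数) | BOSS Zhipin (Boss直聘) | Qichacha (企查查) | China Minmetals (中国五矿) | QQ Music | Industrial Policy Big Data Platform (产业政策大数据平台) | Qizhidao (企知道) | Xueqiu (雪球网, acw_sc__v2) | 1688 | Qimai Data (七麦数据) | whggzy | Qiming Technology (企名科技) | mohurd | EntGroup Data (艺恩数据) | OKLink (欧科云链)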

crawler python requests spider

Last synced: 28 Mar 2025

https://github.com/rajatomar788/pywebcopy

Saves webpages locally to your hard disk, keeping images, CSS, JS, and links as-is.

archive-tool crawler html html-parser mirror python web webpage

Last synced: 08 Jul 2025

https://github.com/iudicium/pryingdeep

Prying Deep - An OSINT tool to collect intelligence on the dark web.

crawler darkweb go gocolly golang-osint onion osint osint-tools pryingdeep security-tools

Last synced: 14 Jan 2026

https://github.com/c0d3d3v/moodle-dl

Moodle-DL downloads course content (e.g. lecture PDFs) from Moodle quickly.

crawler downloader hacktoberfest moode-crawler moodle moodle-dl moodle-downloader scraper sync

Last synced: 21 Oct 2025

https://github.com/nanshihui/Scan-T

A new Python-based crawler with additional features, including network fingerprint search

crawler netfingerprint python sybersecurity

Last synced: 04 May 2025

https://github.com/chenjiandongx/mmjpg

👩 Crawler for photo sets of beautiful women (part 1)

crawler meinv

Last synced: 05 Apr 2025

https://github.com/chushuai/wscan

Wscan is a web security scanner dedicated to making web security accessible to everyone.

cel-go chromedp crawler headless martian passive-vulnerability-scanner poc sql-injection subdomains testwaf vulnerability-scanner waf webscan wscan xss

Last synced: 11 Jul 2025

https://github.com/dirtyfilthy/freshonions-torscraper

Fresh Onions is an open-source Tor spider and hidden-service (onion) crawler hosted at zlal32teyptf4tvi.onion

crawler darknet hidden-services onion scraper spider tor

Last synced: 07 Apr 2025

https://github.com/stanzhai/html2article

Main-content (article body) extraction from HTML web pages

article content crawler html spider topic

Last synced: 16 May 2025

https://github.com/snakem982/pandora-box

A Simple Mihomo GUI.

crawler gui linux mac mihomo windows

Last synced: 28 Jan 2026

https://github.com/cyubuchen/free_proxy_website

A collection of websites that provide free SOCKS/HTTPS/HTTP proxies

crawler free-proxy-list ip proxy proxy-checker spider

Last synced: 11 May 2025

https://github.com/shaohua0116/ICLR2020-OpenReviewData

Script that crawls metadata from the ICLR OpenReview webpage, with tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

conference crawler data-analysis iclr iclr2020 machine-learning visualization

Last synced: 19 Jul 2025

https://github.com/TikHub/TikHub-API-Python-SDK

High-performance asynchronous API for Douyin, TikTok, Xiaohongshu, Kuaishou, Weibo, Instagram, YouTube, and Twitter (X), plus a captcha solver and temporary email.

api captcha-solver crawler data-api douyin douyin-tiktok-api instagram kuaishou netease-cloud-music private-api scrapy tiktok twitter weibo xiaohongshu xiguashipin

Last synced: 11 May 2025

https://github.com/AndyTheFactory/newspaper4k

📰 Newspaper4k, a fork of the beloved Newspaper3k: extraction of articles, titles, and metadata from news websites.

articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping

Last synced: 14 Mar 2025

https://github.com/tasos-py/Search-Engines-Scraper

Search Google, Bing, Yahoo, and other search engines with Python

bing crawler google python scraper search-engine yahoo

Last synced: 09 Jul 2025

https://github.com/lgraubner/sitemap-generator

Easily create XML sitemaps for your website.

crawler google seo sitemap sitemap-generator xml-sitemap

Last synced: 15 May 2025
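
An XML sitemap, as defined by the sitemaps.org protocol, is a `<urlset>` of `<url>` entries, each with a required `<loc>` and optional fields such as `<lastmod>`. Below is a minimal Python sketch of emitting one from a list of already-crawled URLs. It is not the implementation used by sitemap-generator. The function name `build_sitemap` and the single shared `lastmod` value are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str], lastmod: Optional[str] = None) -> str:
    """Return a sitemaps.org <urlset> document listing `urls`.

    Hypothetical helper: `lastmod` (a W3C date such as "2025-05-15") is
    applied to every entry; a real generator would track it per page.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, <, > for us
        if lastmod is not None:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Building the document with ElementTree rather than string concatenation means characters such as `&` in query strings are entity-escaped automatically, which the sitemap protocol requires.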

https://github.com/roniemartinez/dude

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 16 Mar 2025

https://github.com/gadfly0x/signature_algorithm

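Request-signing and encryption algorithms for various apps, mini programs, and websites. Currently covered: Ziroom, Xiaohongshu, Danke Apartment, Luckin Coffee, and Bangkok Airways.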

crawler reverse-engineering spider

Last synced: 27 Apr 2025

https://github.com/0xMassi/webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping

Last synced: 04 Apr 2026

https://github.com/heqin-zhu/music-recover

:musical_note: Converts cache files to MP3 files

crawler mp3 python regex

Last synced: 16 Jan 2026

https://github.com/howie6879/magic_google

Google search results crawler: get the Google search results you need

crawler google google-search spider

Last synced: 14 Dec 2025

https://github.com/smuyyh/crawlerforreader

A local web-novel crawler for Android, based on jsoup and XPath

android bookreader crawler jsoup xpath

Last synced: 06 Apr 2025

https://github.com/shaohua0116/ICLR2019-OpenReviewData

Script that crawls metadata from the ICLR OpenReview webpage, with tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

crawler crawling-python openreview tutorial

Last synced: 19 Jul 2025

https://github.com/Josue87/EmailFinder

Search emails from a domain through search engines

crawler osint

Last synced: 05 May 2025

https://github.com/python3spiders/allnewsspider

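The Paper, Sina News, Tencent News, Sohu News, Xinwen Lianbo, The Times, The New York Times, BBC News: aims to crawl news from every news portal. Commercial use of the collected data is prohibited!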

bbc-news crawler newsapi nytimes sina sohu spider tencent thetimes xwlb

Last synced: 06 Apr 2025

https://github.com/elvisyjlin/media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

crawler instagram pixiv reddit scraper tiktok tumblr twitter

Last synced: 03 Apr 2025

https://github.com/mikemeliz/torcrawl.py

Crawl and extract (regular or onion) webpages through the Tor network

crawler extractor onion osint python tor

Last synced: 14 Jan 2026

https://github.com/brendonboshell/supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

crawler distributed-crawler robot sitemap web-crawler

Last synced: 12 Jan 2026
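
Supercrawler's own API is not shown here. The Python sketch below only illustrates, under stated assumptions, how a crawler can obey robots.txt rules and a per-host crawl delay using the standard library's `urllib.robotparser`. The class name `PolitenessGate`, the pre-downloaded robots.txt text, the 1-second default delay, and the injectable clock are hypothetical choices made for the example.

```python
import time
from typing import Callable, Dict, Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PolitenessGate:
    """Hypothetical helper: robots.txt allow/deny plus a per-host delay.

    Assumes robots.txt bodies were already downloaded and are passed in as
    text keyed by host name; a real crawler would fetch /robots.txt itself.
    """

    def __init__(
        self,
        user_agent: str,
        robots_txt_by_host: Dict[str, str],
        default_delay: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.user_agent = user_agent
        self.default_delay = default_delay
        self.clock = clock  # injectable so tests need not sleep
        self.parsers: Dict[str, RobotFileParser] = {}
        for host, body in robots_txt_by_host.items():
            parser = RobotFileParser()
            parser.parse(body.splitlines())
            self.parsers[host] = parser
        self.last_fetch: Dict[str, float] = {}

    def _parser(self, url: str) -> Optional[RobotFileParser]:
        return self.parsers.get(urlparse(url).netloc)

    def allowed(self, url: str) -> bool:
        parser = self._parser(url)
        # No robots.txt on record for this host: treat the URL as allowed.
        return parser is None or parser.can_fetch(self.user_agent, url)

    def delay_for(self, url: str) -> float:
        parser = self._parser(url)
        declared = parser.crawl_delay(self.user_agent) if parser is not None else None
        # Prefer the site's Crawl-delay; otherwise fall back to our default.
        return float(declared) if declared is not None else self.default_delay

    def seconds_until_ready(self, url: str) -> float:
        last = self.last_fetch.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, last + self.delay_for(url) - self.clock())

    def record_fetch(self, url: str) -> None:
        self.last_fetch[urlparse(url).netloc] = self.clock()
```

A crawl loop would call `allowed()` before queueing a URL, wait `seconds_until_ready()` before requesting it, and call `record_fetch()` afterwards. Tracking the delay per host lets a crawler stay polite to each site while still fetching from many sites concurrently.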

https://github.com/microsoft/ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

crawler data github github-api github-webhooks ospo

Last synced: 27 Sep 2025

https://github.com/dennis-tra/nebula

🌌 An agnostic network crawler exposing comprehensive peer information and network topology information.

cid crawler filecoin golang hacktoberfest ipfs libp2p

Last synced: 09 Jun 2026

https://github.com/chishui/jssoup

JavaScript + BeautifulSoup = JSSoup

beautifulsoup crawler html javascript nodejs parser react-native spider

Last synced: 16 May 2025

https://github.com/scrapy-plugins/scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

crawler crawler-detection plugin proxy scraping scrapy

Last synced: 16 May 2025

https://github.com/duzun/hquery.php

An extremely fast web scraper that parses megabytes of invalid HTML in the blink of an eye. PHP 5.3+, no dependencies.

broken-html crawler css-selectors domcrawler fast hquery html html-parser invalid-html jquery-like jquery-selectors parser php psr-0 psr-4 scraper selectors xml xml-parser

Last synced: 14 May 2025

https://github.com/Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

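⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. It requires no paid Whisper API: inference runs on a locally hosted Whisper model with multi-GPU concurrency, and the service is designed for distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple platforms and offering a scalable solution for automated processing of media content data.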

asr crawler douyin-api fastapi faster-whisper openai-whisper speech-recognition speech-to-text speech-to-text-api tiktok-analytics tiktok-api tiktok-crawler video-analysis whisper-ai whisper-api whisperbot

Last synced: 05 Apr 2025

https://github.com/rivermont/spidy

The simple, easy to use command line web crawler.

crawler crawling python python3 web-crawler web-spider

Last synced: 16 Jan 2026

https://github.com/misaka10843/copymanga-downloader

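Downloads manga from copymanga (拷贝漫画) using Python and the copymanga API (no rate limit). Supports batch and per-chapter downloads, fetching and downloading your bookmarked manga, and semi-automatic subscription downloads! (Cross-platform support via PyPI.) For the NAS version, see copymanga-nasdownloader.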

comic copymanga crawler downloader python python3

Last synced: 14 Jan 2026

https://github.com/commoncrawl/news-crawl

News crawling with StormCrawler - stores content as WARC

apache-storm common-crawl commoncrawl crawler news storm-crawler warc web-crawler

Last synced: 12 Jun 2025

https://github.com/xiyuan-fengyu/ppspider

A Puppeteer-based web crawler framework with flexible task-queue management and task scheduling via decorators, convenient data storage (nedb/MongoDB), and built-in data visualization and user interaction.

angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript

Last synced: 05 Apr 2025

https://github.com/yangjianxin1/qqmusicspider

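A Scrapy-based QQ Music spider that crawls song information, lyrics, featured comments, and more. It also shares a music corpus of 490,000+ songs by the top 6,400 mainland Chinese, Hong Kong, and Taiwanese singers on QQ Music.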

crawler music musicspider qqmusic scrapy

Last synced: 27 Oct 2025

https://github.com/jackluson/chinese-fund-crawler

Crawling, aggregation, and analysis of Chinese over-the-counter fund data

crawler fund morningstar

Last synced: 27 Apr 2025

https://github.com/jsrei/crawler-js-hook-framework-public

A toolkit of JS reverse-engineering hooks; some of the tools are open-sourced here

crawler

Last synced: 26 Jan 2026

https://github.com/lgraubner/sitemap-generator-cli

Creates an XML-Sitemap by crawling a given site.

cli crawler google seo sitemap xml-sitemap

Last synced: 13 Apr 2025

https://github.com/jeffersonqin/lightnovel_epub

🍭 EPUB generator for (light) novels. Supported sites: Qingzhiguodu (轻之国度) and Wenku8 (轻小说文库)

cli crawler ebook epub lightnovel lk novel opencv python scraper uiautomator wenku8

Last synced: 26 Oct 2025

https://github.com/jaybizzle/laravel-crawler-detect

A Laravel wrapper for CrawlerDetect - the web crawler detection library

bot crawler detect laravel php spider

Last synced: 16 May 2025

https://github.com/infinilabs/crawler

🕷️ An easy-to-use spider written in Golang (previously named GOPA).

crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider

Last synced: 11 Apr 2026

https://github.com/marshalx/telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

crawler crawling crawling-python parser telegram telegram-org telegram-updates

Last synced: 16 May 2025
