https://github.com/ma-pony/deepspider

Intelligent Web Scraping Platform - AI-powered Crawler Agent built on DeepAgents + Patchright
ai-agent anti-detect automation captcha crawler javascript reverse-engineering web-scraping


Host: GitHub
URL: https://github.com/ma-pony/deepspider
Owner: ma-pony
Created: 2026-01-26T06:09:33.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-03-02T10:44:37.000Z (4 months ago)
Last Synced: 2026-03-02T14:29:57.932Z (4 months ago)
Topics: ai-agent, anti-detect, automation, captcha, crawler, javascript, reverse-engineering, web-scraping
Language: JavaScript
Size: 1.53 MB
Stars: 1
Watchers: 0
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md

README

# DeepSpider

[![npm version](https://img.shields.io/npm/v/deepspider.svg)](https://www.npmjs.com/package/deepspider)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> An AI-native platform for defeating anti-scraping measures - compress 3 days of reverse-engineering work into 10 minutes

[English](README_EN.md)

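## Core Features

**AI-First Architecture** - AI at the core, tools in support
- Understands obfuscated code directly (no deobfuscation preprocessing required)
- Identifies encryption algorithms, with regex hints assisting the LLM analysis
- Generates runnable code (Python/JS)
- Unified model configuration; users choose a local or cloud LLM

**Complete Anti-Scraping Countermeasures**
- Reverse engineering: AI reads the JS source and generates a Python implementation
- CAPTCHA handling: OCR, slider, click-to-select
- Anti-detection: fingerprint spoofing, proxy rotation
- Crawler orchestration: AI generates a complete project

**Real Browser + CDP**
- Patchright anti-detection browser
- Deep CDP integration (hooks, breakpoints, interception)
- Analysis panel built into the browser
- Real-time data collection (zero API cost)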
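
To make the "hook" idea in the list above concrete, here is a minimal sketch of function hooking in plain JavaScript: a function is wrapped so that each call's arguments and return value are recorded. The names `hookFunction`, `page`, and `fakeSign` are hypothetical and exist only for illustration; this is not DeepSpider's implementation, which per the list above works through CDP in a real browser.

```javascript
// Hypothetical helper (not part of DeepSpider): replaces obj[name] with a
// wrapper that records every call, then delegates to the original function.
function hookFunction(obj, name, log) {
  const original = obj[name];
  if (typeof original !== 'function') {
    throw new TypeError(`${name} is not a function`);
  }
  obj[name] = function hooked(...args) {
    const result = original.apply(this, args);
    log.push({ name, args, result });
    return result;
  };
  // Return a function that restores the original implementation.
  return function unhook() {
    obj[name] = original;
  };
}

// Stand-in for an object on a page that exposes a signing function.
const page = {
  fakeSign(payload) {
    return `sig:${payload.length}`;
  },
};

const calls = [];
const unhook = hookFunction(page, 'fakeSign', calls);
page.fakeSign('id=42'); // recorded in `calls`
unhook();               // later calls are no longer recorded
console.log(calls);
```

Recording both inputs and outputs is what makes a hook useful for reverse engineering: the captured pairs can later serve as test samples for a reimplementation.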

## Quick Start

### Installation

```bash
npm install -g deepspider
```

### Configuration

```bash
deepspider config set apiKey sk-ant-api03-xxx
deepspider config set baseUrl https://api.anthropic.com
deepspider config set model claude-opus-4-6
```

### Usage

```bash
# Analyze a target website
deepspider https://example.com

# Quick HTTP request (lightweight)
deepspider fetch https://api.example.com
```

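## Workflow

1. **Launch**: `deepspider https://target-site.com`
2. **Wait**: the browser opens and records data automatically
3. **Interact**: log in, page through results, trigger the target request
4. **Select**: click ⦿ in the panel to select the target data
5. **Analyze**: choose an action (trace source / analyze encryption / generate crawler)
6. **Chat**: ask follow-up questions for deeper analysis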

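## Architecture

```
AI-Native Architecture (v2.0)

Main Agent (AI-driven)
├── AI Understanding Layer (core, 80%)
│   ├── Understand obfuscated code directly
│   ├── Identify encryption algorithms
│   └── Generate Python code
├── Tool Verification Layer (supporting, 15%)
│   ├── Data collection (browser + CDP)
│   ├── Dynamic verification (hooks + debugging)
│   └── Code execution (sandbox verification)
└── Capability Extension Layer (optional, 5%)
    ├── CAPTCHA handling
    ├── Anti-detection
    └── Crawler orchestration
```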
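
The verification layer in the diagram above can be illustrated with a small sketch: a candidate reimplementation of a signing routine is run against known input/output pairs and accepted only if every pair matches. Everything here is an assumption made for illustration: `candidateSign`, `verifyCandidate`, and the `md5(payload + salt)` scheme are hypothetical, and the samples are standard MD5 test vectors, not data captured from any site.

```javascript
const crypto = require('node:crypto');

// Hypothetical candidate: assumes the target signs with md5(payload + salt).
function candidateSign(payload, salt) {
  return crypto.createHash('md5').update(payload + salt).digest('hex');
}

// Runs the candidate against (input, expected) pairs and collects mismatches.
function verifyCandidate(signFn, samples) {
  const failures = samples.filter(
    (s) => signFn(s.payload, s.salt) !== s.expected
  );
  return { passed: failures.length === 0, failures };
}

// Synthetic samples: the well-known MD5 digests of "" and "abc".
const samples = [
  { payload: '', salt: '', expected: 'd41d8cd98f00b204e9800998ecf8427e' },
  { payload: 'ab', salt: 'c', expected: '900150983cd24fb0d6963f7d28e17f72' },
];

const report = verifyCandidate(candidateSign, samples);
console.log(report.passed ? 'candidate matches all samples' : report.failures);
```

In a real run the samples would presumably come from values observed in the browser, so a failing pair points directly at the input the candidate gets wrong.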

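## Encryption Analysis

**Hints + LLM architecture**:
- 34 regex patterns (MD5/SHA/AES/RSA/SM2/SM3/SM4, etc.) automatically extract encryption-type hints
- Hints are injected into the LLM prompt as auxiliary information to improve analysis accuracy
- All analysis is performed by the user-configured LLM (local or cloud, unified configuration)
- There is no intermediate cache layer, which avoids misjudgments caused by cache poisoning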
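
The hints-then-prompt flow described above might look roughly like the following sketch. The five patterns, the prompt wording, and the names `HINT_PATTERNS`, `extractHints`, and `buildPrompt` are all assumptions for illustration; the project's actual 34 patterns and prompt format are not reproduced here.

```javascript
// Illustrative patterns only; these are NOT the project's actual 34 patterns.
const HINT_PATTERNS = [
  { type: 'MD5', regex: /\bmd5\b/i },
  { type: 'SHA', regex: /\bsha-?(1|256|512)\b/i },
  { type: 'AES', regex: /\baes\b|CryptoJS\.AES/i },
  { type: 'RSA', regex: /\brsa\b|JSEncrypt/i },
  { type: 'SM2/SM3/SM4', regex: /\bsm[234]\b/i },
];

// Returns the algorithm types whose pattern appears in the source text.
function extractHints(source) {
  return HINT_PATTERNS.filter((p) => p.regex.test(source)).map((p) => p.type);
}

// Builds a prompt that passes the hints to the LLM as non-binding context.
function buildPrompt(source) {
  const hints = extractHints(source);
  const hintLine = hints.length > 0 ? hints.join(', ') : 'none detected';
  return [
    'Analyze the following JavaScript and identify how the request is signed.',
    `Possible algorithms (regex hints, may be wrong): ${hintLine}`,
    '--- source ---',
    source,
  ].join('\n');
}

const sample = 'var s = CryptoJS.AES.encrypt(data, key); var h = md5(s);';
console.log(extractHints(sample)); // [ 'MD5', 'AES' ]
console.log(buildPrompt(sample));
```

Marking the hints as fallible in the prompt keeps the LLM as the final judge, which matches the stated design of treating regex matches as auxiliary information only.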

## Documentation

- [Development and Usage Guide](docs/GUIDE.md)
- [Debugging Guide](docs/DEBUG.md)

## License

MIT
