{"id":13715904,"url":"https://github.com/wx-chevalier/sentinel-crawler","last_synced_at":"2025-08-22T08:31:38.051Z","repository":{"id":43491963,"uuid":"88931826","full_name":"wx-chevalier/sentinel-crawler","owner":"wx-chevalier","description":"Xenomorph Crawler, a Concise, Declarative and Observable Distributed Crawler(Node / Go / Java / Rust) For Web, RDB, OS, also can act as a Monitor(with Prometheus) or ETL for Infrastructure :dizzy: 多语言执行器，分布式爬虫","archived":false,"fork":false,"pushed_at":"2023-04-15T15:11:50.000Z","size":4912,"stargazers_count":123,"open_issues_count":33,"forks_count":25,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-12-04T09:17:59.248Z","etag":null,"topics":["crawler","etl","koa2","monitor","nodejs","react","wx-code"],"latest_commit_sha":null,"homepage":"https://wxyyxc1992.github.io","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wx-chevalier.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-04-21T02:41:26.000Z","updated_at":"2024-11-29T02:23:06.000Z","dependencies_parsed_at":"2024-01-13T10:42:10.879Z","dependency_job_id":"936760c3-4e47-4b3a-8161-83e45780d76c","html_url":"https://github.com/wx-chevalier/sentinel-crawler","commit_stats":null,"previous_names":["wxyyxc1992/declarative-crawler","wxyyxc1992/xe-crawler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wx-chevalier%2Fsentinel-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wx-chevalier%2Fsentinel-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wx-chevalier%2Fsentinel-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wx-chevalier%2Fsentinel-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wx-chevalier","download_url":"https://codeload.github.com/wx-chevalier/sentinel-crawler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230575851,"owners_count":18247484,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","etl","koa2","monitor","nodejs","react","wx-code"],"created_at":"2024-08-03T00:01:04.916Z","updated_at":"2024-12-20T11:08:03.281Z","avatar_url":"https://github.com/wx-chevalier.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"# xe-crawler\n\nxe-crawler 是遵循声明式、可监测理念的分布式爬虫，其计划提供 Node.js、Go、Python 多种实现，能够对于静态 Web 页面、动态 Web 页面、关系型数据库、操作系统等异构多源数据进行抓取。xe-crawler 希望让使用者专注于领域逻辑而不用考虑调度、监控等问题，并且稍加改造就能用于系统监控、ETL 数据迁移等领域。更多的 xe-crawler 设计思想、设计规范参考[爬虫实战 https://url.wx-coder.cn/3gyS2](https://url.wx-coder.cn/3gyS2)。\n\n# Usage \u0026 Development\n\n## Standalone Crawler Framework | 单个爬虫框架的独立使用\n\n## Deployment with Supervisor | 带调度节点的集群化部署\n\n# Cases | 使用案例\n\n- [cendertron](./adapters/cendertron) 提供了基于 Node.js 的 Puppetter 独立爬虫包装，内置了动作模拟、URL 去重、界面表单与请求提取等特性。\n\n- [基于 Node.js 的声明式可监控爬虫网络初探](https://zhuanlan.zhihu.com/p/26463840)：本文是最早的设计思想与用例概述，其中使用的部分用例已经废弃，可以阅读了解下笔者的原始设计思想。\n\n- [使用 xe-crawler 爬取知乎美图](https://zhuanlan.zhihu.com/p/26691789)\n\n\n# Home \u0026 More | 延伸阅读\n\n![](https://i.postimg.cc/59QVkFPq/image.png)\n\n您可以通过以下导航来在 Gitbook 中阅读笔者的系列文章，涵盖了技术资料归纳、编程语言与理论、Web 与大前端、服务端开发与基础架构、云计算与大数据、数据科学与人工智能、产品设计等多个领域：\n\n- 知识体系：《[Awesome Lists](https://ngte-al.gitbook.io/i/)》、《[Awesome CheatSheets](https://ngte-ac.gitbook.io/i/)》、《[Awesome Interviews](https://github.com/wx-chevalier/Awesome-Interviews)》、《[Awesome RoadMaps](https://github.com/wx-chevalier/Awesome-RoadMaps)》、《[Awesome MindMaps](https://github.com/wx-chevalier/Awesome-MindMaps)》、《[Awesome-CS-Books-Warehouse](https://github.com/wx-chevalier/Awesome-CS-Books-Warehouse)》\n\n- 编程语言：《[编程语言理论](https://ngte-pl.gitbook.io/i/)》、《[Java 实战](https://ngte-pl.gitbook.io/i/java/java)》、《[JavaScript 实战](https://ngte-pl.gitbook.io/i/javascript/javascript)》、《[Go 实战](https://ngte-pl.gitbook.io/i/go/go)》、《[Python 实战](https://ngte-pl.gitbook.io/i/python/python)》、《[Rust 实战](https://ngte-pl.gitbook.io/i/rust/rust)》\n\n- 软件工程、模式与架构：《[编程范式与设计模式](https://ngte-se.gitbook.io/i/)》、《[数据结构与算法](https://ngte-se.gitbook.io/i/)》、《[软件架构设计](https://ngte-se.gitbook.io/i/)》、《[整洁与重构](https://ngte-se.gitbook.io/i/)》、《[研发方式与工具](https://ngte-se.gitbook.io/i/)》\n\n* Web 与大前端：《[现代 Web 开发基础与工程实践](https://ngte-web.gitbook.io/i/)》、《[数据可视化](https://ngte-fe.gitbook.io/i/)》、《[iOS](https://ngte-fe.gitbook.io/i/)》、《[Android](https://ngte-fe.gitbook.io/i/)》、《[混合开发与跨端应用](https://ngte-fe.gitbook.io/i/)》\n\n* 服务端开发实践与工程架构：《[服务端基础](https://ngte-be.gitbook.io/i/)》、《[微服务与云原生](https://ngte-be.gitbook.io/i/)》、《[测试与高可用保障](https://ngte-be.gitbook.io/i/)》、《[DevOps](https://ngte-be.gitbook.io/i/)》、《[Node](https://ngte-be.gitbook.io/i/)》、《[Spring](https://ngte-be.gitbook.io/i/)》、《[信息安全与渗透测试](https://ngte-be.gitbook.io/i/)》\n\n* 分布式基础架构：《[分布式系统](https://ngte-infras.gitbook.io/i/)》、《[分布式计算](https://ngte-infras.gitbook.io/i/)》、《[数据库](https://ngte-infras.gitbook.io/i/)》、《[网络](https://ngte-infras.gitbook.io/i/)》、《[虚拟化与编排](https://ngte-infras.gitbook.io/i/)》、《[云计算与大数据](https://ngte-infras.gitbook.io/i/)》、《[Linux 与操作系统](https://ngte-infras.gitbook.io/i/)》\n\n* 数据科学，人工智能与深度学习：《[数理统计](https://ngte-aidl.gitbook.io/i/)》、《[数据分析](https://ngte-aidl.gitbook.io/i/)》、《[机器学习](https://ngte-aidl.gitbook.io/i/)》、《[深度学习](https://ngte-aidl.gitbook.io/i/)》、《[自然语言处理](https://ngte-aidl.gitbook.io/i/)》、《[工具与工程化](https://ngte-aidl.gitbook.io/i/)》、《[行业应用](https://ngte-aidl.gitbook.io/i/)》\n\n* 产品设计与用户体验：《[产品设计](https://ngte-pd.gitbook.io/i/)》、《[交互体验](https://ngte-pd.gitbook.io/i/)》、《[项目管理](https://ngte-pd.gitbook.io/i/)》\n\n* 行业应用：《[行业迷思](https://github.com/wx-chevalier/Business-Series)》、《[功能域](https://github.com/wx-chevalier/Business-Series)》、《[电子商务](https://github.com/wx-chevalier/Business-Series)》、《[智能制造](https://github.com/wx-chevalier/Business-Series)》\n\n此外，前往 [xCompass](https://wx-chevalier.github.io/home/#/search) 交互式地检索、查找需要的文章/链接/书籍/课程；或者在在 [MATRIX 文章与代码索引矩阵](https://github.com/wx-chevalier/Developer-Zero-To-Mastery)中查看文章与项目源代码等更详细的目录导航信息。最后，你也可以关注微信公众号：『**某熊的技术之路**』以获取最新资讯。\n\n# About\n\n## Motivation \u0026 Credits\n\n- [annie](https://github.com/iawia002/annie): A fast, simple and clean video downloader\n\n### Golang\n\n- [2015-go_spider #Project#](https://github.com/hu17889/go_spider): An awesome Go concurrent Crawler(spider) framework. The crawler is flexible and modular. It can be expanded to an Individualized crawler easily or you can use the default crawl components only.\n\n- [2018-Muffet #Project#](https://github.com/raviqqe/muffet): Muffet is a website link checker which scrapes and inspects all pages in a website recursively.\n\n* [2018-ferret #Project#](https://github.com/MontFerret/ferret): ferret is a web scraping system aiming to simplify data extraction from the web for such things like UI testing, machine learning and analytics.\n\n* [2019-TopList #Project#](https://github.com/tophubs/TopList): 今日热榜，一个获取各大热门网站热门头条的聚合网站，使用Go语言编写，多协程异步快速抓取信息\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwx-chevalier%2Fsentinel-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwx-chevalier%2Fsentinel-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwx-chevalier%2Fsentinel-crawler/lists"}