{"id":19151727,"url":"https://github.com/zhs007/jarviscrawlercore","last_synced_at":"2026-06-14T14:30:19.455Z","repository":{"id":34304779,"uuid":"175321513","full_name":"zhs007/jarviscrawlercore","owner":"zhs007","description":"JarvisCrawlerCore is a distributed crawler service framework / page-programming robot.","archived":false,"fork":false,"pushed_at":"2023-01-08T13:58:46.000Z","size":2424,"stargazers_count":2,"open_issues_count":4,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-04T04:43:39.473Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zhs007.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-13T01:11:35.000Z","updated_at":"2022-10-02T02:29:41.000Z","dependencies_parsed_at":"2023-01-15T06:15:49.429Z","dependency_job_id":null,"html_url":"https://github.com/zhs007/jarviscrawlercore","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhs007%2Fjarviscrawlercore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhs007%2Fjarviscrawlercore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhs007%2Fjarviscrawlercore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhs007%2Fjarviscrawlercore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zhs007","download_url":"https://codeload.github.com/zhs007/jarviscrawlercore/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://githu
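b.com","kind":"github","repositories_count":240236361,"owners_count":19769580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T08:15:33.412Z","updated_at":"2026-06-14T14:30:17.367Z","avatar_url":"https://github.com/zhs007.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Jarvis Crawler Core\n\nJarvisCrawlerCore is a distributed crawler service framework / page-programming robot.  \n\nIt can be used to build a data-crawling cluster, to run automated tests for web projects, or as a sub-service through which other bots operate web projects.  \nIt is used in several bot projects, including an in-group automatic translation bot, a news push channel, an anime and film/TV resource push channel, a back-office data monitoring system, page analysis, and industry data scraping.\n\nWe used the command line only for early testing; the gRPC service mode is the one we mainly maintain.  \n\nWe recommend deploying with Docker and fetching data on multiple nodes in parallel. Currently only the Golang client (jccclient) provides basic task dispatching.  \n\nIf you need unified operations across multiple nodes, you can use Jarvis.\n\nMachine requirements: Linux is recommended, and the machine must be able to run Docker.  \nMemory: 2 GB or more (1 GB also works as long as you do not request too many tasks at once. Chrome uses a lot of memory, so restarting the service periodically helps, although we also have a translation service that has run for months without a restart).\n\n### Installation notes for M1 Macs\n\nFor now, install with the following command.\n\n```\nnpm install --target_arch=x64\n```\n\n### Installation\n\n[This repository](https://github.com/zhs007/dockerscripts/tree/master/jarviscrawlerserv) is a script project that can be used directly for deployment.\n\nThe following command pulls the image directly from DockerHub for deployment.  \n\n``` sh\ndocker pull zerrozhao/jarviscrawlercore:latest\n```\n\nYou need to edit the configuration file, ``service.yaml``; we recommend placing it in the ``cfg`` directory.\n\n``` yaml\nservAddr: 0.0.0.0:7051\nheadless: true\n\nslowMo: 10\n\nclientToken:\n  - wzDkh9h2fhfUVuS9jZ8uVbhV3vC5AWX3\n```\n\nHere, clientToken is used for authorization. Multiple tokens can be configured, the token is checked on every request, and one token can be shared by multiple clients.  \n\n### node.js Client development\n\nFor an example of calling the service from ``nodejs``, see ``src/service/client2.js``.  \n\nInstall the dependency through npm and it is ready to use.\n\n``` sh\nnpm i jarviscrawlercore --save\n```\n\nThere is also a project that uses ``jarviscrawlercore`` directly to package comics; see [here](https://github.com/zhs007/getcomic).  \n\n### Golang Client development\n\nJust use ``jccclient``.\n\n### Release notes\n\n##### 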
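v0.7\n\n- Major dependency updates\n\n##### v0.6\n\n- Restructured ``protos``\n- Online deployment together with ``Charles``\n- Gradually opening up the API service\n- More efficient crawling in combination with ``jccclient``\n- Support for more websites\n\n##### v0.5\n\n- Refactored the news feature\n- Comic downloading\n- Image packaging\n- Support for more websites\n\n##### v0.3\n\n- Code structure adjustments\n- Support for crawling mobile-device web pages\n- Support for attaching directly to an existing chrome instance\n- Published to dockerhub\n- Support for more websites\n\n##### v0.2\n\n- Greatly improved node stability\n- Support for crawling more types of web pages\n- Multi-node support (requires jccclient)\n- Support for more websites\n\n##### v0.1\n\n- Support for news crawling\n- Support for the grpc service\n- Support for translation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhs007%2Fjarviscrawlercore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzhs007%2Fjarviscrawlercore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhs007%2Fjarviscrawlercore/lists"}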