{"id":31865358,"url":"https://github.com/ireoo/spider.npm","last_synced_at":"2025-10-12T19:17:55.249Z","repository":{"id":57169852,"uuid":"85092074","full_name":"Ireoo/spider.npm","owner":"Ireoo","description":"Web crawler library; custom rules can handle most websites","archived":false,"fork":false,"pushed_at":"2019-07-08T03:47:38.000Z","size":21517,"stargazers_count":44,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-10-12T10:59:23.653Z","etag":null,"topics":["crawler","npm","spider","superagent"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/spider.io","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ireoo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-15T15:52:17.000Z","updated_at":"2025-05-08T05:20:00.000Z","dependencies_parsed_at":"2022-08-27T13:11:26.081Z","dependency_job_id":null,"html_url":"https://github.com/Ireoo/spider.npm","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Ireoo/spider.npm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ireoo%2Fspider.npm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ireoo%2Fspider.npm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ireoo%2Fspider.npm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ireoo%2Fspider.npm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ireoo","download_url":"https://codeload.github.com/Ireoo/spider.npm/tar.gz/refs/heads/master","sbom_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ireoo%2Fspider.npm/sbom","scorecard":{"id":67835,"data":{"date":"2025-08-11","repo":{"name":"github.com/Ireoo/spider.npm","commit":"a8819d3e2ec466d015ad4d4252562f628571957e"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.2,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/28 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":8,"reason":"binaries present in source code","details":["Warn: binary detected: bin/html:1","Warn: binary detected: bin/html.exe:1"],"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 4 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-15T03:00:44.964Z","repository_id":57169852,"created_at":"2025-08-15T03:00:44.964Z","updated_at":"2025-08-15T03:00:44.964Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279012647,"owners_count":26085158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","npm","spider","superagent"],"created_at":"2025-10-12T19:17:53.970Z","updated_at":"2025-10-12T19:17:55.242Z","avatar_url":"https://github.com/Ireoo.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spider.io\n\n![](https://img.shields.io/npm/v/spider.io.svg)\n![](https://img.shields.io/npm/dm/spider.io.svg)\n\nA minimal web spider/crawler that works with any website: set a single rule and it collects the content you want from the site. Simple and convenient!\n\n## Latest updates\n\n### v5.0.9 [2019/7/8]\n\n1. Bug fixes\n\n### v5.0.8 [2019/7/8]\n\n1. Optimized the program\n1. Ensured that output data cannot affect the core program's processing\n1. Added the hash and data parameters to cb in rules\n1. Improved the documentation\n\n### v5.0.7 [2019/7/7]\n\n1. Fixed multiple rules not being recognized, which caused only the last rule to be shown each time\n\n### v5.0.6 [2018/11/21]\n\n1. Fixed the program terminating when page content could not be parsed\n\n### v5.0.3 [2018/11/18]\n\n1.  Fixed the program failing to start\n\n### v5.0.0 [2018/4/14]\n\n1.  Dropped crawler.js in favor of the superagent module\n1.  Added color-coded debug output\n1.  Refactored the code\n\n### v4.2.7\n\n1.  Added the rules.cb function for handling complex values; it must end with a return statement\n1.  
Optimized the core program so it can access more web pages\n1.  Added the 'done' callback, invoked when everything has finished\n1.  Added thread handling\n1.  Revised the rules\n1.  Added some rule examples (in the test directory); when links is not set, the examples run automatically\n\n## Usage\n\n```code\nnpm install spider.io --save\n```\n\n```javascript\nconst Spider = require(\"spider.io\");\nnew Spider({\n  callback: function(hash, data) {\n    console.log(hash, data);\n  },\n  run: true // run immediately\n});\n```\n\nOR\n\n```javascript\nconst Spider = require(\"spider.io\");\nnew Spider({\n  callback: function(hash, data) {\n    console.log(hash, data);\n  }\n}).run();\n```\n\n## Parameters\n\n### Parameter format\n\n```javascript\nconst options = {\n  init: {\n    debug: false,\n    delay: 1000,\n    timeout: 5000,\n    retrys: 3,\n    threads: 1,\n    loop: false\n  },\n  links: {\n    title: \"\",\n    hash: \"\",\n    url: \"\",\n    rules: [\n      // List-type data, with next-level processing\n      {\n        list: \"a\",\n        rule: {\n          url: {\n            // Required when links is present at the same level\n            type: \"href\",\n            text: \"\"\n          },\n          title: {\n            type: \"text\",\n            text: \"\"\n          }\n        },\n        links: []\n      },\n      // Plain data\n      {\n        rule: {\n          url: {\n            // Required when links is present at the same level\n            type: \"href\",\n            text: \"\"\n          },\n          title: {\n            type: \"text\",\n            text: \"\"\n          }\n        }\n      },\n      // Array-form data\n      {\n        key: \"\",\n        list: \"\",\n        rule: {\n          url: {\n            // Required when links is present at the same level\n            type: \"href\",\n            text: \"\"\n          },\n          title: {\n            type: \"text\",\n            text: \"\"\n          }\n        }\n      },\n      // Custom processing of the returned data; the result is merged with the previous level's data\n      {\n        cb: ($, init) =\u003e {\n          // $ -\u003e the parsed DOM object, which can be manipulated directly; see jQuery for the syntax\n          // init -\u003e {hash, data}\n          // ...code\n          // If links is present at the same level, a return value is required and must include url; an array or an object may be returned\n          // return [{url: ''}] or {url: ''};\n        }\n      }\n    ]\n  },\n  
callback: (hash, data) =\u003e {\n    // Data is returned one record at a time, not all at once\n  },\n  done: () =\u003e {\n    // Called once all processing has finished\n  }\n};\n```\n\n### init (main parameter)\n\n| Parameter | Description                                                          | Default |\n| :-------- | :------------------------------------------------------------------- | :-----: |\n| debug     | Print debug information (page access time and fetched page content)  |  false  |\n| delay     | Delay between visits to each site                                    |  1000   |\n| timeout   | Timeout for visiting a site                                          |  5000   |\n| retrys    | Number of retries when visiting a site                               |    3    |\n| threads   | Number of threads                                                    |    1    |\n| loop      | Whether to restart automatically after finishing                     |  false  |\n\n### headers (main parameter) (see [superagent](https://www.npmjs.com/package/superagent) for details)\n\n### links (main parameter)\n\n| Parameter | Description                                                                    | Type         | Required |\n| :-------- | :----------------------------------------------------------------------------- | :----------- | :------: |\n| title     | Describes what the rule does                                                   | text         |    ×     |\n| hash      | Identifier; returned unchanged in callback                                     | any          |    ×     |\n| url       | The URL to visit                                                               | text/array   |    √     |\n| rules     | Rules applied to the current URL                                               | array/object |    √     |\n| max       | Maximum value of i when url contains {i}                                       | number       |    ×     |\n| min       | Minimum value of i when url contains {i}; requires max to be set; default: 1   | number       |    ×     |\n\n#### rules\n\n| Parameter | Description                                                          | Type         | Required |\n| :-------- | :------------------------------------------------------------------- | :----------- | :------: |\n| list      | Sets the starting point of the list                                  | text         |    ×     |\n| rule      | Sets the content to extract                                        
  | array/object |  ×   |\n| links  | Chains another rule for each item of the parent loop | array/object | ×    |\n| cb     | Operates directly via a function, which must return a value; \\$ is the parsed page content | function(\\$) |  ×   |\n\n##### rule (uses jQuery selector syntax)\n\n| Parameter | Description                              | Type |\n| :----- | :--------------------------------------- | :--- |\n| key    | \\\u003ckey\u003e is set to the \\\u003ctype\u003e attribute value of the element located by \\\u003ctext\u003e | text |\n\nUsage:\n\n```javascript\n{\n    \u003ckey\u003e: {\n        type: 'text|val|html|href|src|....', // any attribute name can be used\n        text: ''                             // may be left empty inside a loop\n    }\n}\n```\n\n##### links\n\nWhen links is used, the rule must contain list, and rule must contain a \\\u003ckey\u003e named url.\u003cbr\u003e\nIn the linked rule, each url obtained from the list replaces the url of links in turn, generating a new rule.\n\n### callback (main parameter)\n\nCallback invoked after data is obtained. It receives:\n\n| Parameter | Description                                                 |  Type  |\n| :----- | :---------------------------------------------------------- | :----: |\n| hash   | The hash set in the rule, returned as-is without processing; used to identify the rule | any |\n| data   | The data obtained at the last level, returned one record at a time |  json  |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fireoo%2Fspider.npm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fireoo%2Fspider.npm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fireoo%2Fspider.npm/lists"}