{"id":16293647,"url":"https://github.com/fanyong920/crawlitem","last_synced_at":"2025-07-21T07:31:53.106Z","repository":{"id":53061929,"uuid":"166180362","full_name":"fanyong920/crawlItem","owner":"fanyong920","description":"A Chrome extension for crawling Taobao and Tmall web pages","archived":false,"fork":false,"pushed_at":"2020-06-04T12:12:18.000Z","size":195,"stargazers_count":19,"open_issues_count":1,"forks_count":10,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-03T10:21:36.362Z","etag":null,"topics":["crawler","javascript","taobao","tmall"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fanyong920.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-17T07:24:57.000Z","updated_at":"2025-02-13T03:19:43.000Z","dependencies_parsed_at":"2022-08-23T23:20:51.608Z","dependency_job_id":null,"html_url":"https://github.com/fanyong920/crawlItem","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fanyong920/crawlItem","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fanyong920%2FcrawlItem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fanyong920%2FcrawlItem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fanyong920%2FcrawlItem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fanyong920%2FcrawlItem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fanyong920","download_url":"https://codeload.github.com/fanyong920/crawlItem/tar.gz/refs/heads/master","sbom_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fanyong920%2FcrawlItem/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266261119,"owners_count":23901284,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","javascript","taobao","tmall"],"created_at":"2024-10-10T20:11:57.191Z","updated_at":"2025-07-21T07:31:53.088Z","avatar_url":"https://github.com/fanyong920.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"This extension works in Chrome, 360, Sohu, and similar browsers\n## Origin\nThe original goal was an extension for crawling Taobao and Tmall product pages; the extension now captures the full page content of any website. Because it captures the whole page, you need to parse out the information you want yourself. The steps for using the extension are:\n\n**1. Download this project to your computer and unzip it to get the crawlItem folder**\n\n**2. Open Chrome and enter chrome://extensions/ in the address bar to open the Extensions page**\n**\u003cbr/\u003eAlternatively, click the three-dot menu in the top-right corner -\u003e More tools -\u003e Extensions to reach the same page**\n\n**3. Click the Developer mode toggle in the top-right of that page to turn on developer mode**\n\n**4. Click Load unpacked and select the crawlItem folder from step 1 to install the extension. After a moment the extension appears on the page. If it is not enabled automatically, click the toggle in the bottom-right corner of the extension card to enable it. An icon also appears in the top-right corner of the browser**\n\n**5. Click the icon to see two options, and enable the ones you need.**\n\n\n\n```text\nEnable page crawling: when checked, page content is sent to the backend endpoint, and the Receiving endpoint field appears\nAuto-close page: when checked, the page closes automatically once it has been crawled.\nReceiving endpoint: the endpoint that receives the page data. You must define it yourself; the default is http://localhost:8080/content. It is tied to the Enable page crawling option\n```\nSample receiving endpoint:\n```java\npackage com.molikam.shop.controller;\n\nimport java.util.concurrent.atomic.AtomicInteger;\n\nimport org.springframework.web.bind.annotation.RequestMapping;\nimport org.springframework.web.bind.annotation.RequestMethod;\nimport org.springframework.web.bind.annotation.RestController;\n\n@RestController\npublic class CrawlerController {\n\n\t// Number of pages received so far\n\tprivate final AtomicInteger count = new 
AtomicInteger(0);\n\n\t// Spring binds the \"content\" request parameter to the method argument\n\t@RequestMapping(value=\"/content\",method={RequestMethod.POST})\n\tpublic void getContent(String content){\n\t\tSystem.out.println(count.incrementAndGet());\n\t\tSystem.out.println(content);\n\t}\n}\n\n```\nOnce page crawling is enabled and the receiving endpoint is defined, open any web page; if all goes well, the endpoint prints the page content.\u003cbr/\u003e\nThe extension is published on the Chrome Web Store and can be found by searching there\u003cbr/\u003e\n![](https://i.loli.net/2020/04/10/6yxNbqOljRBdk94.png)\nExtension link: [click here](https://chrome.google.com/webstore/detail/chromecrawl/pcadbaceejnkfhkoomcbdifcpfefkmbl?authuser=0\u0026hl=zh-CN)\n\n#### My Java crawler framework\nhttps://github.com/fanyong920/jvppeteer\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffanyong920%2Fcrawlitem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffanyong920%2Fcrawlitem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffanyong920%2Fcrawlitem/lists"}