{"id":18765428,"url":"https://github.com/handexing/jdbee","last_synced_at":"2025-09-03T02:42:47.549Z","repository":{"id":96439526,"uuid":"92254170","full_name":"handexing/JdBee","owner":"handexing","description":"Crawls JD.com data with selenium + phantomjs + WebCollector combined, and persists the data.","archived":false,"fork":false,"pushed_at":"2017-06-10T08:10:54.000Z","size":21515,"stargazers_count":49,"open_issues_count":0,"forks_count":25,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-13T05:12:18.834Z","etag":null,"topics":["httpclient","jsoup","phantomjs","selenium","selenium-java","webcollector"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/handexing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-24T05:42:06.000Z","updated_at":"2024-08-02T06:10:30.000Z","dependencies_parsed_at":"2023-04-09T04:55:18.377Z","dependency_job_id":null,"html_url":"https://github.com/handexing/JdBee","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/handexing/JdBee","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/handexing%2FJdBee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/handexing%2FJdBee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/handexing%2FJdBee/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/handexing%2FJdBee/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/handexing","
download_url":"https://codeload.github.com/handexing/JdBee/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/handexing%2FJdBee/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273381913,"owners_count":25095327,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-03T02:00:09.631Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["httpclient","jsoup","phantomjs","selenium","selenium-java","webcollector"],"created_at":"2024-11-07T18:33:59.526Z","updated_at":"2025-09-03T02:42:47.493Z","avatar_url":"https://github.com/handexing.png","language":"Java","readme":"# JdBee\r\n## Scraping JD.com data with jsoup\r\n\r\n\u003e **For learning and exchange only. If you use it for any other purpose, you bear the consequences yourself!**\r\n\r\n\u003e Only snack-related data is scraped at present, since that is all that is needed for now; other categories will be considered later.\r\n\r\n\u003e The snack data is scraped to support further development of the [vipsnacks](https://github.com/handexing/vipsnacks) project.\r\n\r\n\r\n\r\n## Requirements\r\n\r\n- httpclient\r\n- jsoup\r\n- slf4j\r\n- selenium\r\n- phantomjs\r\n- WebCollector\r\n\r\n## Changelog\r\n\r\n- Initialized the project; finished scraping first- and second-level categories (*2017-05-24*)\r\n- Used selenium to fetch page data and scrape third-, fourth-, and fifth-level categories (*2017-05-25*)\r\n- Multithreaded concurrent crawling of paginated category data (*2017-05-26*)\r\n- Multithreaded crawling of product skuid values (*2017-05-28*)\r\n\r\n**Crawling with selenium is too slow, and it has to open a web page every time. It is usable for small amounts of data but cannot cope with large volumes, so I am currently looking for another way to crawl.**\r\n\r\n- Crawled products with WebCollector + selenium + phantomjs (*2017-06-01, only one category crawled as a test*)\r\n- Database persistence test (*2017-06-02*)\r\n- Test crawl of one small category: 200,000 records crawled in 21 minutes (*2017-06-03*)\r\n- 
Data is now persisted to the database correctly; **285330** records crawled (*2017-06-04*)\r\n- Optimized the product-fetching code: fetching one page of products went from 19664 ms to about 7000 ms (*2017-06-07*)\r\n\r\n\r\n\u003e If you find this project useful, a star, watch, or fork would be an encouragement to me.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhandexing%2Fjdbee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhandexing%2Fjdbee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhandexing%2Fjdbee/lists"}