{"id":13487765,"url":"https://github.com/MatrixSeven/ZhihuSpider","last_synced_at":"2025-03-27T23:31:36.502Z","repository":{"id":105500189,"uuid":"75077564","full_name":"MatrixSeven/ZhihuSpider","owner":"MatrixSeven","description":"知乎爬虫/可以爬出关注关系的爬虫","archived":false,"fork":false,"pushed_at":"2017-08-14T07:03:15.000Z","size":257,"stargazers_count":300,"open_issues_count":3,"forks_count":77,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-10-30T23:35:39.079Z","etag":null,"topics":["java","spiders","zhihu"],"latest_commit_sha":null,"homepage":"https://MatrixSeven.github.io","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MatrixSeven.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-11-29T12:05:06.000Z","updated_at":"2024-10-07T04:48:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"eb6f901d-953e-4afa-b0d9-6d9264b5d056","html_url":"https://github.com/MatrixSeven/ZhihuSpider","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MatrixSeven%2FZhihuSpider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MatrixSeven%2FZhihuSpider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MatrixSeven%2FZhihuSpider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MatrixSeven%2FZhihuSpider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MatrixSeven","download_url":"https://codeload.github.com/MatrixSeven/
ZhihuSpider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944019,"owners_count":20697945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","spiders","zhihu"],"created_at":"2024-07-31T18:01:03.509Z","updated_at":"2025-03-27T23:31:36.079Z","avatar_url":"https://github.com/MatrixSeven.png","language":"Java","readme":"# Zhihu Spider\n\n## Blog updates: [https://matrixseven.github.io](https://matrixseven.github.io/2016/11/23/%E7%9F%A5%E4%B9%8E%E7%88%AC%E8%99%AB%E4%B9%8B%E5%BC%80%E7%AF%87/) \n## Zhihu column updates: [https://zhuanlan.zhihu.com/Accelerator](https://zhuanlan.zhihu.com/Accelerator) \n## Related posts on cnblogs: [http://www.cnblogs.com/seven007](http://www.cnblogs.com/seven007/p/6248578.html)\n### 1. Please star this repo on Git ~O(∩_∩)O haha~~\n### 2. Please follow me on Zhihu~~ [Zhihu account @Accelerator](https://www.zhihu.com/people/Sweets07)\n### 3. This repo contains only the crawler; the web server and visualization parts live in separate projects.\nI came across a Zhihu article on visualization and, on a whim, decided to write a crawler of my own in Java, integrate it into Spring, and use ECharts to render the relationships between users. Of course, since I am crawling anyway, I also collect each user's profile information.\nStarting today (end date unknown; the table of contents will be updated as things progress), I will write a series of articles on crawling Zhihu, continuing until the data visualization is finished (after that, the crawler will be rewritten in Scala).\n#\n\nBonus: a copy of previously crawled data (MySQL). Link: http://pan.baidu.com/s/1qXGa8S8 Password: t2vi (downloading without liking or starring gets a thumbs-down~ so sad)\nLots of people have downloaded and saved it... but nobody stars the repo, folks~~~\n![Data](数据.png)\n\n#\n## 1. Planned visualizations\n1. User relationship graph\n2. Geographic distribution of users\n3. University distribution of users\n4. Gender ratio\n5. User upvotes\n\n## 2. Planned content and table of contents\n1. [Opening remarks](https://zhuanlan.zhihu.com/p/23906171)\n2. [Crawler workflow design](https://zhuanlan.zhihu.com/p/23906423)\n    1. How to filter duplicate data\n    2. How to build user relationships while crawling\n3. [Request analysis](https://zhuanlan.zhihu.com/p/23969440)\n    1. Login request analysis\n    2. Follower/following request analysis\n4. [Scraping page data](https://zhuanlan.zhihu.com/p/24309888)\n    1. Extracting page content with jsoup\n5. 
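[Optimization](https://zhuanlan.zhihu.com/p/24655256)\n    1. Speeding up with multithreading\n    2. Using a queue to reduce database access\n    3. Implementing an LRU cache to improve the hit rate\n6. A simple application based on SpringCloud\n    1. Introduction\n    2. Basic configuration\n7. Extras\n    1. Integrating Mybatis\n    2. Writing a JSONP cross-origin request API\n8. On to the miserable front end\n    1. Layout with Bootstrap\n    2. Adding the ECharts charting library\n9. Goodbye, bragging over.\n\n\n# 吾爱Java (QQ group): [170936712 (click to join)](https://link.zhihu.com/?target=https%3A//jq.qq.com/%3F_wv%3D1027%26k%3D41oCCMn)\n\n# Changelog:\n1. 2016/11/30\n    1. Initial upload\n2. 2016/12/13\n    1. Fixed memory blowup caused by too many threads\n3. 2016/12/22\n    1. Fixed a database deadlock\n    2. A simpler, cruder LruCache\n    3. Improved how the crawler selects its starting data at initialization\n4. 2016/12/26\n    1. Fixed a multithreading deadlock\n5. 2016/12/28\n    1. Improved the login flow\n    2. Fixed an issue with adding followers\n    3. Fixed slow updates of userBase data\n    4. Reduced CPU usage\n    5. Added two columns to the userInfo table\n\n    \n## Screenshots\n![Running](test_01.gif)\n","funding_links":[],"categories":["Java"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMatrixSeven%2FZhihuSpider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMatrixSeven%2FZhihuSpider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMatrixSeven%2FZhihuSpider/lists"}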