{"id":13932937,"url":"https://github.com/kong36088/ZhihuSpider","last_synced_at":"2025-07-19T16:32:09.981Z","repository":{"id":41044939,"uuid":"72274931","full_name":"kong36088/ZhihuSpider","owner":"kong36088","description":"多线程知乎用户爬虫，基于python3","archived":false,"fork":false,"pushed_at":"2023-05-29T09:24:22.000Z","size":85,"stargazers_count":238,"open_issues_count":1,"forks_count":84,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-08-08T21:19:34.406Z","etag":null,"topics":["crawler","multi-threading","python","python3","spider","zhihu"],"latest_commit_sha":null,"homepage":"http://zhihu.jwlchina.cn/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kong36088.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-10-29T08:58:23.000Z","updated_at":"2024-07-14T18:20:37.000Z","dependencies_parsed_at":"2024-01-17T07:18:56.331Z","dependency_job_id":"536bb105-5b0d-4978-b581-b0bd288ea082","html_url":"https://github.com/kong36088/ZhihuSpider","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kong36088%2FZhihuSpider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kong36088%2FZhihuSpider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kong36088%2FZhihuSpider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kong36088%2FZhihuSpider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kong36088","download_url":"https:
//codeload.github.com/kong36088/ZhihuSpider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226643851,"owners_count":17662967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","multi-threading","python","python3","spider","zhihu"],"created_at":"2024-08-07T21:01:23.057Z","updated_at":"2024-11-26T23:30:48.453Z","avatar_url":"https://github.com/kong36088.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# ZhihuSpider\nA user spider for www.zhihu.com\n\n## 1.Install python3 and packages\nMake sure python3 is installed.\nUse pip to install the dependencies.\n``` bash\npip install Image requests beautifulsoup4 html5lib redis PyMySQL \n```\n## 2.Database Config\nInstall `mysql` and create your database.\nImport `init.sql` to create your table.\n\n## 3.Install redis\n``` bash\n# (ubuntu)\napt-get install redis\n\n# or (centos)\n\nyum install redis\n\n# or (macos)\nbrew install redis\n```\n## 4.Configure your application\nFill in `config.ini`.\n\n## 5.Get started\n``` bash\npython get_user.py\n\n# or, if your command is python3\n\npython3 get_user.py\n```\n\n## Detailed guide\n\nMy blog has a detailed walkthrough of the code: [I crawled the data of one million Zhihu users with Python](http://www.jwlchina.cn/2016/11/04/%E6%88%91%E7%94%A8python%E7%88%AC%E4%BA%86%E7%9F%A5%E4%B9%8E%E4%B8%80%E7%99%BE%E4%B8%87%E7%94%A8%E6%88%B7%E7%9A%84%E6%95%B0%E6%8D%AE/)\n\nStatistical analysis of the data: [Data analysis of one million Zhihu users](http://zhihu.jwlchina.cn/)\n# A multi-threaded program for crawling Zhihu users\n\n# Requirements\n\nRequired packages:\n`beautifulsoup4`\n`html5lib`\n`image`\n`requests`\n`redis`\n`PyMySQL`\n\nInstall all dependencies with pip:\n``` bash\npip install Image 
requests beautifulsoup4 html5lib redis PyMySQL \n```\n\nThe runtime environment must support Chinese.\n\nTested on python3.5; other environments are not guaranteed to work perfectly.\n\n1.**Install mysql and redis**\n\n2.**Configure `config.ini`: set up mysql and redis, and fill in your Zhihu account (the newer crawler on the master branch does not require login but may become outdated, in which case you can switch to the new-ui branch)**\n\nSet `sleep_time` under `[sys]` in `config.ini` to control the crawl speed (stick to the recommended value where possible; crawling too fast is likely to get you banned by Zhihu), and set `thread_num` to configure the number of threads.\n\n3.**Import `init.sql` into the database**\n\n# Run\n\nStart crawling: `python get_user.py`\nCheck the crawl count: `python check_redis.py`\n\n# Results\n![Screenshot 1](http://www.jwlchina.cn/uploads/%E7%9F%A5%E4%B9%8E%E7%94%A8%E6%88%B7%E7%88%AC%E8%99%AB4.png)\n![Screenshot 2](http://www.jwlchina.cn/uploads/%E7%9F%A5%E4%B9%8E%E7%94%A8%E6%88%B7%E7%88%AC%E8%99%AB5.png)\n\n# Docker\n\nIf you want to skip the manual setup, here is how I set up a basic environment with docker:\nmysql and redis both use the official images\n```bash\ndocker run --name mysql -itd mysql:latest\ndocker run --name redis -itd redis:latest\n```\n\n\nThen run the python image with docker-compose. My docker-compose.yml for python:\n``` yaml\npython:\n    container_name: python\n    build: .\n    ports:\n      - \"84:80\"\n    external_links:\n      - memcache:memcache\n      - mysql:mysql\n      - redis:redis\n    volumes:\n      - /docker_containers/python/www:/var/www/html\n    tty: true\n    stdin_open: true\n    extra_hosts:\n      - \"python:192.168.102.140\"\n    environment:\n        PYTHONIOENCODING: utf-8\n```\nMy Dockerfile:\n``` dockerfile\nFROM kong36088/zhihu-spider:latest\n```\n\n# Donate\n\nYour support is my greatest encouragement!\nThanks for buying me some candy\n![wechatpay](http://blog-image.jwlchina.cn/kong36088/kong36088.github.io/master/uploads/site/wechat-pay.png)\n![alipay](http://blog-image.jwlchina.cn/kong36088/kong36088.github.io/master/uploads/site/zhifubao.jpg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkong36088%2FZhihuSpider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkong36088%2FZhihuSpider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkong36088%2FZhihuSpider/lists"}