{"id":20470037,"url":"https://github.com/budali/articalproject","last_synced_at":"2025-04-13T10:42:55.917Z","repository":{"id":37778937,"uuid":"146388188","full_name":"budaLi/ArticalProject","owner":"budaLi","description":"A few small crawler projects. Stars are welcome.","archived":false,"fork":false,"pushed_at":"2022-12-07T23:52:16.000Z","size":23471,"stargazers_count":17,"open_issues_count":16,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-27T02:07:55.078Z","etag":null,"topics":["python","scrapy","spiders"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/budaLi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-08-28T03:40:53.000Z","updated_at":"2024-01-20T09:08:07.000Z","dependencies_parsed_at":"2023-01-24T01:01:02.159Z","dependency_job_id":null,"html_url":"https://github.com/budaLi/ArticalProject","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/budaLi%2FArticalProject","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/budaLi%2FArticalProject/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/budaLi%2FArticalProject/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/budaLi%2FArticalProject/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/budaLi","download_url":"https://codeload.github.com/budaLi/ArticalProject/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_c
ount":248701975,"owners_count":21148111,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","scrapy","spiders"],"created_at":"2024-11-15T14:11:27.788Z","updated_at":"2025-04-13T10:42:55.895Z","avatar_url":"https://github.com/budaLi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Crawler Collection\n==============\nA few small Scrapy crawler projects.\nThe database files have been updated. Create the corresponding database locally and configure it by running the matching SQL file.\nRequest-header spoofing and IP rotation are already configured, and unless noted otherwise every project below is built on Scrapy, so some familiarity with Scrapy is assumed.\n\n#Jobbole crawler\n[Jobbole](https://github.com/152056208/ArticalProject/blob/master/ArticalProject/spiders/jobble.py)\n#To store images, enable the pipeline in settings by uncommenting it.\n\n\n#Zhihu crawler\n[Zhihu](https://github.com/152056208/ArticalProject/blob/master/ArticalProject/spiders/zhilian.py)\n#Includes separate question and answer crawlers. Login goes through Selenium, so please be patient.\n\n#Tencent Video crawler\n[Tencent Video](https://github.com/152056208/ArticalProject/blob/master/ArticalProject/spiders/movie.py)\n#Crawls Tencent Video and builds playback URLs by concatenating them with a third-party player address, so members-only videos can be watched too.\n[Bonus](http://yun.baiyug.cn/)\nVideos from Tencent, iQiyi, and other major video sites can all be parsed, and members-only videos can be watched for free.\n\n\n#Shixiseng crawler\n[Shixiseng](https://github.com/152056208/ArticalProject/blob/master/ArticalProject/spiders/shixiseng.py)\n#Crawls job postings from the Shixiseng site, which seems to list fewer positions than other job sites.\n#Shixiseng obfuscates the displayed digits and fonts to some degree, so the corresponding dictionary sometimes has to be updated by hand. Edit it here:\n[Configure the font mapping](https://github.com/budaLi/ArticalProject/blob/master/ArticalProject/utls/common.py)\n\nAs shown:\n\n![Image 
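text](https://github.com/budaLi/ArticalProject/blob/master/tools/QM%40DG1O~%245XOKP127WXI4%7DJ.png)\n\n\n\n#Lagou crawler\n[Lagou](https://github.com/152056208/ArticalProject/blob/master/ArticalProject/spiders/lagou.py)\n\n#Crawling free IP proxies from Xici\n[Xici](https://github.com/152056208/ArticalProject/blob/master/tools/crawl_xici_ip.py)\n#Works quite well: crawl a few IPs with your own IP first, pause, then run it again to keep crawling through the harvested IPs. Do not crawl too many times from your own IP, or it will get banned.\n\n#Beauty portrait photos\n[Beauty portraits](https://github.com/budaLi/ArticalProject/blob/master/ArticalProject/spiders/meizi_pic.py)\n#Can fetch roughly 5,000 images.\n\n#Novel crawling\n[Novels](https://github.com/budaLi/ArticalProject/blob/master/ArticalProject/spiders/xiaoshuo.py)\n#As the Buddha says, it cannot be spoken of. Please don't report me.\n\n#QQ friends crawler\n[QQ friends crawler](https://github.com/budaLi/ArticalProject/blob/master/tools/get_qq.py)\n#Fetches information on all of your own QQ friends and stores it in the database, which makes it easier to later crawl Qzone posts or analyze friend relationships.\n\n#bilibili user crawler\n[bilibili user crawler](https://github.com/budaLi/ArticalProject/tree/master/bilibili-user-master)\n#bilibili user IDs start at 1, so the crawler simply enumerates them; the ID range to crawl can be set in the file. Because this code was cloned from someone else's project, its request-header spoofing and IP handling do not use the settings configured in Scrapy.\n\n#GitHub simulated login\n[GitHub simulated login](https://github.com/budaLi/ArticalProject/blob/master/tools/github%E7%99%BB%E9%99%86.py)\n#In the spirit of coming clean, a sincere apology: I thought I had found a loophole in stars and inflated the count by a few dozen, but they were all wiped out shortly afterwards. As the saying goes, every trick meets a stronger countermeasure, and I concede. Back to honestly writing my own code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbudali%2Farticalproject","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbudali%2Farticalproject","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbudali%2Farticalproject/lists"}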