{"id":19341619,"url":"https://github.com/wangsudo/scrapy_smart_community","last_synced_at":"2025-06-30T14:07:44.258Z","repository":{"id":217676998,"uuid":"149544059","full_name":"Wangsudo/scrapy_smart_community","owner":"Wangsudo","description":"Uses Scrapy to crawl news and announcements from a community portal website.","archived":false,"fork":false,"pushed_at":"2018-09-20T03:46:01.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-06T11:44:38.239Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Wangsudo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-09-20T03:09:41.000Z","updated_at":"2018-10-09T06:12:22.000Z","dependencies_parsed_at":"2024-01-17T22:06:12.837Z","dependency_job_id":"0a985bfb-f84e-483a-8f83-95bcf73a560e","html_url":"https://github.com/Wangsudo/scrapy_smart_community","commit_stats":null,"previous_names":["wangsudo/scrapy_smart_community"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wangsudo%2Fscrapy_smart_community","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wangsudo%2Fscrapy_smart_community/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wangsudo%2Fscrapy_smart_community/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wangsudo%2Fscrapy_smart_community/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Wangsudo",
"download_url":"https://codeload.github.com/Wangsudo/scrapy_smart_community/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240449214,"owners_count":19803120,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T03:31:43.176Z","updated_at":"2025-02-24T08:41:35.668Z","avatar_url":"https://github.com/Wangsudo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# scrapy_smart_community\n使用scrapy爬取了一个社区的门户网页的新闻，公告信息。\n\n## 环境\nmacos 10.13\n\npyton 2.7.5\n\nscrapy 1.5.1\n\nmysql Ver 14.14 Distrib 5.7.19\n\n## 实现功能\n1. 资讯 递归 爬取 （已完成）\n  \n2. 爬取资讯 分别 入库 （已完成）\n\n3. 图片下载，上传oss （待开发）\n\n4. 去重url爬取 （待升级：目前只是简单进行入库去重，可以利用redis进行分布式去重url爬取）\n\n## 运行\n首先 运行环境要有 scrapy  \n下载scrapy (可能要升级 pip，按照提示升级pip) \n```\npip install scrapy\n```\n\nclone下项目\n```\ngit clone ~~~~~~~~~\n```\n修改setting.py文件下的数据库配置\n\n进入 scrapy_smart_community 文件夹下\n```\nscrapy crawl dynamic\n```\n\n## 定时运行\n这里介绍\n使用 crontab\n\n```\ncrontal -e\n```\n在vi中写类似如下crontab的指令：\n30 17 * * * cd [项目路径] \u0026\u0026 /usr/local/bin/scrapy crawl xxx\n\n## 博客详解\n（待总结）\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangsudo%2Fscrapy_smart_community","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwangsudo%2Fscrapy_smart_community","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangsudo%2Fscrapy_smart_community/lists"}