{"id":21767822,"url":"https://github.com/mouday/spider-admin-pro","last_synced_at":"2025-05-14T18:04:43.560Z","repository":{"id":39593412,"uuid":"332602862","full_name":"mouday/spider-admin-pro","owner":"mouday","description":"spider-admin-pro 一个集爬虫Scrapy+Scrapyd爬虫项目查看 和 爬虫任务定时调度的可视化管理工具，SpiderAdmin的升级版","archived":false,"fork":false,"pushed_at":"2024-11-10T04:22:00.000Z","size":2959,"stargazers_count":587,"open_issues_count":6,"forks_count":84,"subscribers_count":11,"default_branch":"v3.0","last_synced_at":"2025-04-10T04:53:47.145Z","etag":null,"topics":["python3","scrapy","scrapyd","spider"],"latest_commit_sha":null,"homepage":"https://mouday.github.io/spider-admin-pro/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mouday.png","metadata":{"files":{"readme":"README-v1.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-25T01:56:48.000Z","updated_at":"2025-04-08T01:31:31.000Z","dependencies_parsed_at":"2024-03-08T02:42:36.773Z","dependency_job_id":"2a5abbab-42f3-4f04-85d9-a00f7200918b","html_url":"https://github.com/mouday/spider-admin-pro","commit_stats":{"total_commits":79,"total_committers":1,"mean_commits":79.0,"dds":0.0,"last_synced_commit":"8be96c934777f36af4aef4d269993d5e82352dda"},"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mouday%2Fspider-admin-pro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mouday%2Fspider-admin-pro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/mouday%2Fspider-admin-pro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mouday%2Fspider-admin-pro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mouday","download_url":"https://codeload.github.com/mouday/spider-admin-pro/tar.gz/refs/heads/v3.0","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254198514,"owners_count":22030965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python3","scrapy","scrapyd","spider"],"created_at":"2024-11-26T13:30:28.452Z","updated_at":"2025-05-14T18:04:38.551Z","avatar_url":"https://github.com/mouday.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spider Admin Pro V1.0\n\n![PyPI](https://img.shields.io/pypi/v/spider-admin-pro.svg)\n![PyPI - Downloads](https://img.shields.io/pypi/dm/spider-admin-pro)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/spider-admin-pro)\n![PyPI - License](https://img.shields.io/pypi/l/spider-admin-pro)\n\n\nGithub: [https://github.com/mouday/spider-admin-pro](https://github.com/mouday/spider-admin-pro)\n\nGitee: [https://gitee.com/mouday/spider-admin-pro](https://gitee.com/mouday/spider-admin-pro)\n\nPypi: [https://pypi.org/project/spider-admin-pro](https://pypi.org/project/spider-admin-pro)\n\n- [Table of contents](#spider-admin-pro)\n  * [Introduction](#简介)\n  * [Installation and startup](#安装启动)\n  * [Configuration](#----)\n  * [Deployment](#----)\n  * [Extensions](#----)\n  * [Tech stack](#----)\n  * [Project structure](#----)\n  * [Lessons learned](#----)\n  * [TODO](#todo)\n  * [Sponsors](#----)\n  * [Community](#----)\n  * [Screenshots](#----)\n\n## 
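Introduction\n\nSpider Admin Pro is an upgraded version of [Spider Admin](https://github.com/mouday/SpiderAdmin):\n\n1. Some features have been simplified.\n2. The front end has been reworked as component-based Vue code.\n3. The back-end API has been reworked, and the back-end project is now organized into directories.\n4. The codebase as a whole is easier to upgrade and maintain.\n5. Only Python 3 is currently supported.\n\n![](https://github.com/mouday/spider-admin-pro/raw/master/doc/img/spider-admin-pro.png)\n\n## Installation and startup\n\nThis project was developed on Python 3.7.0, so Python 3.7.0 or later is recommended.\n\n\u003e Note: Python 3.10 removed the deprecated aliases in the `collections` module, so the project cannot run on that version.\n\nBefore running the project, make sure the [scrapyd](https://pengshiyu.blog.csdn.net/article/details/79842514) service is already running.\n\nOption 1:\n\n```bash\n$ pip3 install spider-admin-pro\n\n$ python3 -m spider_admin_pro.run\n```\n\nOption 2 (recommended; the PyPI release may lag behind, while the GitHub code is kept up to date):\n```bash\n$ git clone https://github.com/mouday/spider-admin-pro.git\n\n$ cd spider-admin-pro\n\n# Install dependencies (a fresh virtual environment is recommended)\n$ pip3 install -r requirements.txt \n\n# Run in production mode\n$ python3 spider_admin_pro/run.py\n\n# Run in development mode\n$ python3 dev.py\n\n```\n\n\u003e On Windows, `python3` may not be on the PATH; try `python dev.py` instead. Thanks to [@whobywind](https://github.com/whobywind)\n\n## Configuration\n\nConfiguration precedence:\n```\nYAML config file \u003e  environment variables \u003e default configuration \n```\n\n1. Default configuration\n\n```bash\n\n# Flask server settings\nPORT = 5002\nHOST = '127.0.0.1'\n\n# Login username and password\nUSERNAME = admin\nPASSWORD = \"123456\"\nJWT_KEY = FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0=\n\n# Token lifetime, in days\nEXPIRES = 7\n\n# scrapyd address, without a trailing slash\nSCRAPYD_SERVER = 'http://127.0.0.1:6800'\n\n# Scheduler: storage for the schedule history\n# mysql or sqlite and other, any database for peewee support\nSCHEDULE_HISTORY_DATABASE_URL = 'sqlite:///dbs/schedule_history.db'\n\n# Scheduler: storage for scheduled jobs\nJOB_STORES_DATABASE_URL = 'sqlite:///dbs/apscheduler.db'\n\n# Log directory\nLOG_DIR = 'logs'\n```\n\n2. Environment variables\n\nCreate a `.env` file in the working directory; the default parameters are listed below.\n\nNote: to distinguish them from other environment variables, these use the `SPIDER_ADMIN_PRO_` prefix.\n\nWhen running with `python3 -m`, the variables must be exported into the environment; create an `env.bash` file in the working directory.\n\nNote that no spaces are allowed around the equals sign here.\n\n```bash\n# Flask server settings\nexport SPIDER_ADMIN_PRO_PORT=5002\nexport SPIDER_ADMIN_PRO_HOST='127.0.0.1'\n\n# Login username and password\nexport SPIDER_ADMIN_PRO_USERNAME='admin'\nexport SPIDER_ADMIN_PRO_PASSWORD='123456'\nexport 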
SPIDER_ADMIN_PRO_JWT_KEY='FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0='\n\n```\n\nAfter adding the environment variables, run:\n```bash\n$ source env.bash\n\n$ python3 -m spider_admin_pro.run\n\n```\n\n[Note]:\n\nTo reduce configuration complexity, option 2 (environment variables) is planned for removal in the next version.\n\n3. Custom configuration\n\nCreate a `config.yml` file in the working directory; it is read automatically at startup.\n\neg:\n\n```yaml\n# Flask server settings\nPORT: 5002\nHOST: '127.0.0.1'\n\n# Login username and password\nUSERNAME: admin\nPASSWORD: \"123456\"\nJWT_KEY: \"FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0=\"\n\n# Token lifetime, in days\nEXPIRES: 7\n\n# scrapyd address, without a trailing slash\nSCRAPYD_SERVER: \"http://127.0.0.1:6800\"\n\n# Log directory\nLOG_DIR: 'logs'\n```\n\nGenerate a JWT key:\n```\n$ python -c 'import base64;import os;print(base64.b64encode(os.urandom(32)).decode())'\n```\n\n## Deployment\n\n1. Manage the application with Gunicorn\n\nGunicorn documentation: [https://docs.gunicorn.org/](https://docs.gunicorn.org/)\n\n```bash\n# Start the service\n$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app\n```\n\nNote: \n\nWhen running under `Gunicorn`, the `PORT` and `HOST` settings in the config file have no effect.\n\nTo change the port or host, edit `bind` in `gunicorn.conf.py`.\n \nAn example configuration, gunicorn.conf.py:\n\n```python\n# -*- coding: utf-8 -*-\n\n\"\"\"\n$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app\n\"\"\"\n\nimport multiprocessing\nimport os\n\nfrom gevent import monkey\n\nmonkey.patch_all()\n\n# Log directory\nLOG_DIR = 'logs'\n\nif not os.path.exists(LOG_DIR):\n    os.mkdir(LOG_DIR)\n\n\ndef resolve_file(filename):\n    return os.path.join(LOG_DIR, filename)\n\n\ndef get_workers():\n    return multiprocessing.cpu_count() * 2 + 1\n\n\n# daemon = True\ndaemon = False  # Must not run as a daemon when managed by supervisor\n\n# Process name\nproc_name = \"spider-admin-pro\"\n\n# Bind address\nbind = \"127.0.0.1:5001\"\n\n# Log files\nloglevel = 'debug'\npidfile = resolve_file(\"gunicorn.pid\")\naccesslog = resolve_file(\"access.log\")\nerrorlog = resolve_file(\"error.log\")\n\n# Number of worker processes\n# workers = get_workers()\nworkers = 2\nworker_class = 'gevent'\n\n\n# Startup hook\ndef on_starting(server):\n    ip, port = server.address[0]\n    print('server.address:', f'http://{ip}:{port}')\n\n```\n\nNote:\n\nDeploying with gunicorn starts multiple workers, 
which means multiple apscheduler instances are started and jobs may run more than once (this has not been observed so far).\n\nIn this setup, leave the scheduler control switch alone, otherwise the scheduler may fail to start; if scheduled jobs stop running, try restarting the whole service.\n\n\n2. Manage the process with supervisor\n\nDocumentation: [http://www.supervisord.org](http://www.supervisord.org)\n\nspider-admin-pro.ini\n\n```ini\n[program:spider-admin-pro]\ndirectory=/spider-admin-pro\ncommand=/usr/local/python3/bin/gunicorn --config gunicorn.conf.py spider_admin_pro.run:app\n\nstdout_logfile=logs/out.log\nstderr_logfile=logs/err.log\n\nstdout_logfile_maxbytes = 20MB\nstdout_logfile_backups = 0\nstderr_logfile_maxbytes=10MB\nstderr_logfile_backups=0\n```\n\n3. Forward requests with Nginx\n\n```bash\nserver {\n    listen 80;\n\n    server_name _;\n\n    access_log  /var/log/nginx/access.log;\n    error_log  /var/log/nginx/error.log;\n\n    location / {\n        proxy_pass         http://127.0.0.1:5001/;\n        proxy_redirect     off;\n\n        proxy_set_header   Host                 $host;\n        proxy_set_header   X-Real-IP            $remote_addr;\n        proxy_set_header   X-Forwarded-For      $proxy_add_x_forwarded_for;\n        proxy_set_header   X-Forwarded-Proto    $scheme;\n    }\n}\n\n```\n\n## Extensions\n\nCollecting run logs: [scrapy-util](https://github.com/mouday/scrapy-util) can help you collect run-time statistics from your spiders.\n\n\n## Tech stack\n1. Front end\n\n| Purpose | Library and documentation |\n| - | - |\n| Framework | [vue](https://cn.vuejs.org/) |\n| Dashboard charts | [echarts](https://echarts.apache.org/) |\n| HTTP requests | [axios](https://www.npmjs.com/package/axios) |\n| UI components | [Element-UI](https://element.eleme.cn/) |\n\n\n2. Back end\n\n| Purpose | Library and documentation |\n| - | - |\n| API service | [Flask](https://dormousehole.readthedocs.io/) |\n| Job scheduling | [apscheduler](https://apscheduler.readthedocs.io/) |\n| scrapyd API | [scrapyd-api](https://github.com/mouday/scrapyd-api) |\n| HTTP requests | [session-request](https://github.com/mouday/session-request) |\n| ORM | [peewee](http://docs.peewee-orm.com/) |\n| jwt | [jwt](https://pyjwt.readthedocs.io/) |\n| System information | [psutil](https://psutil.readthedocs.io/) |\n\n## Project structure\n\n[Public repository] Flask-based back-end project spider-admin-pro: 
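[https://github.com/mouday/spider-admin-pro](https://github.com/mouday/spider-admin-pro)\n\n[Private repository] Vue-based front-end project spider-admin-pro-web: [https://github.com/mouday/spider-admin-pro-web](https://github.com/mouday/spider-admin-pro-web)\n\n\nMain directory layout of the spider-admin-pro project:\n\n```bash\n.\n├── run.py        # Entry point\n├── api           # Controller layer\n├── service       # Service layer\n├── model         # Model layer\n├── exceptions    # Exceptions \n├── utils         # Utilities\n└── web           # Static web pages\n\n```\n\n## Lessons learned\n\nDo not expose Scrapyd directly to the public internet:\n\n1. Anyone can deploy code to your machine through the deploy endpoint, and if Scrapyd runs as root, that code can do anything else on the machine.\n2. Run logs can contain values from your configuration files, which risks leaking information.\n\n\n## TODO\n\n~~1. Complete the developer documentation~~\n\n~~2. Support installation and use from the command line~~\n\n~~3. Improve the code layout and extract shared libraries~~\n\n~~4. Auto-refresh logs~~\n\n~~5. Data collection for scrapy projects~~\n\n[ok]6. Left-align the spider column in scheduled jobs and support local sorting\n\n[x]7. Scheduler control: remove the stop/start switch and keep only pause/resume\n\n[x]8. Add job: default project name; clear form validation results when the dialog is closed\n\n[x]9. The collected logs grow too large; add a scheduled cleanup feature\n\n[x]10. Back up scheduled jobs, in case they are cleared by accident\n\n[x]11. Support better scheduling options, similar to those in scrapyd_web\n\n[x]12. Simple spiders should not have to be packaged; for example, allow uploading a single .py file and running it as a script on a schedule\n\n## Community\n\nMore and more people are following this project. To make communication easier, you can join the group chat.\n\nQuestion: invitation code. Answer: SpiderAdmin\n\n\u003cimg src=\"https://github.com/mouday/spider-admin-pro/raw/master/doc/img/qq.jpg\" width=\"300\"/\u003e\n\n## Sponsors\n\n| Date | Name | Amount |\n| - | - | - |\n| 2022-04-16 | [@realhellosunsun](https://github.com/realhellosunsun) | ￥188.00 |\n| 2022-08-30 | [@yangxiaozhe13](https://github.com/yangxiaozhe13) | ￥88.00 |\n| 2022-09-01 | [@robot-2233](https://github.com/robot-2233) | ￥88.00 |\n\n## Screenshots\n\n![](https://github.com/mouday/spider-admin-pro/raw/master/doc/img/dashboard.png)\n\n![](https://github.com/mouday/spider-admin-pro/raw/master/doc/img/project.png)\n\n![](https://github.com/mouday/spider-admin-pro/raw/master/doc/img/schedule.png)\n\n![](https://github.com/mouday/spider-admin-pro/raw/master/doc/img/logs.png)\n\n\n## Development\n\n```bash\ngit clone https://github.com/mouday/spider-admin-pro.git\n\ncd spider-admin-pro\n\npython3 dev.py\n```\n\n## Install or upgrade\n```\npip3 install -U spider-admin-pro -i https://pypi.org/simple\n```\n\n## Changelog\n\n1. 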
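2021-09-03 [bugfix] Fixed a bug where running projects in the job list could not be cancelled\n\n2. 2022-04-01 [bugfix] After changing the scrapyd port and setting the new port in the config file, the config file did not take effect\n\nThanks to @洒脱的狂者 for reporting the problem and providing the fix\n\n3. 2022-05-27 [update] Added the flask_cors dependency to requirements.txt\n\n## Stargazers over time\n\n[![Stargazers over time](https://starchart.cc/mouday/spider-admin-pro.svg)](https://starchart.cc/mouday/spider-admin-pro)\n\n\nOther excellent community tools\n\n- https://github.com/DormyMo/SpiderKeeper\n- https://github.com/my8100/scrapydweb\n- https://github.com/ouqiang/gocron (a lightweight, centralized scheduled-job management system written in Go, intended as a replacement for Linux crontab)\n\n## Other issues\n\n1. scrapyd fails to start on Windows, possibly because the pywin32 dependency is missing\n\n```\npip install pywin32\n```\n\nThanks to [@whobywind](https://github.com/whobywind) for the solution\n\n2. The site checks IP addresses and blocks access after only a few requests?\n\nA single IP risks being banned; you can send requests through proxy IPs, which come in free and paid varieties.\n\nFor personal use, free proxy IPs can serve temporarily.\n\nFor company projects, use paid proxy IPs.\n\nA well-known crawler developer has also recommended a good dynamic proxy, [云立方](http://www.yunlifang.cn/?from=spider-admin-pro)\n\n\u003ca href=\"http://www.yunlifang.cn/?from=spider-admin-pro\" target=\"_blank\" style=\"display: inline-block; background-color: #000;\"\u003e\n\u003cimg src=\"https://www.yunlifang.cn/img/logo.png\"\u003e\n\u003c/a\u003e\n\nSend customer service the code 【爬虫推广】 to get a discount\n\nThe setup procedure is described in detail on that developer's blog:\n\n[使用 Tornado+Redis 维护 ADSL 拨号服务器代理池](https://cuiqingcai.com/4596.html) (Maintaining an ADSL dial-up proxy pool with Tornado+Redis)\n\nIf you have questions, join the QQ group, where members are happy to help\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmouday%2Fspider-admin-pro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmouday%2Fspider-admin-pro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmouday%2Fspider-admin-pro/lists"}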