{"id":13400033,"url":"https://github.com/binux/pyspider","last_synced_at":"2025-10-05T18:31:20.021Z","repository":{"id":14357182,"uuid":"17066884","full_name":"binux/pyspider","owner":"binux","description":"A Powerful Spider(Web Crawler) System in Python.","archived":true,"fork":false,"pushed_at":"2024-04-30T19:43:29.000Z","size":4171,"stargazers_count":16526,"open_issues_count":302,"forks_count":3688,"subscribers_count":895,"default_branch":"master","last_synced_at":"2025-01-03T16:35:10.460Z","etag":null,"topics":["crawler","python"],"latest_commit_sha":null,"homepage":"http://docs.pyspider.org/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/binux.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-02-21T19:18:47.000Z","updated_at":"2025-01-01T09:28:02.000Z","dependencies_parsed_at":"2024-06-18T14:08:30.450Z","dependency_job_id":"f293ff91-b4f8-4e94-a681-c79ae3ad4d9a","html_url":"https://github.com/binux/pyspider","commit_stats":{"total_commits":1071,"total_committers":65,"mean_commits":"16.476923076923075","dds":0.3613445378151261,"last_synced_commit":"897891cafb21ea5b4ac08e728ad2ea212879f7fa"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binux%2Fpyspider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binux%2Fpyspider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binux%2Fpyspider/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/binux%2Fpyspider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/binux","download_url":"https://codeload.github.com/binux/pyspider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235432202,"owners_count":18989480,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","python"],"created_at":"2024-07-30T19:00:46.975Z","updated_at":"2025-10-05T18:31:19.579Z","avatar_url":"https://github.com/binux.png","language":"Python","readme":"pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage]\n========\n\nA Powerful Spider (Web Crawler) System in Python.\n\n- Write scripts in Python\n- Powerful WebUI with a script editor, task monitor, project manager and result viewer\n- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [Elasticsearch](https://www.elastic.co/products/elasticsearch); [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as the database backend\n- [RabbitMQ](http://www.rabbitmq.com/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as the message queue\n- Task priority, retry, periodic crawling, recrawl by age, and more\n- Distributed architecture, JavaScript page crawling, Python 2.{6,7} and 3.{3,4,5,6} support, and more\n\nTutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)  \nDocumentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)  \nRelease notes: 
[https://github.com/binux/pyspider/releases](https://github.com/binux/pyspider/releases)  \n\nSample Code\n-----------\n\n```python\nfrom pyspider.libs.base_handler import *\n\n\nclass Handler(BaseHandler):\n    crawl_config = {}\n\n    # Run on_start once a day (every 1440 minutes).\n    @every(minutes=24 * 60)\n    def on_start(self):\n        self.crawl('http://scrapy.org/', callback=self.index_page)\n\n    # Re-crawl a page only if the cached copy is older than 10 days.\n    @config(age=10 * 24 * 60 * 60)\n    def index_page(self, response):\n        for each in response.doc('a[href^=\"http\"]').items():\n            self.crawl(each.attr.href, callback=self.detail_page)\n\n    def detail_page(self, response):\n        return {\n            \"url\": response.url,\n            \"title\": response.doc('title').text(),\n        }\n```\n\n\nInstallation\n------------\n\n* `pip install pyspider`\n* run the command `pyspider`, then visit [http://localhost:5000/](http://localhost:5000/)\n\n**WARNING:** The WebUI is open to the public by default; it can be used to execute arbitrary commands, which may harm your system. Use it only on an internal network, or [enable `need-auth` for the webui](http://docs.pyspider.org/en/latest/Command-Line/#-config).\n\nQuickstart: [http://docs.pyspider.org/en/latest/Quickstart/](http://docs.pyspider.org/en/latest/Quickstart/)\n\nContribute\n----------\n\n* Use it\n* Open an [Issue] or send a PR\n* [User Group]\n* [Chinese Q\u0026A](http://segmentfault.com/t/pyspider)\n\n\nTODO\n----\n\n### v0.4.0\n\n- [ ] A visual scraping interface like [Portia](https://github.com/scrapinghub/portia)\n\n\nLicense\n-------\nLicensed under the Apache License, Version 2.0\n\n\n[Build Status]:         https://img.shields.io/travis/binux/pyspider/master.svg?style=flat\n[Travis CI]:            https://travis-ci.org/binux/pyspider\n[Coverage Status]:      https://img.shields.io/coveralls/binux/pyspider.svg?branch=master\u0026style=flat\n[Coverage]:             https://coveralls.io/r/binux/pyspider\n[Try]:                  https://img.shields.io/badge/try-pyspider-blue.svg?style=flat\n[Issue]:                
https://github.com/binux/pyspider/issues\n[User Group]:           https://groups.google.com/group/pyspider-users\n","funding_links":[],"categories":["Python","Web Crawling \u0026 Web Scraping","All","资源列表","Web Crawling","Web 后端","python","spider","Data Processing","Uncategorized","网络服务","Core Libraries","HTML 处理","Application Recommendation","Web Crawling [🔝](#readme)","Awesome Python","DevOps Utilities","Data"],"sub_categories":["HTML 处理","Data Pre-processing \u0026 Loading","Uncategorized","网络爬虫","Python","🤖 Automation Tools","Web Crawling \u0026 Web Scraping","Aggregators"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinux%2Fpyspider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbinux%2Fpyspider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinux%2Fpyspider/lists"}