{"id":17913826,"url":"https://github.com/windfarer/biu","last_synced_at":"2025-03-23T23:30:42.605Z","repository":{"id":43367301,"uuid":"129830861","full_name":"Windfarer/biu","owner":"Windfarer","description":"biubiubiu~~ I'm a tiny web crawler framework","archived":false,"fork":false,"pushed_at":"2024-08-31T06:53:34.000Z","size":69,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-19T00:17:54.548Z","etag":null,"topics":["crawler","python","spider","spider-framework","web-crawler"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Windfarer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-17T01:58:09.000Z","updated_at":"2024-08-31T06:52:54.000Z","dependencies_parsed_at":"2023-01-24T16:45:16.302Z","dependency_job_id":null,"html_url":"https://github.com/Windfarer/biu","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Windfarer%2Fbiu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Windfarer%2Fbiu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Windfarer%2Fbiu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Windfarer%2Fbiu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Windfarer","download_url":"https://codeload.github.com/Windfarer/biu/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","reposit
ories_count":245186424,"owners_count":20574550,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","python","spider","spider-framework","web-crawler"],"created_at":"2024-10-28T19:53:38.541Z","updated_at":"2025-03-23T23:30:42.336Z","avatar_url":"https://github.com/Windfarer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Biu\nA tiny web crawler framework\n\n## Features\n* Requires Python 3.10 or later\n* Concurrency is built on Gevent, so you must `import biu` at the very top of your script, or apply the monkey patch yourself\n* Requests are sent with Requests; request and response parameters are largely compatible with Requests\n* Page parsing is built on Parsel, so selectors work the same way as in Scrapy\n* Essentially a stripped-down Scrapy, with very similar usage\n* For more advanced features, read the source code and explore\n\n## Installation\n```\npip install biu\n```\n\n## Example\n```python\nimport biu  # Must be the first line: importing biu applies the gevent monkey patch.\n\n\nclass MySpider(biu.Project):\n    def start_requests(self):\n        for i in range(0, 301, 30):\n            # Return or yield a biu.Request to fetch a page; its parameters are largely compatible with requests\n            yield biu.Request(url=\"https://www.douban.com/group/explore/tech?start={}\".format(i),\n                              method=\"GET\",\n                              headers={\"User-Agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36\"},\n                              callback=self.parse)\n\n    def parse(self, resp):\n        # biu.Response is much like the requests one, with a few selectors added\n        for item in resp.xpath('//*[@id=\"content\"]/div/div[1]/div[1]/div'):\n            yield {\n                \"title\": 
item.xpath(\"div[2]/h3/a/text()\").extract_first(),\n                \"url\": item.xpath(\"div[2]/h3/a/@href\").extract_first(),\n                \"abstract\": item.css(\"p::text\").extract_first()\n            }\n            # Return or yield a dict and it is passed to result_handler for processing\n\n\n    def result_handler(self, rv):\n        print(\"get result:\", rv)\n        # Save your result here\n\nbiu.run(MySpider(concurrent=3, interval=0.2, max_retry=5))\n\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwindfarer%2Fbiu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwindfarer%2Fbiu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwindfarer%2Fbiu/lists"}