{"id":18551919,"url":"https://github.com/bepb/web_crawler","last_synced_at":"2026-01-24T18:02:00.696Z","repository":{"id":41431404,"uuid":"466060687","full_name":"BEPb/web_crawler","owner":"BEPb","description":"fast and symple web crawler","archived":false,"fork":false,"pushed_at":"2022-03-06T19:58:33.000Z","size":319,"stargazers_count":53,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-28T05:31:50.675Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BEPb.png","metadata":{"files":{"readme":"README.chinese.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-04T09:26:49.000Z","updated_at":"2024-12-18T07:03:52.000Z","dependencies_parsed_at":"2022-09-21T08:53:54.029Z","dependency_job_id":null,"html_url":"https://github.com/BEPb/web_crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BEPb/web_crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BEPb%2Fweb_crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BEPb%2Fweb_crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BEPb%2Fweb_crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BEPb%2Fweb_crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BEPb","download_url":"https://codeload.github.com/BEPb/web_crawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BEPb%2Fweb_crawler/sbom","scorecar
d":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28733301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T17:51:25.893Z","status":"ssl_error","status_checked_at":"2026-01-24T17:50:48.377Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T21:11:05.082Z","updated_at":"2026-01-24T18:02:00.681Z","avatar_url":"https://github.com/BEPb.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Profile views](https://gpvc.arturio.dev/BEPb) \n![GitHub top language](https://img.shields.io/github/languages/top/BEPb/web_crawler) \n![GitHub language count](https://img.shields.io/github/languages/count/BEPb/web_crawler)\n![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/BEPb/web_crawler)\n![GitHub repo size](https://img.shields.io/github/repo-size/BEPb/web_crawler) \n![GitHub](https://img.shields.io/github/license/BEPb/web_crawler) \n![GitHub last commit](https://img.shields.io/github/last-commit/BEPb/web_crawler)\n![GitHub User's stars](https://img.shields.io/github/stars/BEPb?style=social)\n\u003cp align=\"left\"\u003e\n\u003cimg src=\"https://visitor-badge.laobi.icu/badge?page_id=BEPb.github-contributions\" alt=\"visitors\"/\u003e\n\u003c/p\u003e\n\n\n![](./example/i_l_p.png)\n\n\nRead this in other languages: [Russian](README.ru.md), 
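[हिन्दी](README.hindi.md), [English](README.md)\n\n\n\u003cdiv align=\"center\"\u003e\n\n\n\u003cimg src=\"img/web_crawler_header.jpg\" alt=\"Bot logo\" width=\"800\" height=\"156.5\"\u003e\n\n# Fast and simple web crawler\n\u003c/div\u003e\n\n## How does it work?\nIt's simple: your bot mass-follows accounts, and in response people follow you.\n# Steps to set up and use the bot\n1. Clone the repository, download the archive from GitHub, or run the following commands on the command line\n   ```commandline\n   $ cmd\n   $ git clone https://github.com/BEPb/github_bot\n   $ cd github_bot\n   ```\n2. Create a Python virtual environment.\n3. Install all the packages our code needs with the following command:\n\n```commandline\n        pip install -r requirements.txt\n```\n\n\n4. Create a project named nameproject\n```commandline\nscrapy startproject nameproject\n```\n\n5. You will then have a folder with the project's name, containing the minimum set of required files and dependencies\n```commandline\n\n    scrapy.cfg    # deploy configuration file\n    nameproject/  # the project's Python module; you will import your code from here\n        __init__.py\n        items.py        # project items definition file\n        middlewares.py  # project middlewares file\n        pipelines.py    # project pipelines file\n        settings.py     # project settings file\n        spiders/        # directory where you will later put your spiders\n            __init__.py\n```\n6. Go into the project folder\n```commandline\ncd nameproject\n```\n\n7. Create a quotes_spider.py file in the spiders/ folder and define in it which pages we scrape and how\n8. Start the crawler\n```commandline\nscrapy crawl quotes\n```\n9. As a result of the run, two new files are created, quotes-1.html and quotes-2.html, containing the content of\n  the corresponding URLs, as our parse method specifies.\n10. Try out selectors in the Scrapy shell\n```commandline\nscrapy shell 'https://quotes.toscrape.com/page/1/'\n```\n11. Use CSS to select all 'title' elements. The result of running response.css('title') is a list-like\n  object called SelectorList, which is a list of Selector objects that wrap\n  XML/HTML elements and let you run further queries to refine the selection or extract data.\n```commandline\nresponse.css('title')\n```\n12. To extract the text of every match as a list, call the getall() method\n```commandline\nresponse.css('title::text').getall()\n```\n13. The same can be done with XPath\n```commandline\nresponse.xpath('//title/text()').get()\n```\n14. Now select the div tags with the class quote\n```commandline\nresponse.css(\"div.quote\")\n```\n\n15. Take only the first element of the list\n```commandline\nquote = response.css(\"div.quote\")[0]\n```\n\n16. To get the quote text and the author from that element, use the following commands:\n```commandline\nquote.css(\"span.text::text\").get()\nquote.css(\"small.author::text\").get()\n```\n17. 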
This is how to list the text of every tag link inside the div.quote elements\n```commandline\nresponse.css(\"div.quote\").css(\"div.tags a.tag::text\").getall()\n```\n18. This is how to save the results in JSON format; the `-O` command-line switch overwrites any existing\n  file.\n```commandline\nscrapy crawl quotes -O quotes.json\n```\n19. This is how to save the results in CSV format\n```commandline\nscrapy crawl quotes -O quotes.csv\n```\n20. The following command writes the results line by line in the .jl (JSON Lines) format; the `-o` switch appends to an existing file\n```commandline\nscrapy crawl quotes -o quotes.jl\n```\n\n\u003cimg src=\"img/spyder.jpg\" alt=\"Bot logo\" width=\"800\" height=\"356.5\"\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbepb%2Fweb_crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbepb%2Fweb_crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbepb%2Fweb_crawler/lists"}