{"id":16862309,"url":"https://github.com/dave/scrapy","last_synced_at":"2025-10-15T17:26:50.109Z","repository":{"id":57554258,"uuid":"146459806","full_name":"dave/scrapy","owner":"dave","description":"Web scraper test project","archived":false,"fork":false,"pushed_at":"2018-08-30T16:01:48.000Z","size":92,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-18T15:56:47.270Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dave.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-08-28T14:27:50.000Z","updated_at":"2018-09-02T12:03:44.000Z","dependencies_parsed_at":"2022-09-26T18:51:16.994Z","dependency_job_id":null,"html_url":"https://github.com/dave/scrapy","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/dave/scrapy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave%2Fscrapy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave%2Fscrapy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave%2Fscrapy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave%2Fscrapy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dave","download_url":"https://codeload.github.com/dave/scrapy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dave%2Fscrapy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279095912,"owners_count":26102442,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-15T02:00:07.814Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T14:35:09.622Z","updated_at":"2025-10-15T17:26:50.068Z","avatar_url":"https://github.com/dave.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/dave/scrapy.svg?branch=master)](https://travis-ci.org/dave/scrapy) \n[![Go Report Card](https://goreportcard.com/badge/github.com/dave/scrapy)](https://goreportcard.com/report/github.com/dave/scrapy) \n[![codecov](https://codecov.io/gh/dave/scrapy/branch/master/graph/badge.svg)](https://codecov.io/gh/dave/scrapy)\n\n# A simple web scraper\n\n### Install\n\n```\ngo get -u github.com/dave/scrapy\n```\n\n### Usage\n\n```\nscrapy [url]\n```\n\nThe `scrapy` command will get get the page at `url`, parse it for links and get all pages that are \non the same domain.\n\nSome stats will be outputted during the processing, and a list of URLs will be printed when it's \nfinished. You can end the job early with Ctrl+C.\n\n### Flags\n\nSeveral command line flags are available:\n\n```\n  -length int\n    \tLength of the queue (default 1000)\n  -timeout int\n    \tRequest timeout in ms (default 10000)\n  -url string\n    \tThe start page (default \"https://monzo.com\")\n  -workers int\n    \tNumber of concurrent workers (default 5)\n```\n\n### Library\n\nThis scraper can also be used as a library. See the [scraper](https://godoc.org/github.com/dave/scrapy/scraper) package.\n\n### Notes\n\nSee [here](https://github.com/dave/scrapy/blob/master/NOTES.md) for design notes and brainstorming.\n\n### Example output\n\n```\nSummary\n-------\nQueued        46\nIn progress   5   https://monzo.com/blog/2018/08/30/manage-your-bills\nSuccess       22\nErrors        0   \n\nLatency\n-------\n   0 - 100  ***\n 100 - 200 \n 200 - 300 \n 300 - 400  **************************\n 400 - 500  ******************************\n 500 - 600  ***************\n 600 - 700  ***\n 700 - 800  ***\n 800 - 900 \n 900 - 1000\n1000 - 1100\n1100 - 1200\n1200 - 1300\n1300 - 1400\n1400 - 1500\n1500 - 1600\n1600 - 1700\n1700 - 1800\n1800 - 1900\n1900 - 2000\n2000+ \n\nURLs\n----\nhttps://monzo.com\nhttps://monzo.com/-play-store-redirect\nhttps://monzo.com/about\nhttps://monzo.com/blog\nhttps://monzo.com/blog/2018/07/02/publishing-our-2018-annual-report\nhttps://monzo.com/blog/2018/07/10/making-quarterly-goals-public\nhttps://monzo.com/blog/2018/07/25/monzo-reliability-report\nhttps://monzo.com/blog/how-money-works\nhttps://monzo.com/blog/latest\n\n...\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdave%2Fscrapy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdave%2Fscrapy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdave%2Fscrapy/lists"}