{"id":21962725,"url":"https://github.com/icio/gergle","last_synced_at":"2025-03-22T20:44:53.729Z","repository":{"id":143094765,"uuid":"51244244","full_name":"icio/gergle","owner":"icio","description":"Golang website crawler","archived":false,"fork":false,"pushed_at":"2016-03-05T13:36:48.000Z","size":50,"stargazers_count":1,"open_issues_count":2,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-28T00:43:24.653Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/icio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-07T10:51:22.000Z","updated_at":"2016-02-24T08:05:05.000Z","dependencies_parsed_at":"2023-04-27T21:47:16.975Z","dependency_job_id":null,"html_url":"https://github.com/icio/gergle","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/icio%2Fgergle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/icio%2Fgergle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/icio%2Fgergle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/icio%2Fgergle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/icio","download_url":"https://codeload.github.com/icio/gergle/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245020314,"owners_
count":20548156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-29T10:54:17.585Z","updated_at":"2025-03-22T20:44:53.559Z","avatar_url":"https://github.com/icio.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# :dizzy: gergle\n\n`gergle` is a silly little website-scraping tool, written in Go. By no coincidence, very similar to [`crul`](http://github.com/icio/crul). It will attempt to abide by robots.txt unless you tell it otherwise, spawning a new goroutine for every request being made.\n\n\n## Installation\n\n```\ngo get github.com/icio/gergle/cmd/gergle\n```\n\n\n## Usage\n\n```\n$ gergle -h\nWebsite crawler.\n\nUsage:\n  gergle URL [flags]\n\nFlags:\n  -c, --connections int   Maximum number of open connections to the server. (default 5)\n  -t, --delay float       The number of seconds between requests to the server. (default -1)\n  -d, --depth value       Maximum crawl depth. (default 100)\n  -i, --disallow value    Disallowed paths. 
(default [])\n      --long              List all of the links and assets from a page.\n  -q, --quiet             No logging to stderr.\n  -v, --verbose           Verbose output logging.\n      --zero              The number of bothers to give about robots.txt.\n```\n\n\n## Examples\n\n``` bash\n# Crawl paul-scott.com with one second between each page request,\n# listing all links and assets.\n$ gergle http://www.paul-scott.com/ -t 1 --long\n\n# Crawl kirupa.com, excluding /forum*, up to three levels deep (first page is\n# depth 0), ignoring robots.txt and using up to 30 simultaneous connections.\n# 640 pages in 9 seconds on my local machine.\n$ gergle -q https://www.kirupa.com/ --zero -c 30 -d 3 -iforum\n```\n\n\n## Todo\n\n- [ ] Actual tests -- something beyond [manual testing](https://github.com/icio/crawler-target) :disappointed:\n- [ ] First-class tracking of redirects and canonical URLs\n- [ ] Vendoring of dependencies\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ficio%2Fgergle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ficio%2Fgergle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ficio%2Fgergle/lists"}