{"id":13531094,"url":"https://github.com/s0rg/crawley","last_synced_at":"2025-05-16T15:07:48.283Z","repository":{"id":38155737,"uuid":"421937051","full_name":"s0rg/crawley","owner":"s0rg","description":"The unix-way web crawler","archived":false,"fork":false,"pushed_at":"2025-04-26T13:54:24.000Z","size":211,"stargazers_count":293,"open_issues_count":2,"forks_count":16,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-26T14:37:05.178Z","etag":null,"topics":["cli","crawler","go","golang","golang-application","pentest","pentest-tool","pentesting","unix-way","web-crawler","web-scraping","web-spider"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/s0rg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-27T18:48:51.000Z","updated_at":"2025-04-26T13:52:02.000Z","dependencies_parsed_at":"2023-11-19T16:24:14.676Z","dependency_job_id":"0bd0295a-364f-4b13-ad13-efcd21326874","html_url":"https://github.com/s0rg/crawley","commit_stats":null,"previous_names":[],"tags_count":68,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s0rg%2Fcrawley","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s0rg%2Fcrawley/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s0rg%2Fcrawley/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s0rg%2Fcrawley/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/s0rg","download_url":"https://codeload.github.com/s0rg/crawley/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254553958,"owners_count":22090417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","crawler","go","golang","golang-application","pentest","pentest-tool","pentesting","unix-way","web-crawler","web-scraping","web-spider"],"created_at":"2024-08-01T07:00:59.941Z","updated_at":"2025-05-16T15:07:43.275Z","avatar_url":"https://github.com/s0rg.png","language":"Go","readme":"[![License](https://img.shields.io/badge/license-MIT%20License-blue.svg)](https://github.com/s0rg/crawley/blob/main/LICENSE)\n[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fs0rg%2Fcrawley.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fs0rg%2Fcrawley?ref=badge_shield)\n[![Go Version](https://img.shields.io/github/go-mod/go-version/s0rg/crawley)](go.mod)\n[![Release](https://img.shields.io/github/v/release/s0rg/crawley)](https://github.com/s0rg/crawley/releases/latest)\n[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)\n![Downloads](https://img.shields.io/github/downloads/s0rg/crawley/total.svg)\n\n[![CI](https://github.com/s0rg/crawley/workflows/ci/badge.svg)](https://github.com/s0rg/crawley/actions?query=workflow%3Aci)\n[![Go Report 
Card](https://goreportcard.com/badge/github.com/s0rg/crawley)](https://goreportcard.com/report/github.com/s0rg/crawley)\n[![Maintainability](https://api.codeclimate.com/v1/badges/6542cd90a6c665e4202e/maintainability)](https://codeclimate.com/github/s0rg/crawley/maintainability)\n[![Test Coverage](https://api.codeclimate.com/v1/badges/e1c002df2b4571e01537/test_coverage)](https://codeclimate.com/github/s0rg/crawley/test_coverage)\n[![libraries.io](https://img.shields.io/librariesio/github/s0rg/crawley)](https://libraries.io/github/s0rg/crawley)\n![Issues](https://img.shields.io/github/issues/s0rg/crawley)\n\n# crawley\n\nCrawls web pages and prints any link it can find.\n\n# features\n\n- fast html SAX-parser (powered by [x/net/html](https://golang.org/x/net/html))\n- js/css lexical parsers (powered by [tdewolff/parse](https://github.com/tdewolff/parse)) - extract api endpoints from js code and `url()` properties\n- small (below 1500 SLOC), idiomatic, 100% test covered codebase\n- grabs most useful resource urls (pics, videos, audios, forms, etc...)\n- found urls are streamed to stdout and guaranteed to be unique (with fragments omitted)\n- scan depth (limited by starting host and path, by default - 0) can be configured\n- can be polite - crawl rules and sitemaps from `robots.txt`\n- `brute` mode - scan html comments for urls (this can lead to bogus results)\n- makes use of `HTTP_PROXY` / `HTTPS_PROXY` environment values + handles proxy auth (use `HTTP_PROXY=\"socks5://127.0.0.1:1080/\" crawley` for socks5)\n- directory-only scan mode (aka `fast-scan`)\n- user-defined cookies, in curl-compatible format (i.e. 
`-cookie \"ONE=1; TWO=2\" -cookie \"ITS=ME\" -cookie @cookie-file`)\n- user-defined headers, same as curl: `-header \"ONE: 1\" -header \"TWO: 2\" -header @headers-file`\n- tag filter - allows specifying tags to crawl for (single: `-tag a -tag form`, multiple: `-tag a,form`, or mixed)\n- url ignore - allows ignoring urls with matched substrings from crawling (e.g.: `-ignore logout`)\n- subdomains support - allows depth crawling for subdomains as well (e.g. `crawley http://some-test.site` will be able to crawl `http://www.some-test.site`)\n\n# examples\n\n```sh\n# print all links from first page:\ncrawley http://some-test.site\n\n# print all js files and api endpoints:\ncrawley -depth -1 -tag script -js http://some-test.site\n\n# print all endpoints from js:\ncrawley -js http://some-test.site/app.js\n\n# download all png images from site:\ncrawley -depth -1 -tag img http://some-test.site | grep '\\.png$' | wget -i -\n\n# fast directory traversal:\ncrawley -headless -delay 0 -depth -1 -dirs only http://some-test.site\n```\n\n# installation\n\n- [binaries / deb / rpm](https://github.com/s0rg/crawley/releases) for Linux, FreeBSD, macOS and Windows.\n- [archlinux](https://aur.archlinux.org/packages/crawley-bin/) you can use your favourite AUR helper to install it, e. g. 
`paru -S crawley-bin`.\n\n# usage\n\n```\ncrawley [flags] url\n\npossible flags with default values:\n\n-all\n    scan all known sources (js/css/...)\n-brute\n    scan html comments\n-cookie value\n    extra cookies for request, can be used multiple times, accept files with '@'-prefix\n-css\n    scan css for urls\n-delay duration\n    per-request delay (0 - disable) (default 150ms)\n-depth int\n    scan depth (set -1 for unlimited)\n-dirs string\n    policy for non-resource urls: show / hide / only (default \"show\")\n-header value\n    extra headers for request, can be used multiple times, accept files with '@'-prefix\n-headless\n    disable pre-flight HEAD requests\n-ignore value\n    patterns (in urls) to be ignored in crawl process\n-js\n    scan js code for endpoints\n-proxy-auth string\n    credentials for proxy: user:password\n-robots string\n    policy for robots.txt: ignore / crawl / respect (default \"ignore\")\n-silent\n    suppress info and error messages in stderr\n-skip-ssl\n    skip ssl verification\n-subdomains\n    support subdomains (e.g. 
if www.domain.com found, recurse over it)\n-tag value\n    tags filter, single or comma-separated tag names\n-timeout duration\n    request timeout (min: 1 second, max: 10 minutes) (default 5s)\n-user-agent string\n    user-agent string\n-version\n    show version\n-workers int\n      number of workers (default - number of CPU cores)\n```\n\n# flags autocompletion\n\nCrawley can handle flags autocompletion in bash and zsh via `complete`:\n\n```bash\ncomplete -C \"/full-path-to/bin/crawley\" crawley\n```\n\n\n# license\n[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fs0rg%2Fcrawley.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2Fs0rg%2Fcrawley?ref=badge_large)\n","funding_links":[],"categories":["Utilities","软件包","Software Packages","Recon","\u003ca name=\"webdev\"\u003e\u003c/a\u003eWeb development","Go Tools","Other Software"],"sub_categories":["Calendars","其他软件","Other Software","Content Discovery"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs0rg%2Fcrawley","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fs0rg%2Fcrawley","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs0rg%2Fcrawley/lists"}