{"id":27071076,"url":"https://github.com/hahwul/urx","last_synced_at":"2025-07-02T06:09:35.933Z","repository":{"id":284880144,"uuid":"956297926","full_name":"hahwul/urx","owner":"hahwul","description":"Extracts URLs from OSINT Archives for Security Insights","archived":false,"fork":false,"pushed_at":"2025-06-18T14:09:10.000Z","size":4893,"stargazers_count":141,"open_issues_count":3,"forks_count":13,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-26T14:09:20.300Z","etag":null,"topics":["osint","osint-tool","security","url","urx","wayback-machine"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/urx","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hahwul.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"hahwul"}},"created_at":"2025-03-28T02:48:27.000Z","updated_at":"2025-06-25T07:38:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"ae75bea0-692e-4a6f-9885-c16cfa6a544a","html_url":"https://github.com/hahwul/urx","commit_stats":null,"previous_names":["hahwul/urx"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/hahwul/urx","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hahwul%2Furx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hahwul%2Furx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hahwul%2Furx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/hahwul%2Furx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hahwul","download_url":"https://codeload.github.com/hahwul/urx/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hahwul%2Furx/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263083730,"owners_count":23411166,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["osint","osint-tool","security","url","urx","wayback-machine"],"created_at":"2025-04-05T23:00:50.068Z","updated_at":"2025-07-02T06:09:35.924Z","avatar_url":"https://github.com/hahwul.png","language":"Rust","funding_links":["https://github.com/sponsors/hahwul"],"categories":["Weapons","Rust"],"sub_categories":["Tools"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003cimg alt=\"URX Logo\" src=\"https://raw.githubusercontent.com/hahwul/urx/refs/heads/main/docs/images/logo.png\" width=\"300px;\"\u003e\n  \u003c/picture\u003e\n  \u003cp\u003eExtracts URLs from OSINT Archives for Security Insights.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/hahwul/urx/releases/latest\"\u003e\u003cimg src=\"https://img.shields.io/github/v/release/hahwul/urx?style=for-the-badge\u0026logoColor=%23000000\u0026label=urx\u0026labelColor=%23000000\u0026color=%23000000\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://app.codecov.io/gh/hahwul/urx\"\u003e\u003cimg 
src=\"https://img.shields.io/codecov/c/gh/hahwul/urx?style=for-the-badge\u0026logoColor=%23000000\u0026labelColor=%23000000\u0026color=%23000000\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/hahwul/urx/blob/main/CONTRIBUTING.md\"\u003e\u003cimg src=\"https://img.shields.io/badge/CONTRIBUTIONS-WELCOME-000000?style=for-the-badge\u0026labelColor=000000\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://rust-lang.org\"\u003e\u003cimg src=\"https://img.shields.io/badge/Rust-000000?style=for-the-badge\u0026logo=rust\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nUrx is a command-line tool designed for collecting URLs from OSINT archives, such as the Wayback Machine and Common Crawl. Built with Rust for efficiency, it leverages asynchronous processing to rapidly query multiple data sources. This tool simplifies the process of gathering URL information for a specified domain, providing a comprehensive dataset that can be used for various purposes, including security testing and analysis.\n\n## Features\n\n* Fetch URLs from multiple sources in parallel (Wayback Machine, Common Crawl, OTX)\n* Filter results by file extensions, patterns, or predefined presets (e.g., \"no-image\" to exclude images)\n* Support for multiple output formats: plain text, JSON, CSV\n* Output results to the console or a file, or stream via stdin for pipeline integration\n* URL Testing:\n  * Filter and validate URLs based on HTTP status codes and patterns.\n  * Extract additional links from collected URLs\n\n![Preview](https://raw.githubusercontent.com/hahwul/urx/refs/heads/main/docs/images/preview.jpg)\n\n## Installation\n\n### From Cargo\n\n```bash\n# https://crates.io/crates/urx\ncargo install urx\n```\n\n### From Homebrew\n\n```bash\n# https://formulae.brew.sh/formula/urx\nbrew install urx\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/hahwul/urx.git\ncd urx\ncargo build --release\n```\n\nThe compiled binary will be available at 
`target/release/urx`.\n\n### From Docker\n\n[ghcr.io/hahwul/urx](https://github.com/hahwul/urx/pkgs/container/urx)\n\n## Usage\n\n### Basic Usage\n\n```bash\n# Scan a single domain\nurx example.com\n\n# Scan multiple domains\nurx example.com example.org\n\n# Scan domains from a file\ncat domains.txt | urx\n```\n\n### Options\n\n```\nUsage: urx [OPTIONS] [DOMAINS]...\n\nArguments:\n  [DOMAINS]...  Domains to fetch URLs for\n\nOptions:\n  -c, --config \u003cCONFIG\u003e  Config file to load\n  -h, --help             Print help\n  -V, --version          Print version\n\nOutput Options:\n  -o, --output \u003cOUTPUT\u003e  Output file to write results\n  -f, --format \u003cFORMAT\u003e  Output format (e.g., \"plain\", \"json\", \"csv\") [default: plain]\n      --merge-endpoint   Merge endpoints with the same path and merge URL parameters\n\nProvider Options:\n  --providers \u003cPROVIDERS\u003e              Providers to use (comma-separated, e.g., \"wayback,cc,otx,vt,urlscan\") [default: wayback,cc,otx]\n  --subs                               Include subdomains when searching\n  --cc-index \u003cCC_INDEX\u003e                Common Crawl index to use (e.g., CC-MAIN-2025-13) [default: CC-MAIN-2025-13]\n  --vt-api-key \u003cVT_API_KEY\u003e            API key for VirusTotal (can also use URX_VT_API_KEY environment variable)\n  --urlscan-api-key \u003cURLSCAN_API_KEY\u003e  API key for Urlscan (can also use URX_URLSCAN_API_KEY environment variable)\n\nDiscovery Options:\n  --exclude-robots   Exclude robots.txt discovery\n  --exclude-sitemap  Exclude sitemap.xml discovery\n\nDisplay Options:\n  -v, --verbose      Show verbose output\n      --silent       Silent mode (no output)\n      --no-progress  No progress bar\n\nFilter Options:\n  -p, --preset \u003cPRESET\u003e\n          Filter Presets (e.g., \"no-resources,no-images,only-js,only-style\")\n  -e, --extensions \u003cEXTENSIONS\u003e\n          Filter URLs to only include those with specific extensions 
(comma-separated, e.g., \"js,php,aspx\")\n      --exclude-extensions \u003cEXCLUDE_EXTENSIONS\u003e\n          Filter URLs to exclude those with specific extensions (comma-separated, e.g., \"html,txt\")\n      --patterns \u003cPATTERNS\u003e\n          Filter URLs to only include those containing specific patterns (comma-separated)\n      --exclude-patterns \u003cEXCLUDE_PATTERNS\u003e\n          Filter URLs to exclude those containing specific patterns (comma-separated)\n      --show-only-host\n          Only show the host part of the URLs\n      --show-only-path\n          Only show the path part of the URLs\n      --show-only-param\n          Only show the parameters part of the URLs\n      --min-length \u003cMIN_LENGTH\u003e\n          Minimum URL length to include\n      --max-length \u003cMAX_LENGTH\u003e\n          Maximum URL length to include\n      --strict\n          Enforce exact host validation (default)\n\nNetwork Options:\n  --network-scope \u003cNETWORK_SCOPE\u003e  Control which components network settings apply to (all, providers, testers, or providers,testers) [default: all]\n  --proxy \u003cPROXY\u003e                  Use proxy for HTTP requests (format: http://proxy.example.com:8080)\n  --proxy-auth \u003cPROXY_AUTH\u003e        Proxy authentication credentials (format: username:password)\n  --insecure                       Skip SSL certificate verification (accept self-signed certs)\n  --random-agent                   Use a random User-Agent for HTTP requests\n  --timeout \u003cTIMEOUT\u003e              Request timeout in seconds [default: 30]\n  --retries \u003cRETRIES\u003e              Number of retries for failed requests [default: 3]\n  --parallel \u003cPARALLEL\u003e            Maximum number of parallel requests per provider and maximum concurrent domain processing [default: 5]\n  --rate-limit \u003cRATE_LIMIT\u003e        Rate limit (requests per second)\n\nTesting Options:\n  --check-status                     Check HTTP status code 
of collected URLs [aliases: --cs]\n  --include-status \u003cINCLUDE_STATUS\u003e  Include URLs with specific HTTP status codes or patterns (e.g., --is=200,30x) [aliases: --is]\n  --exclude-status \u003cEXCLUDE_STATUS\u003e  Exclude URLs with specific HTTP status codes or patterns (e.g., --es=404,50x,5xx) [aliases: --es]\n  --extract-links                    Extract additional links from collected URLs (requires HTTP requests)\n```\n\n### Examples\n\n```bash\n# Save results to a file\nurx example.com -o results.txt\n\n# Output in JSON format\nurx example.com -f json -o results.json\n\n# Filter for JavaScript files only\nurx example.com -e js\n\n# Exclude HTML and text files\nurx example.com --exclude-extensions html,txt\n\n# Filter for API endpoints\nurx example.com --patterns api,v1,graphql\n\n# Exclude specific patterns\nurx example.com --exclude-patterns static,images\n\n# Use a filter preset (similar to --exclude-extensions=png,jpg,...)\nurx example.com -p no-images\n\n# Use specific providers\nurx example.com --providers wayback,otx\n\n# Using VirusTotal and URLScan providers\n# 1. Explicitly add to providers (with API keys via command line)\nurx example.com --providers=vt,urlscan --vt-api-key=*** --urlscan-api-key=***\n\n# 2. Using environment variables for API keys\nURX_VT_API_KEY=*** URX_URLSCAN_API_KEY=*** urx example.com --providers=vt,urlscan\n\n# 3. 
Auto-enabling: providers are automatically added when API keys are provided\nurx example.com --vt-api-key=*** --urlscan-api-key=*** # No need to specify in --providers\n\n# URLs from robots.txt and sitemap.xml are included by default\n\n# Exclude URLs from robots.txt files\nurx example.com --exclude-robots\n\n# Exclude URLs from sitemap\nurx example.com --exclude-sitemap\n\n# Include subdomains\nurx example.com --subs\n\n# Check status of collected URLs\nurx example.com --check-status\n\n# Extract additional links from collected URLs\nurx example.com --extract-links\n\n# Network configuration\nurx example.com --proxy http://localhost:8080 --timeout 60 --parallel 10 --insecure\n\n# Advanced filtering\nurx example.com -e js,php --patterns admin,login --exclude-patterns logout,static --min-length 20\n\n# HTTP Status code based filtering\nurx example.com --include-status 200,30x,405 --exclude-status 20x\n\n# Disable host validation\nurx example.com --strict false\n```\n\n## Integration with Other Tools\n\nUrx works well in pipelines with other security and reconnaissance tools:\n\n```bash\n# Discover URLs for a domain, then filter for login pages\necho \"example.com\" | urx | grep \"login\" \u003e potential_targets.txt\n\n# Combine with other tools\ncat domains.txt | urx --patterns api | other-tool\n```\n\n## Inspiration\n\nUrx was inspired by [gau (GetAllUrls)](https://github.com/lc/gau), a tool that fetches known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl. 
While sharing similar core functionality, Urx was built from the ground up in Rust with a focus on performance, concurrency, and expanded filtering capabilities.\n\n## Contribute\n\nUrx is an open-source project made with ❤️\nIf you want to contribute to this project, please see [CONTRIBUTING.md](./CONTRIBUTING.md) and open a pull request with your changes.\n\n[![](https://raw.githubusercontent.com/hahwul/urx/refs/heads/main/CONTRIBUTORS.svg)](https://github.com/hahwul/urx/graphs/contributors)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhahwul%2Furx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhahwul%2Furx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhahwul%2Furx/lists"}