{"id":28954650,"url":"https://github.com/print3m/pathfinder","last_synced_at":"2025-06-23T19:30:22.161Z","repository":{"id":300283793,"uuid":"865450425","full_name":"Print3M/pathfinder","owner":"Print3M","description":"The ultimate crawler designed for lightning-fast recursive URL scraping.","archived":false,"fork":false,"pushed_at":"2024-10-05T18:45:45.000Z","size":594,"stargazers_count":6,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-20T21:19:48.281Z","etag":null,"topics":["bugbounty-tool","crawler","crawlergo","go","golang","information-gathering","infosec","osint","osint-reconnaissance","path-extractor","pathfinder","pentesting","scraper","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Print3M.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-30T14:50:49.000Z","updated_at":"2025-01-08T13:48:34.000Z","dependencies_parsed_at":"2025-06-20T21:29:54.260Z","dependency_job_id":null,"html_url":"https://github.com/Print3M/pathfinder","commit_stats":null,"previous_names":["print3m/pathfinder"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/Print3M/pathfinder","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Print3M%2Fpathfinder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Print3M%2Fpathfinder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Print3M%2Fpathfinder/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Print3M%2Fpathfinder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Print3M","download_url":"https://codeload.github.com/Print3M/pathfinder/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Print3M%2Fpathfinder/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261542517,"owners_count":23174594,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bugbounty-tool","crawler","crawlergo","go","golang","information-gathering","infosec","osint","osint-reconnaissance","path-extractor","pathfinder","pentesting","scraper","webscraping"],"created_at":"2025-06-23T19:30:19.547Z","updated_at":"2025-06-23T19:30:22.139Z","avatar_url":"https://github.com/Print3M.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PathFinder 🕵🏻‍♂️🌐\n\nPathFinder – the ultimate web crawler script designed for lightning-fast, concurrent, and recursive URL scraping. Cutting-edge multithreading architecture ensures rapid URL extraction while guaranteeing that each page is visited only once. This tool extracts URLs from various HTML tags, including `a`, `form`, `iframe`, `img`, `embed`, and more. External URLs, relative paths and subdomains are supported as well.\n\n![Example PathFinder usage](_img/img-01.jpg)\n\n**Usage**: It is a great tool for discovering new web paths and subdomains, creating a site map, gathering OSINT information. It might be very useful for bug hunters and pentesters. 
🔥👾🔥\n\n## Installation\n\nThere are 2 options:\n\n1. Download the latest binary from [GitHub releases](https://github.com/Print3M/pathfinder/releases).\n2. Build manually:\n\n```bash\n# Download and build the source code\ngit clone https://github.com/Print3M/pathfinder\ncd pathfinder/\ngo build\n\n# Run\n./pathfinder --help\n```\n\n## How to use it?\n\nTL;DR:\n\n```bash\n# Run URL extraction\npathfinder -u http://example.com --threads 25\n\n# Show help\npathfinder -h\n```\n\n### Initial URL\n\n`-u \u003curl\u003e`, `--url \u003curl\u003e` [required]\n\nUse this parameter to set the start page for the script. By default, the script extracts all URLs that refer to the domain provided in this parameter and its subdomains. External URLs are scraped but not visited by the script.\n\n### Threads\n\n`-t \u003cnum\u003e`, `--threads \u003cnum\u003e` [default: 10]\n\nUse this parameter to set the number of threads that will extract data concurrently.\n\n### Rate limiting\n\n`-r \u003creqs_per_sec\u003e`, `--rate \u003creqs_per_sec\u003e` [default: none]\n\nUse this parameter to specify the maximum number of requests per second. The limit applies to the total across all threads, so the number of threads doesn't matter. By default, requests are sent as fast as possible! 🚄💨\n\n### Output file\n\n`-o \u003cfile\u003e`, `--output \u003cfile\u003e` [default: none]\n\nUse this parameter to specify the output file where URLs will be saved. The URLs are still printed to the screen unless you use quiet mode (`-q` or `--quiet`).\n\nOutput is written to the file on the fly, so the URLs collected so far are kept even if you stop the script.\n\n### Add HTTP header\n\n`-H \u003cheader\u003e`, `--header \u003cheader\u003e` [default: none]\n\nUse this parameter to specify custom HTTP headers. They are sent with every scraping request. Each `-H` parameter must contain exactly one HTTP header, but you can use the parameter multiple times.\n\nExample:\n\n`./pathfinder ... 
-H \"Authorization: test\" -H \"Cookie: cookie1=choco; cookie2=yummy\"`\n\n### Quiet mode\n\n`-q`, `--quiet` [default: false]\n\nUse this parameter to disable printing out scraped URLs on the screen.\n\n### Disable recursive scraping\n\n`--no-recursion` [default: false]\n\nUse this parameter to disable recursive scraping. Only the page provided with the `-u \u003curl\u003e` parameter is visited. It actually disables what's coolest about PathFinder.\n\n### Disable subdomains scraping\n\n`--no-subdomains` [default: false]\n\nUse this parameter to disable scraping of subdomains of the URL provided with the `-u \u003curl\u003e` parameter.\n\nExample (`-u http://test.example.com`):\n\n- `http://test.example.com/index.php` - ✅ scraped\n- `http://api.test.example.com/index.php` - ✅ scraped\n- `http://example.com/index.php` - ❌ not scraped\n\nExample (`-u http://test.example.com --no-subdomains`):\n\n- `http://test.example.com/index.php` - ✅ scraped\n- `http://api.test.example.com/index.php` - ❌ not scraped\n- `http://example.com/index.php` - ❌ not scraped\n\n### Disable externals scraping\n\n`--no-externals` [default: false]\n\nExternal URLs are never visited, but with this parameter they are also filtered out of the output.\n\n### Enable scraping of static assets\n\n`--with-assets` [default: false]\n\nUse this parameter to enable scraping URLs of static assets like CSS, JavaScript, images, fonts and so on. This is disabled by default because it usually generates too much noise.\n\n### User agent\n\nThe user agent is chosen at random for each request from a set of predefined strings.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprint3m%2Fpathfinder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprint3m%2Fpathfinder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprint3m%2Fpathfinder/lists"}