https://github.com/print3m/pathfinder
The ultimate crawler designed for lightning-fast recursive URL scraping.
- Host: GitHub
- URL: https://github.com/print3m/pathfinder
- Owner: Print3M
- Created: 2024-09-30T14:50:49.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-05T18:45:45.000Z (12 months ago)
- Last Synced: 2025-06-20T21:19:48.281Z (4 months ago)
- Topics: bugbounty-tool, crawler, crawlergo, go, golang, information-gathering, infosec, osint, osint-reconnaissance, path-extractor, pathfinder, pentesting, scraper, webscraping
- Language: Go
- Homepage:
- Size: 580 KB
- Stars: 6
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# PathFinder
PathFinder is the ultimate web crawler script designed for lightning-fast, concurrent, and recursive URL scraping. Its multithreaded architecture ensures rapid URL extraction while guaranteeing that each page is visited only once. The tool extracts URLs from various HTML tags, including `a`, `form`, `iframe`, `img`, `embed`, and more. External URLs, relative paths, and subdomains are supported as well.

**Usage**: It is a great tool for discovering new web paths and subdomains, creating a site map, and gathering OSINT information. It can be very useful for bug hunters and pentesters.
## Installation
There are 2 options:
1. Download the latest binary from [GitHub releases](https://github.com/Print3M/pathfinder/releases).
2. Build manually:

```bash
# Download and build the source code
git clone https://github.com/Print3M/pathfinder
cd pathfinder/
go build

# Run
./pathfinder --help
```

## How to use it?
TL;DR:

```bash
# Run URL extraction
pathfinder -u http://example.com --threads 25

# Show help
pathfinder -h
```

### Initial URL
`-u <url>`, `--url <url>`
Use this parameter to set the initial URL from which scraping starts.
### Number of threads
`--threads <number>` [default: 10]
Use this parameter to set the number of threads that will extract data concurrently.
### Rate limiting
`-r <number>`, `--rate <number>` [default: none]
Use this parameter to specify the maximum number of requests per second. This is the total number of requests per second regardless of the number of threads. By default, requests are sent as fast as possible!
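For example, to cap the whole crawl at 50 requests per second (the target URL and limit value here are just illustrative):

```bash
# Limit the total request rate to 50 requests/second across all threads
pathfinder -u http://example.com --threads 25 --rate 50
```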
### Output file
`-o <file>`, `--output <file>` [default: none]
Use this parameter to specify the output file where URLs will be saved. They will still be printed to the screen unless you use quiet mode (`-q` or `--quiet`).
Output is written to the file on the fly, so even if you stop the script early, the URLs scraped so far are preserved.
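For example, to save results while still watching them scroll by (the file name is just an example):

```bash
# Print scraped URLs to the screen and write them to urls.txt as they are found
pathfinder -u http://example.com -o urls.txt
```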
### Add HTTP header
`-H <header>`, `--header <header>` [default: none]
Use this parameter to specify custom HTTP headers. They are sent with every scraping request. One `-H` parameter must contain exactly one HTTP header, but you can use it multiple times.
Example:
`./pathfinder ... -H "Authorization: test" -H "Cookie: cookie1=choco; cookie2=yummy"`
### Quiet mode
`-q`, `--quiet` [default: false]
Use this parameter to disable printing scraped URLs to the screen.
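A quiet run usually makes sense only together with an output file (file name illustrative):

```bash
# Write results to urls.txt without echoing every URL to the terminal
pathfinder -u http://example.com -o urls.txt --quiet
```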
### Disable recursive scraping
`--no-recursion` [default: false]
Use this parameter to disable recursive scraping. Only the single page provided with the `-u` parameter will be visited; no links found there will be followed. It actually disables what's coolest about PathFinder.
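This can be handy for a quick, one-page inventory of links (URL illustrative):

```bash
# Extract URLs from a single page only, without following any of them
pathfinder -u http://example.com/index.php --no-recursion
```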
### Disable subdomains scraping
`--no-subdomains` [default: false]
Use this parameter to disable scraping of subdomains of the URL provided with the `-u` parameter.
Example (`-u http://test.example.com`):
- `http://test.example.com/index.php` - ✅ scraped
- `http://api.test.example.com/index.php` - ✅ scraped
- `http://example.com/index.php` - ❌ not scraped

Example (`-u http://test.example.com --no-subdomains`):
- `http://test.example.com/index.php` - ✅ scraped
- `http://api.test.example.com/index.php` - ❌ not scraped
- `http://example.com/index.php` - ❌ not scraped
### Disable externals scraping
`--no-externals` [default: false]
External URLs are never visited anyway; this parameter additionally filters them out of the output.
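For example, to report only URLs belonging to the target site (URL illustrative):

```bash
# Hide links pointing to external domains in the output
pathfinder -u http://example.com --no-externals
```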
### Enable scraping of static assets
`--with-assets` [default: false]
Use this parameter to enable scraping URLs of static assets like CSS, JavaScript, images, fonts and so on. This is disabled by default because it usually generates too much noise.
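For example, to also collect links to scripts, stylesheets, and images (URL illustrative):

```bash
# Include static assets (CSS, JavaScript, images, fonts, ...) in the results
pathfinder -u http://example.com --with-assets
```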
### User agent
The User-Agent header is randomly selected for each request from a set of predefined strings.