https://github.com/root4loot/recrawl
A Web URL crawler written in Go
bugbounty crawler discovery enumeration go golang recon reconnaissance web
- Host: GitHub
- URL: https://github.com/root4loot/recrawl
- Owner: root4loot
- License: mit
- Created: 2023-04-05T18:07:00.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-04-11T12:57:43.000Z (9 months ago)
- Last Synced: 2024-04-11T14:25:03.473Z (9 months ago)
- Topics: bugbounty, crawler, discovery, enumeration, go, golang, recon, reconnaissance, web
- Language: Go
- Homepage:
- Size: 4.69 MB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
recrawl: A Web URL crawler written in Go
Warning: This project is under active development. Bugs are to be expected.

## Installation
### Go
```
go install github.com/root4loot/recrawl@master
```

### Docker
```
git clone https://github.com/root4loot/recrawl.git && cd recrawl
docker build -t recrawl .
docker run -it recrawl -h
```

## Usage
```sh
Usage: ./recrawl [options] (-t | -i )

TARGETING:
   -i,  --infile             file containing targets (one per line)
   -t,  --target             target domain/url (comma-separated)
   -ih, --include-host       also crawls this host (if found) (comma-separated)
   -eh, --exclude-host       do not crawl this host (if found) (comma-separated)

CONFIGURATIONS:
   -c,  --concurrency        number of concurrent requests (Default: 20)
   -to, --timeout            max request timeout (Default: 10 seconds)
   -d,  --delay              delay between requests (Default: 0 milliseconds)
   -dj, --delay-jitter       max jitter between requests (Default: 0 milliseconds)
   -ua, --user-agent         set user agent (Default: Mozilla/5.0)
   -fr, --follow-redirects   follow redirects (Default: true)
   -p,  --proxy              set proxy (Default: none)
   -r,  --resolvers          file containing list of resolvers (Default: System DNS)
   -H,  --header             set custom header (Default: none)

OUTPUT:
   -fs, --filter-status      filter by status code (comma-separated)
   -fe, --filter-ext         filter by extension (comma-separated)
   -v,  --verbose            verbose output (use -vv for added verbosity)
   -o,  --outfile            output results to given file
   -hs, --hide-status        hide status code from output
   -hw, --hide-warning       hide warnings from output
   -hm, --hide-media         hide media from output (images, fonts, etc.)
   -s,  --silence            silence results from output
   -h,  --help               display help
        --version            display version
```

## Example
```sh
# Crawl *.example.com
➜ recrawl -t example.com

# Crawl *.example.com and IP address
➜ recrawl -t example.com,103.196.38.38

# Crawl all hosts in given file
➜ recrawl -i targets.txt

# Crawl *.example.com and also include *.example2.com if found
➜ recrawl -t example.com -ih example2.com

# Crawl all domains in target that contain the word example
➜ recrawl -t example.com -ih example

# Crawl *.example.com but avoid foo.example.com
➜ recrawl -t example.com -eh foo.example.com
```

### Example running
Running recrawl against hackerone.com to filter JavaScript files:
```sh
➜ recrawl -t hackerone.com --filter-ext js
```

Other ways to set the target:

Pipe the target URL

```sh
➜ echo hackerone.com | recrawl
```

Pipe a file containing targets

```sh
➜ cat targets.txt | recrawl
```

Use the -i option to provide a file with targets

```sh
➜ recrawl -i targets.txt
```

Crawling hackerone.com with `--filter-ext js` produces output like the following:
```sh
[recrawl] (INF) Included extensions: js
[recrawl] (INF) Concurrency: 20
[recrawl] (INF) Timeout: 10 seconds
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_EOrKavGmjAkpIaCW_cpGJ240OpVZev_5NI-WGIx5URg.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_5JbqBIuSpSQJk1bRx1jnlE-pARPyPPF5H07tKLzNC80.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_a7_tjanmGpd_aITZ38ofV8QT2o2axkGnWqPwKna1Wf0.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_xF9mKu6OVNysPMy7w3zYTWNPFBDlury_lEKDCfRuuHs.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_coYiv6lRieZN3l0IkRYgmvrMASvFk2BL-jdq5yjFbGs.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_Z1eePR_Hbt8TCXBt3JlFoTBdW2k9-IFI3f96O21Dwdw.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_LEbRIvnUToqIQrjG9YpPgaIHK6o77rKVGouOaWLGI5k.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_ol7H2KkxPxe7E03XeuZQO5qMcg0RpfSOgrm_Kg94rOs.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_p5BLPpvjnAGGBCPUsc4EmBUw9IUJ0jMj-QY_1ZpOKG4.js
[recrawl] (RES) 200 https://www.hackerone.com/sites/default/files/js/js_V5P0-9GKw8QQe-7oWrMD44IbDva6o8GE-cZS7inJr-g.js
...
```

Results are written to stdout, so they can be piped to other commands:
```sh
➜ recrawl -t hackerone.com --hide-status --filter-ext js | cat
```

Or saved to a specified file:
```sh
➜ recrawl -t hackerone.com --hide-status --filter-ext js -o results.txt
```

## As lib
```sh
go get -u github.com/root4loot/recrawl
```

```go
package main

import (
	"fmt"

	"github.com/root4loot/recrawl/pkg/options"
	"github.com/root4loot/recrawl/pkg/runner"
)

func main() {
	// opts and r are named so they do not shadow the imported packages
	opts := options.Options{
		Include:     []string{"example.com"},
		Exclude:     []string{"support.hackerone.com"},
		Concurrency: 2,
		Timeout:     10,
		Delay:       0,
		DelayJitter: 0,
		Resolvers:   []string{"8.8.8.8", "208.67.222.222"},
		UserAgent:   "recrawl",
	}

	r := runner.NewRunnerWithOptions(&opts)

	// create a separate goroutine to process the results as they come in
	go func() {
		for result := range r.Results {
			fmt.Println(result.StatusCode, result.RequestURL, result.Error)
		}
	}()

	// single target
	r.Run("google.com")

	// multiple targets
	targets := []string{"hackerone.com", "bugcrowd.com"}
	r.Run(targets...)
}
```
## Todo
- Clean up worker
- Headless browsing
- Output and filter by MIME
- Option to perform dirbusting / custom wordlist
- Respect robots.txt option

---
## Contributing
Contributions are very welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).