Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/IAmStoxe/urlgrab
A golang utility to spider through a website searching for additional links.
- Host: GitHub
- URL: https://github.com/IAmStoxe/urlgrab
- Owner: IAmStoxe
- Created: 2020-07-02T22:26:29.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-07T12:19:25.000Z (about 4 years ago)
- Last Synced: 2024-10-27T22:07:21.225Z (about 2 months ago)
- Topics: spider
- Language: Go
- Homepage:
- Size: 96.7 KB
- Stars: 329
- Watchers: 10
- Forks: 60
- Open Issues: 8
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-bugbounty-tools - urlgrab - A golang utility to spider through a website searching for additional links. (Recon / Links)
- awesome-rainmana - IAmStoxe/urlgrab - A golang utility to spider through a website searching for additional links. (Go)
- WebHackersWeapons - urlgrab
README
# Welcome to urlgrab 👋
> A golang utility to spider through a website searching for additional links with support for JavaScript rendering.
## Install
```sh
go get -u github.com/iamstoxe/urlgrab
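# Note: on Go 1.17+, 'go get' no longer installs binaries. Assuming the
# module path matches the repository, this should work instead:
go install github.com/iamstoxe/urlgrab@latest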
```

## Features
* Customizable Parallelism
* Ability to Render JavaScript (including Single Page Applications such as Angular and React)

## Usage
```bash
Usage of urlgrab:
-cache-dir string
Specify a directory to utilize caching. Works between sessions as well.
-debug
Extremely verbose debugging output. Useful mainly for development.
-delay int
Milliseconds to randomly apply as a delay between requests. (default 2000)
-depth int
The maximum limit on the recursion depth of visited URLs. (default 2)
-headless
If false, the browser will be displayed while crawling.
Note: Requires render-js flag
Note: To show the browser: --headless=false (default true)
-ignore-query
Strip the query portion of the URL before determining if we've visited it yet.
-ignore-ssl
Scrape pages with invalid SSL certificates
-js-timeout int
The amount of seconds before a request to render javascript should timeout. (default 10)
-json string
The filename where we should store the output JSON file.
-max-body int
The limit of the retrieved response body in kilobytes.
0 means unlimited.
Supply this value in kilobytes. (i.e. 10 * 1024kb = 10MB) (default 10240)
-no-head
Do not send HEAD requests prior to GET for pre-validation.
-output-all string
The directory where we should store the output files.
-proxy string
The SOCKS5 proxy to utilize (format: socks5://127.0.0.1:8080 OR http://127.0.0.1:8080).
Supply multiple proxies by separating them with a comma.
-random-agent
Utilize a random user agent string.
-render-js
Determines if we utilize a headless chrome instance to render javascript.
-root-domain string
The root domain we should match links against.
If not specified it will default to the host of --url.
Example: --root-domain google.com
-threads int
The number of threads to utilize. (default 5)
-timeout int
The amount of seconds before a request should timeout. (default 10)
-url string
The URL where we should start crawling.
-urls string
A file path that contains a list of urls to supply as starting urls.
Requires --root-domain flag.
-user-agent string
A user agent such as (Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0).
-verbose
Verbose output
```
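A few example invocations, using only the flags documented above (the target URL and file names are placeholders):

```sh
# Crawl a site two levels deep with 10 threads and write results to JSON
urlgrab --url https://example.com --depth 2 --threads 10 --json results.json

# Render JavaScript-heavy pages (e.g. Angular or React apps) via headless Chrome
urlgrab --url https://example.com --render-js

# Route requests through a local SOCKS5 proxy with a random user agent
urlgrab --url https://example.com --proxy socks5://127.0.0.1:1080 --random-agent
```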
## Build
You can easily build a binary specific to your platform into the `bin` directory with the following command:
```sh
make build
```

If you want to make binaries for Windows, Linux, and macOS to distribute the CLI, just run this command:
```sh
make cross
```
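If `make` is not available, the Go toolchain can cross-compile directly. A minimal sketch; the output names and use of a `dist/` directory are assumptions, not the Makefile's exact behavior:

```sh
mkdir -p dist

# GOOS/GOARCH select the target platform for each build
GOOS=linux   GOARCH=amd64 go build -o dist/urlgrab-linux-amd64 .
GOOS=darwin  GOARCH=amd64 go build -o dist/urlgrab-darwin-amd64 .
GOOS=windows GOARCH=amd64 go build -o dist/urlgrab-windows-amd64.exe .
```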
All the binaries will be available in the `dist` directory.

## Author
👤 **Devin Stokes**
* Twitter: [@DevinStokes](https://twitter.com/DevinStokes)
* GitHub: [@IAmStoxe](https://github.com/IAmStoxe)

## 🤝 Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the [issues page](https://github.com/IAmStoxe/urlgrab/issues).

## Show your support
Give a ⭐ if this project helped you!