https://github.com/cyclone-github/spider
URL Spider - web crawler and wordlist / ngram generator
- Host: GitHub
- URL: https://github.com/cyclone-github/spider
- Owner: cyclone-github
- License: gpl-2.0
- Created: 2023-04-28T21:51:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-19T15:20:20.000Z (7 months ago)
- Last Synced: 2025-03-22T13:22:31.554Z (7 months ago)
- Topics: cewl, crawler, cyclone, generator, gramify, n-gram, ngram, ngram-generator, scaping, scraper, spider, url, url-crawler, url-spider, web, web-crawler, web-scraping, wordlist, wordlist-generator
- Language: Go
- Homepage:
- Size: 58.6 KB
- Stars: 14
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Cyclone's URL Spider
```
----------------------
| Cyclone's URL Spider |
----------------------

Crawling URL: https://forum.hashpwn.net
Base domain: forum.hashpwn.net
Crawl depth: 2
ngram len: 1-3
Crawl delay: 0ms (increase this to avoid rate limiting, ex: -delay 100)
URLs crawled: 51
Processing... [====================] 100.00%
Unique words: 1983
Unique ngrams: 11030
Writing... [====================] 100.00%
Output file: forum.hashpwn.net_wordlist.txt
RAM used: 0.03 GB
Runtime: 4.949s
```
Wordlist and ngram creation tool that crawls a given URL and generates wordlists and/or ngrams, depending on the flags given.
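To illustrate what "ngrams" means in the sample output above, here is a minimal Go sketch of word-level ngram generation with deduplication. It is not taken from the spider source: the function name `ngrams`, the whitespace-only tokenization, and the first-seen output order are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ngrams is a hypothetical helper (not part of spider). It returns every run
// of minN..maxN consecutive words from text, joined by single spaces,
// de-duplicated and kept in first-seen order.
func ngrams(text string, minN, maxN int) []string {
	if minN < 1 {
		minN = 1 // an ngram needs at least one word
	}
	words := strings.Fields(text) // assumption: tokens are split on whitespace only
	seen := make(map[string]struct{})
	var out []string
	for n := minN; n <= maxN; n++ {
		for i := 0; i+n <= len(words); i++ {
			g := strings.Join(words[i:i+n], " ")
			if _, dup := seen[g]; dup {
				continue // skip ngrams that were already emitted
			}
			seen[g] = struct{}{}
			out = append(out, g)
		}
	}
	return out
}

func main() {
	// Roughly analogous to "-ngram 1-2" applied to a single line of text.
	for _, g := range ngrams("the quick brown fox the quick", 1, 2) {
		fmt.Println(g)
	}
}
```

With a range of 1-2, the sketch emits the four unique words followed by the four unique two-word sequences. The repeated "the", "quick", and "the quick" appear only once each.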
### Usage Instructions:
- To create a simple wordlist from a specified URL (saves a deduplicated wordlist to url_wordlist.txt):
  - `./spider.bin -url https://github.com/cyclone-github`
- To set a crawl depth of 2 and create ngrams of length 1-5, use the flags "-crawl 2" and "-ngram 1-5":
  - `./spider.bin -url https://github.com/cyclone-github -crawl 2 -ngram 1-5`
- To set a custom output file, use the flag "-o filename":
  - `./spider.bin -url https://github.com/cyclone-github -o wordlist.txt`
- To set a delay to avoid being rate-limited, use the flag "-delay nth", where nth is the time in milliseconds:
  - `./spider.bin -url https://github.com/cyclone-github -delay 100`
- Run `./spider.bin -help` to see a list of all options
### Compile from source:
- If you want the latest features, compiling from source is the best option since the release version may run several revisions behind the source code.
- This assumes you have Go and Git installed:
  - `git clone https://github.com/cyclone-github/spider.git` # clone repo
  - `cd spider` # enter project directory
  - `go mod init spider` # initialize Go module (skip this step if go.mod already exists)
  - `go mod tidy` # download dependencies
  - `go build -ldflags="-s -w" .` # compile binary in current directory
  - `go install -ldflags="-s -w" .` # compile binary and install to $GOPATH/bin
- Compile from source code how-to:
  - https://github.com/cyclone-github/scripts/blob/main/intro_to_go.txt
### Change Log:
- https://github.com/cyclone-github/spider/blob/main/CHANGELOG.md
### Mentions:
- Go Package Documentation: https://pkg.go.dev/github.com/cyclone-github/spider
- Softpedia: https://www.softpedia.com/get/Internet/Other-Internet-Related/Cyclone-s-URL-Spider.shtml
### Antivirus False Positives:
- Several antivirus programs on VirusTotal incorrectly flag compiled Go binaries as malicious. This issue primarily affects the Windows executable, but is not limited to it. If this concerns you, I recommend carefully reviewing the source code and then compiling the binary yourself.
- Uploading your compiled binaries to https://virustotal.com and leaving an up-vote or a comment also helps.