https://github.com/omarch7/OCrawler
A Go Crawler to retrieve site maps from domains.
- Host: GitHub
- URL: https://github.com/omarch7/OCrawler
- Owner: omarch7
- Created: 2017-09-25T07:23:50.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-10-14T05:11:19.000Z (over 7 years ago)
- Last Synced: 2024-07-09T01:18:04.777Z (8 months ago)
- Language: Go
- Size: 382 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# OCrawler
---
A small crawler written in Go that retrieves the site map of any domain.
Why O? Just because my name starts with O, nothing special 😊.
### Prerequisites
* Go 1.8 or later
* github.com/beego/bee/logger/colors: adds colour to the output of the program
* golang.org/x/net/html: parses HTML documents
### Installation
To download and install the libraries, run the following on the command line:
```
$ go get github.com/beego/bee/logger/colors
$ go get golang.org/x/net/html
```
Don't forget to set up your GOPATH environment variable!
### Build
To build the source code, go to the working directory and run:
```
$ go build
```
This will generate the executable file.
### Run
To run the crawler, execute the file generated by the build, in this case `OCrawler`. It takes three arguments:
* `domain`: the domain to crawl, without `http://` or `https://` at the beginning and without any path, not even a single slash `/`.
* `depth`: the maximum crawl depth, which sets a threshold on how many levels of links are followed. A good value is 1 or 2.
* `max processes`: how many processes may be alive at the same time while the crawl is in progress. A good rule of thumb is to use the number of threads your computer has.
```
$ ./OCrawler [domain] [depth] [max processes]
$ ./OCrawler golang.org 2 8
```
### Output
The tree uses three main colours:
* Blue: newly discovered links
* Magenta: already discovered links
* Green: assets

---
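To show how the depth limit, the process limit, and the three output colours could fit together, here is a minimal sketch. It is not OCrawler's actual source: `crawl`, `isAsset`, the in-memory `site` map, and the extension-based asset rule are all assumptions made for illustration. It uses raw ANSI escape codes and the standard library only, rather than the `colors` and `html` packages listed in the prerequisites, and it reads "depth" as the number of levels fetched below the start page.

```go
package main

import (
	"fmt"
	"path"
	"runtime"
	"strings"
	"sync"
)

// ANSI escape codes for the three colours of the output tree.
const (
	blue    = "\x1b[34m" // newly discovered link
	magenta = "\x1b[35m" // already discovered link
	green   = "\x1b[32m" // asset
	reset   = "\x1b[0m"
)

// isAsset guesses by file extension (an assumed rule, not OCrawler's).
func isAsset(link string) bool {
	switch strings.ToLower(path.Ext(link)) {
	case ".css", ".js", ".png", ".jpg", ".svg":
		return true
	}
	return false
}

// crawl fetches pages level by level, down to maxDepth levels below start,
// with at most maxWorkers fetches in flight at once. It returns one coloured
// line per link found. fetch stands in for downloading and parsing a page.
func crawl(start string, maxDepth, maxWorkers int, fetch func(string) []string) []string {
	if maxWorkers < 1 {
		maxWorkers = 1 // a zero-capacity semaphore would block forever
	}
	seen := map[string]bool{start: true}
	level := []string{start}
	sem := make(chan struct{}, maxWorkers) // counting semaphore
	var out []string

	for depth := 0; depth <= maxDepth && len(level) > 0; depth++ {
		results := make([][]string, len(level))
		var wg sync.WaitGroup
		for i, page := range level {
			wg.Add(1)
			sem <- struct{}{} // blocks while maxWorkers fetches are running
			go func(i int, page string) {
				defer wg.Done()
				defer func() { <-sem }()
				results[i] = fetch(page)
			}(i, page)
		}
		wg.Wait()

		// Classify every link; only new, non-asset links form the next level.
		var next []string
		for _, links := range results {
			for _, l := range links {
				colour := blue
				switch {
				case isAsset(l):
					colour = green
				case seen[l]:
					colour = magenta
				default:
					seen[l] = true
					next = append(next, l)
				}
				out = append(out, colour+l+reset)
			}
		}
		level = next
	}
	return out
}

func main() {
	// A tiny in-memory site used instead of real HTTP requests.
	site := map[string][]string{
		"/":  {"/a", "/b", "/style.css"},
		"/a": {"/c"},
		"/b": {"/a"},
		"/c": {"/d"},
	}
	fetch := func(p string) []string { return site[p] }
	for _, line := range crawl("/", 1, runtime.NumCPU(), fetch) {
		fmt.Println(line)
	}
}
```

The buffered channel acts as a counting semaphore, so the worker limit holds no matter how many links a level contains. `runtime.NumCPU()` is one way to follow the rule of thumb about matching the number of threads on the machine.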
Developed by Omar Contreras [[email protected]](mailto:[email protected])