[![Go Report Card](https://goreportcard.com/badge/github.com/saeeddhqan/evine)](https://goreportcard.com/report/github.com/saeeddhqan/evine)
[![License](https://img.shields.io/github/license/saeeddhqan/evine?color=%234ac41c)](https://opensource.org/licenses/GPL-3.0)
[![Build Status](https://travis-ci.com/saeeddhqan/evine.svg?branch=master)](https://travis-ci.com/saeeddhqan/evine)
# Evine

Interactive CLI Web Crawler.

Evine is a simple, fast, and interactive web crawler and web scraper written in Golang.
Evine is useful for a wide range of purposes such as metadata and data extraction, data mining, reconnaissance and testing.

[![asciicast](https://asciinema.org/a/351624.svg)](https://asciinema.org/a/351624)

If you like the project, give it a star. It motivates me to keep developing it!

## Install

### From Binary
Pre-built [binary releases](https://github.com/saeeddhqan/evine/releases) are available (recommended).
### From source
```bash
go get github.com/saeeddhqan/evine
"$GOPATH/bin/evine" -h
```
### From GitHub
```bash
git clone https://github.com/saeeddhqan/evine.git
cd evine
go build .
mv evine /usr/local/bin
evine --help
```

Note: Go 1.13.x is required.

## Commands & Usage

Keybinding | Description
----------------------------------------|---------------------------------------
Enter | Run crawler (from URL view)
Enter | Display response (from Keys and Regex views)
Tab | Next view
Ctrl+Space | Run crawler
Ctrl+S | Save response
Ctrl+Z | Quit
Ctrl+R | Restore to default values (from Options and Headers views)
Ctrl+Q | Close response save view (from Save view)

```bash
evine -h
```
It will display help for the tool:

| flag | Description | Example |
|------|-------------|---------|
| -url | URL to crawl | evine -url toscrape.com |
| -url-exclude string | Exclude URLs matching this regex (default ".*") | evine -url-exclude ?id= |
| -domain-exclude string | Exclude in-scope domains from the crawl. Separate with commas. Default: root domain | evine -domain-exclude host1.tld,host2.tld |
| -code-exclude string | Exclude responses with these HTTP status codes. Separate with '\|' (default ".*") | evine -code-exclude 200,201 |
| -delay int | Delay between requests, in milliseconds | evine -delay 300 |
| -depth | Scraper depth search level (default 1) | evine -depth 2 |
| -thread int | The number of concurrent goroutines for resolving (default 5) | evine -thread 10 |
| -header | HTTP headers for each request (separate fields with \n) | evine -header KEY: VALUE\nKEY1: VALUE1 |
| -proxy string | Proxy in the form scheme://ip:port | evine -proxy http://1.1.1.1:8080 |
| -scheme string | Set the scheme for the requests (default "https") | evine -scheme http |
| -timeout int | Seconds to wait before timing out (default 10) | evine -timeout 15 |
| -query string | JQuery expression. It can be a file extension (pdf), a key query (url,script,css,..) or a JQuery selector ($("a[class='hdr']").attr("hdr")) | evine -query url,pdf,txt |
| -regex string | Search the page contents with this regular expression | evine -regex 'User.+' |
| -logger string | Log errors to a file | evine -logger log.txt |
| -max-regex int | Maximum number of regex search results for the Regex field (default 1000) | evine -max-regex -1 |
| -robots | Scrape robots.txt for URLs and use them as seeds | evine -robots |
| -sitemap | Scrape sitemap.xml for URLs and use them as seeds | evine -sitemap |
| -wayback | Scrape WayBackURLs (web.archive.org) for URLs and use them as seeds | evine -wayback |
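
The `-header` flag packs several header fields into one string, separated by `\n`. The following is a minimal sketch of how such a string could be split into an `http.Header` in Go. `parseHeaderFlag` is a hypothetical helper, not code from evine, and it assumes that the two literal characters `\n` typed in a shell should be treated like a real newline.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseHeaderFlag is a hypothetical helper (not taken from evine's source).
// It splits "KEY: VALUE\nKEY1: VALUE1" into individual header fields.
func parseHeaderFlag(raw string) http.Header {
	// Assumption: a literal backslash-n from the command line is handled
	// the same way as a real newline character.
	raw = strings.ReplaceAll(raw, `\n`, "\n")
	headers := http.Header{}
	for _, line := range strings.Split(raw, "\n") {
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue // skip lines that do not look like "KEY: VALUE"
		}
		headers.Add(strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1]))
	}
	return headers
}

func main() {
	h := parseHeaderFlag(`User-Agent: evine\nAccept: text/html`)
	fmt.Println(h.Get("User-Agent"), h.Get("Accept"))
}
```

Splitting each line on the first colon only keeps values that contain colons themselves (such as URLs) intact.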

### Views
- URL, the view where you enter the URL string.
- Options, the view for setting options.
- Headers, the view for setting the HTTP headers.
- Query, the view used after crawling. It extracts data (docs, URLs, etc.) from the web pages that have been crawled.
- Regex, the view for searching the crawled web pages with a regular expression. Write your regex in this view and press Enter.
- Response, the view where all results are written.
- Search, the view for searching the Response view content with a regular expression.
### Extract methods
#### From Keys
Keys are predefined keywords that select a kind of data, such as in-scope URLs, out-of-scope URLs, emails, etc.
List of all keys:
- url, to extract in-scope URLs. The URLs are fully sanitized.
- email, to extract in-scope and out-of-scope emails.
- query_urls, to extract in-scope URLs that contain a GET query, like ?foo=bar.
- all_urls, to extract out-of-scope URLs.
- phone, to extract a[href] values that contain a phone number.
- media, to extract addresses of files that are not web-executable, like .exe, .bat, .tar.xz, .zip, etc.
- css, to extract CSS files.
- script, to extract JavaScript files.
- cdn, to extract Content Delivery Network (CDN) addresses, like //api.foo.bar/jquery.min.js
- comment, to extract HTML comments, `<!-- .* -->`
- dns, to extract subdomains that belong to the website.
- network, to extract social network IDs, like facebook, twitter, etc.
- all, to extract every key in the list (url, query_urls, ...).

Keys are case-sensitive. You can also write two or three keys separated by commas.
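
As a rough illustration of case-sensitive, comma-separated key handling, the Go sketch below splits a query into recognized keys and everything else. `knownKeys` and `splitKeys` are hypothetical names that mirror the list above; evine itself may store and match keys differently, and the whitespace trimming is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// knownKeys mirrors the key list in this README; the real tool may keep
// its keys in a different structure.
var knownKeys = map[string]bool{
	"url": true, "email": true, "query_urls": true, "all_urls": true,
	"phone": true, "media": true, "css": true, "script": true,
	"cdn": true, "comment": true, "dns": true, "network": true, "all": true,
}

// splitKeys is a hypothetical helper that separates a comma-separated query
// into recognized keys and all other terms. The lookup is case-sensitive.
func splitKeys(query string) (keys, others []string) {
	for _, term := range strings.Split(query, ",") {
		term = strings.TrimSpace(term) // assumption: surrounding spaces are ignored
		if term == "" {
			continue
		}
		if knownKeys[term] {
			keys = append(keys, term)
		} else {
			others = append(others, term)
		}
	}
	return keys, others
}

func main() {
	// "URL" is not recognized because the lookup is case-sensitive.
	keys, others := splitKeys("url,email,URL")
	fmt.Println(keys, others)
}
```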
#### From Extensions
If you want a file type that is not defined in the keys, write its extension in the Query view, like png,xml,txt,docx,xlsx,a,mp3, etc.
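
Conceptually, an extension query keeps the links whose path ends in the requested extension. The Go sketch below shows one way to do that. `filterByExtension` is a hypothetical helper rather than evine's code, and the case-insensitive comparison is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// filterByExtension is a hypothetical helper: it keeps the links whose URL
// path ends in one of the given extensions (written without the dot).
func filterByExtension(links []string, extensions ...string) []string {
	var matched []string
	for _, link := range links {
		parsed, err := url.Parse(link)
		if err != nil {
			continue // ignore links that are not valid URLs
		}
		// Compare against the path only, so query strings are ignored.
		path := strings.ToLower(parsed.Path)
		for _, ext := range extensions {
			if strings.HasSuffix(path, "."+strings.ToLower(ext)) {
				matched = append(matched, link)
				break
			}
		}
	}
	return matched
}

func main() {
	links := []string{
		"https://example.com/report.pdf?download=1",
		"https://example.com/index.html",
		"https://example.com/notes.txt",
	}
	fmt.Println(filterByExtension(links, "pdf", "txt"))
}
```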
#### From JQuery selector
If you have basic JQuery skills, this feature is easy to use; if not, it is not hard to pick up. For a quick overview of the selectors, [w3schools](https://www.w3schools.com/jquery/jquery_ref_selectors.asp) is a great source.

Example:
```javascript
$("source").attr("src") // To find all of source[src] urls
$("h1").text() // To find h1 values
```
Template:
```javascript
$("SELECTOR").METHOD_NAME("arg")
```
Queries like the ones below (single quotes, or extra whitespace inside the parentheses) are not supported:
```javascript
$('SELECTOR').METHOD("arg")
$('SELECTOR').METHOD('arg')
$("SELECTOR" ).METHOD("arg" )
```
Methods are described below:
- text(), returns the content of the SELECTOR without HTML tags.
- html(), returns the content of the SELECTOR with HTML tags.
- attr("ATTR"), returns the attribute of the SELECTOR, e.g. $("a").attr("href")
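
The strict template can be pictured as a single regular expression. The Go sketch below is only an illustration of why double quotes and exact spacing matter. `queryPattern` and `parseQuery` are hypothetical names, and evine's real parser may work differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// queryPattern encodes the template $("SELECTOR").METHOD("arg") with double
// quotes and no extra whitespace. It is an illustration, not evine's parser.
var queryPattern = regexp.MustCompile(`^\$\("([^"]+)"\)\.(text|html|attr)\((?:"([^"]*)")?\)$`)

// parseQuery is a hypothetical helper that splits a query into its selector,
// method, and optional argument. ok is false when the query does not fit.
func parseQuery(query string) (selector, method, arg string, ok bool) {
	m := queryPattern.FindStringSubmatch(query)
	if m == nil {
		return "", "", "", false
	}
	if m[2] == "attr" && m[3] == "" {
		return "", "", "", false // attr needs an attribute name
	}
	return m[1], m[2], m[3], true
}

func main() {
	for _, q := range []string{
		`$("source").attr("src")`, // fits the template
		`$("h1").text()`,          // fits the template
		`$('a').attr('href')`,     // single quotes: rejected
		`$("a" ).attr("href" )`,   // extra whitespace: rejected
	} {
		selector, method, arg, ok := parseQuery(q)
		fmt.Printf("%-28s ok=%-5t selector=%q method=%q arg=%q\n", q, ok, selector, method, arg)
	}
}
```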

## Bugs or Suggestions

To report a bug or make a suggestion, create an [issue](https://github.com/saeeddhqan/evine/issues).

Evine is heavily inspired by [wuzz](https://github.com/asciimoo/wuzz).