Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/infinilabs/crawler

🕷️ An easy-to-use spider written in Golang. (previous named GOPA.)
https://github.com/infinilabs/crawler

crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider

Last synced: 6 days ago
JSON representation

🕷️ An easy-to-use spider written in Golang. (previous named GOPA.)

Awesome Lists containing this project

README

        

What a Spider!

GOPA, A Spider Written in Go.

[![Travis](https://travis-ci.org/infinitbyte/gopa.svg?branch=master)](https://travis-ci.org/infinitbyte/gopa)
[![Go Report Card](https://goreportcard.com/badge/github.com/infinitbyte/gopa)](https://goreportcard.com/report/github.com/infinitbyte/gopa)
[![Join the chat at https://gitter.im/infinitbyte/gopa](https://badges.gitter.im/infinitbyte/gopa.svg)](https://gitter.im/infinitbyte/gopa?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

## Goal

* Light weight, low footprint, memory requirement should < 100MB
* Easy to deploy, no runtime or dependency required
* Easy to use, no programming or scripts ability needed, out of box features

## Screenshoot

What a Spider! GOPA Spider!

---

- [How to use](#how-to-use)
- [Requirements](#requirements)
- [Setup](#setup)
- [Download Pre Built Package](#download-pre-built-package)
- [Compile The Package Manually](#compile-the-package-manually)
- [Required Config](#required-config)
- [Start](#start)
- [Stop](#stop)
- [Configuration](#configuration)
- [UI](#ui)
- [API](#api)
- [Architecture](#architecture)
- [Contributing](#contributing)
- [License](#license)

## How to use

### Requirements

* Elasticsearch v5.3+

### Setup

First of all, get it, two opinions: download the pre-built package or compile it yourself.

#### Download Pre Built Package

Go to [Release](https://github.com/infinitbyte/gopa/releases) page, download the right package for your platform.

_Note: Darwin is for Mac_

#### Compile The Package Manually

Requirements
* Golang 1.9+

Supported platform
- Mac/Linux: Run `make build` to build the Gopa.

- Windows: Checkout this wiki page - [How to build GOPA on windows](https://github.com/infinitbyte/gopa/wiki/How-to-build-GOPA-on-windows).

For example:
```
#apt install golang-go
#brew install golang
mkdir ~/go/src/github.com/infinitbyte/ -p
cd ~/go/src/github.com/infinitbyte/
git clone https://github.com/infinitbyte/gopa.git
cd gopa
make
```

After a few minutes, you should have:

> `gopa`, the main program, a single binary.

> `gopa.yml`, main configuration for gopa.

### Required Config

_Note: Elasticsearch version should >= v5.3_

- Enable elastic module in `gopa.yml`, update the elasticsearch's setting:
```
elasticsearch:
- name: default
enabled: true
endpoint: http://localhost:9200
index_prefix: gopa-
basic_auth:
username: elastic
password: changeme

```

### Start

Besides Elasticsearch, Gopa doesn't require any other dependencies, just simply run `./gopa` to start the program.

Gopa can be run as daemon(_Note: Only available on Linux and Mac_):


Example


➜ gopa git:(master) ✗ ./bin/gopa --daemon
________ ________ __________ _____
/ _____/ \_____ \\______ \/ _ \
/ \ ___ / | \| ___/ /_\ \
\ \_\ \/ | \ | / | \
\______ /\_______ /____| \____|__ /
\/ \/ \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///

[10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0
[gopa] started.


Also run `./gopa -h` to get the full list of command line options.


Example


➜ gopa git:(master) ✗ ./bin/gopa -h
________ ________ __________ _____
/ _____/ \_____ \\______ \/ _ \
/ \ ___ / | \| ___/ /_\ \
\ \_\ \/ | \ | / | \
\______ /\_______ /____| \____|__ /
\/ \/ \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///

Usage of ./bin/gopa:
-config string
the location of config file (default "gopa.yml")
-cpuprofile string
write cpu profile to this file
-daemon
run in background as daemon
-debug
run in debug mode, gopa will quit with panic error
-log string
the log level,options:trace,debug,info,warn,error (default "info")
-log_path string
the log path (default "log")
-memprofile string
write memory profile to this file
-pidfile string
pidfile path (only for daemon)
-pprof string
enable and setup pprof/expvar service, eg: localhost:6060 , the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars


### Stop

It's safety to press `ctrl+c` stop the current running Gopa, Gopa will handle the rest,saving the checkpoint,
you may restore the job later, the world is still in your hand.

If you are running `Gopa` as daemon, you may stop it like this:

```
kill -QUIT `pgrep gopa`
```

## Configuration

## UI

* Search Console `http://127.0.0.1:9000/`
* Admin Console `http://127.0.0.1:9000/admin/`

## API

## Architecture

What a Spider! GOPA Spider!

## Who uses it?

You use GOPA and you want to be listed there? [Contact me](https://medcl.com).

License
=======
Released under the [Apache License, Version 2.0](https://github.com/infinitbyte/gopa/blob/master/LICENSE) .