https://github.com/wcong/ants-go
open source, distributed, restful crawler engine in golang
https://github.com/wcong/ants-go
Last synced: 8 months ago
JSON representation
open source, distributed, restful crawler engine in golang
- Host: GitHub
- URL: https://github.com/wcong/ants-go
- Owner: wcong
- License: mit
- Archived: true
- Created: 2015-02-02T07:21:03.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2016-04-05T05:03:38.000Z (over 9 years ago)
- Last Synced: 2024-10-29T17:51:03.313Z (about 1 year ago)
- Language: Go
- Homepage:
- Size: 143 KB
- Stars: 363
- Watchers: 45
- Forks: 118
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-crawler-cn - ants-go - 基于Golang的高并行网络爬虫. (Go)
- awesome-crawler - ants-go - A open source, distributed, restful crawler engine in golang. (Go)
README
### ants-go
open source, restful, distributed crawler engine
### gitter
[](https://gitter.im/wcong/ants-go?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
### comming up
* Persistence
* Dynamic Master
### design of ants-go
##### ants
I wrote a crawler engine named [ants](https://github.com/wcong/ants) in python base on [scrapy](https://github.com/scrapy/scrapy). But sometimes, dynamic language is chaos.
So I start to write it in a compile language.
##### scrapy
I design the crawler framework by imitating [scrapy](https://github.com/scrapy/scrapy).
such as downloader,scraper,and the way user write customize spider,
but in a compile way
##### elasticsearch
I design my distributed architecture by imitating [elasticsearch](https://github.com/elasticsearch/elasticsearch).
it spire me to do a engine for distributed crawler
### requirement
``` shell
go get github.com/PuerkitoBio/goquery
go get github.com/go-sql-driver/mysql
```
### install
``` shell
go get github.com/wcong/ants-go
go install github.com/wcong/ants-go
```
### run
``` shell
cd bin
./ants-go
```
#### check cluster status
```
curl 'http://localhost:8200/cluster'
```
#### get all spiders
```
curl 'http://localhost:8200/spiders'
```
#### start a spider
```
curl 'http://localhost:8200/crawl?spider=spiderName'
```
#### cluster in one computer
to test cluster in one computer,you can run it from different port in different terminal
one node,use the default port tcp 8300 http 8200
``` shell
cd bin
./ants-go
```
the other node set tcp port and http port
``` shell
cd bin
./ants-go -tcp 9300 -http 9200
```
#### flags
there are some flags you can set,check out the help message
``` shell
./ants-go -h
./ants-go -help
```
### Customize spider
1. go to *spiders*
2. write your spiders follow the example *deap_loop_spider.go* or go to the [spider page](./SPIDER.md)
3. add you spider to spiderMap,follow the example in *LoadAllSpiders* in *load_all_spider.go*
4. install again