https://github.com/fmpwizard/owlcrawler
Crawl the web using nats.io and Go
- Host: GitHub
- URL: https://github.com/fmpwizard/owlcrawler
- Owner: fmpwizard
- License: apache-2.0
- Created: 2015-02-18T10:48:08.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2015-10-01T04:57:11.000Z (about 9 years ago)
- Last Synced: 2024-11-15T10:42:37.840Z (about 1 month ago)
- Language: Go
- Homepage:
- Size: 393 KB
- Stars: 55
- Watchers: 9
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-mesos - OwlCrawler
README
# OwlCrawler
OwlCrawler is a distributed web crawler written in Go that uses [nats.io](http://nats.io) to coordinate work.
## Dependencies
* CouchDB 1.x (tested on 1.6.1)
* gnatsd
## Building
Build the two workers:
```
go build -tags=fetcherExec -o fetcher fetcher.go && \
go build -tags=extractorExec -o extractor extractor.go
```
### Setup
1. Set up CouchDB with at least one admin user; you can follow the instructions [here](http://stackoverflow.com/a/6418670/309896).
2. Create a file named `.couchdb.json` and place it in your `$HOME` directory.
Sample `.couchdb.json`:
```
{
"user": "user-here",
"password": "super-secret-password",
"url": "http://localhost:5984/owl-crawler"
}
```
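Before starting the workers, it can help to confirm that the file parses and that none of the three keys is empty. The following is a minimal sketch of such a check, not OwlCrawler's own loading code: the names `couchConfig` and `parseCouchConfig` are hypothetical, and the only things taken from this README are the file location and the `user`, `password`, and `url` keys.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// couchConfig mirrors the three keys shown in the sample .couchdb.json.
// The type name is illustrative; it is not taken from the OwlCrawler source.
type couchConfig struct {
	User     string `json:"user"`
	Password string `json:"password"`
	URL      string `json:"url"`
}

// parseCouchConfig decodes the JSON document and rejects it if any of the
// three keys is missing or empty.
func parseCouchConfig(data []byte) (couchConfig, error) {
	var cfg couchConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("invalid JSON: %w", err)
	}
	if cfg.User == "" || cfg.Password == "" || cfg.URL == "" {
		return cfg, fmt.Errorf("user, password and url must all be set")
	}
	return cfg, nil
}

func main() {
	// The README places the file in $HOME; os.UserHomeDir reads $HOME on Unix.
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate home directory:", err)
		return
	}
	path := filepath.Join(home, ".couchdb.json")
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read config:", err)
		return
	}
	cfg, err := parseCouchConfig(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, path+":", err)
		return
	}
	// The password is deliberately not printed.
	fmt.Printf("config OK: user %q, database URL %s\n", cfg.User, cfg.URL)
}
```

Run it with `go run` from any directory; it only reads the file and never contacts CouchDB.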
3. Create a file named `.gnatsd.json` and place it in your `$HOME` directory.
Sample `.gnatsd.json`:
```
{
"URL": "nats://owlcrawler:natsd_password@<host>:4222"
}
```
4. Start gnatsd with a user and password. A config file is the better choice for regular use, but for a quick test you can pass them as parameters:
```
~/gnatsd --user owlcrawler --pass natsd_password
```
#### On terminal 1 run:
```
./extractor -logtostderr=true -v=3
```
#### On terminal 2 run:
```
./fetcher -logtostderr=true -v=3
```
#### On terminal 3 run:
```
cd webapp
go build && ./webapp -alsologtostderr=true
```
#### On terminal 4 run:
```
cd webapp
grunt serve
```
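If a worker cannot connect to gnatsd, a malformed URL in `.gnatsd.json` is one thing worth ruling out. The sketch below decodes a document with the same single `URL` key as the sample in the Setup section and checks that it uses the `nats` scheme and carries credentials. It uses only the Go standard library and is not OwlCrawler's own code: `natsConfig`, `natsEndpoint`, and `parseNatsConfig` are hypothetical names, and the `localhost` host in `main` is an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// natsConfig mirrors the single key in the sample .gnatsd.json.
type natsConfig struct {
	URL string `json:"URL"`
}

// natsEndpoint holds the parts of the connection URL that are useful to
// inspect. The password itself is not stored, only whether one is present.
type natsEndpoint struct {
	User        string
	Host        string
	Port        string
	HasPassword bool
}

// parseNatsConfig decodes the JSON document and checks that the URL uses the
// nats scheme and includes a user, matching the user/password setup of gnatsd.
func parseNatsConfig(data []byte) (natsEndpoint, error) {
	var cfg natsConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return natsEndpoint{}, fmt.Errorf("invalid JSON: %w", err)
	}
	u, err := url.Parse(cfg.URL)
	if err != nil {
		return natsEndpoint{}, fmt.Errorf("invalid URL: %w", err)
	}
	if u.Scheme != "nats" {
		return natsEndpoint{}, fmt.Errorf("expected scheme nats, got %q", u.Scheme)
	}
	if u.User == nil {
		return natsEndpoint{}, fmt.Errorf("URL carries no credentials")
	}
	_, hasPassword := u.User.Password()
	return natsEndpoint{
		User:        u.User.Username(),
		Host:        u.Hostname(),
		Port:        u.Port(),
		HasPassword: hasPassword,
	}, nil
}

func main() {
	// Assumed example: gnatsd running on localhost with the quick-test
	// credentials from the Setup steps.
	sample := []byte(`{"URL": "nats://owlcrawler:natsd_password@localhost:4222"}`)
	ep, err := parseNatsConfig(sample)
	if err != nil {
		fmt.Println("config problem:", err)
		return
	}
	fmt.Printf("user=%s host=%s port=%s password set=%t\n",
		ep.User, ep.Host, ep.Port, ep.HasPassword)
}
```

The user reported here should match the `--user` value passed to gnatsd.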