https://github.com/fmpwizard/owlcrawler
Crawl the web using nats.io and Go
- Host: GitHub
- URL: https://github.com/fmpwizard/owlcrawler
- Owner: fmpwizard
- License: apache-2.0
- Created: 2015-02-18T10:48:08.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2015-10-01T04:57:11.000Z (almost 11 years ago)
- Last Synced: 2024-11-15T10:42:37.840Z (over 1 year ago)
- Language: Go
- Homepage:
- Size: 393 KB
- Stars: 55
- Watchers: 9
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-mesos - OwlCrawler
README
# OwlCrawler
OwlCrawler is a distributed web crawler, written in Go, that uses [nats.io](http://nats.io) to coordinate work.
## Dependencies
* CouchDB 1.x (tested on 1.6.1)
* gnatsd
## Building
Build the two workers, `fetcher` and `extractor`:
```
go build -tags=fetcherExec -o fetcher fetcher.go && \
go build -tags=extractorExec -o extractor extractor.go
```
### Setup
1. Set up CouchDB with at least one admin user; you can follow the instructions [here](http://stackoverflow.com/a/6418670/309896).
2. Create a file named `.couchdb.json` and place it in your `$HOME` directory.
Sample `.couchdb.json`:
```
{
  "user": "user-here",
  "password": "super-secret-password",
  "url": "http://localhost:5984/owl-crawler"
}
```
3. Create a file named `.gnatsd.json` and place it in your `$HOME` directory.
Sample `.gnatsd.json`:
```
{
  "URL": "nats://owlcrawler:natsd_password@127.0.0.1:4222"
}
```
4. Start gnatsd with a user and password. A config file is preferable, but for a quick test you can pass them as parameters:
```
~/gnatsd --user owlcrawler --pass natsd_password
```
#### On terminal 1 run:
```
./extractor -logtostderr=true -v=3
```
#### On terminal 2 run:
```
./fetcher -logtostderr=true -v=3
```
#### On terminal 3 run:
```
cd webapp
go build && ./webapp -alsologtostderr=true
```
#### On terminal 4 run:
```
cd webapp
grunt serve
```