Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/chapsuk/crawler
web crawler
https://github.com/chapsuk/crawler
Last synced: 9 days ago
JSON representation
web crawler
- Host: GitHub
- URL: https://github.com/chapsuk/crawler
- Owner: chapsuk
- Created: 2016-09-03T19:48:02.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-10-08T13:51:13.000Z (over 8 years ago)
- Last Synced: 2024-11-10T20:36:27.836Z (2 months ago)
- Language: Go
- Size: 146 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## Flags
```
-d string
db connections string (default "postgres://postgres:postgres@postgres/crawler?sslmode=disable")
-g bool
enable gzip (default true)
-h string
base endpoint (default "https://github.com/chapsuk")
-o string
output path (default "./result/")
-r bool
resume upload
-s bool
include subdomains
-w int
workers count (default 150)
```## crawler vs wget (without storage)
```bash
$ time wget -r -nv -A html,css,js http://golang-book.ru1.00 real
0.00 user
0.01 sys$ tree golang-book.ru/
golang-book.ru/
├── assets
│ └── main.css
├── chapter-01-gettting-started.html
├── chapter-02-your-first-program.html
├── chapter-03-types.html
├── chapter-04-variables.html
├── chapter-05-control-structures.html
├── chapter-06-arrays-slices-maps.html
├── chapter-07-functions.html
├── chapter-08-pointers.html
├── chapter-09-structs-and-interfaces.html
├── chapter-10-concurrency.html
├── chapter-11-packages.html
├── chapter-12-testing.html
├── chapter-13-core-packages.html
├── chapter-14-next-steps.html
└── index.html1 directory, 16 files
``````bash
$ time ./crawler -h http://golang-book.ru0.31 real
0.10 user
0.03 sys$ tree result/golang-book.ru/
result/golang-book.ru/
├── assets
│ └── main.css.gz
├── chapter-01-gettting-started.html.gz
├── chapter-02-your-first-program.html.gz
├── chapter-03-types.html.gz
├── chapter-04-variables.html.gz
├── chapter-05-control-structures.html.gz
├── chapter-06-arrays-slices-maps.html.gz
├── chapter-07-functions.html.gz
├── chapter-08-pointers.html.gz
├── chapter-09-structs-and-interfaces.html.gz
├── chapter-10-concurrency.html.gz
├── chapter-11-packages.html.gz
├── chapter-12-testing.html.gz
├── chapter-13-core-packages.html.gz
├── chapter-14-next-steps.html.gz
└── index.html.gz1 directory, 16 files
```without gzip
```bash
$ time ./crawler -h http://golang-book.ru -g=00.27 real
0.03 user
0.01 sys$ tree result/golang-book.ru/
result/golang-book.ru/
├── assets
│ └── main.css
├── chapter-01-gettting-started.html
├── chapter-02-your-first-program.html
├── chapter-03-types.html
├── chapter-04-variables.html
├── chapter-05-control-structures.html
├── chapter-06-arrays-slices-maps.html
├── chapter-07-functions.html
├── chapter-08-pointers.html
├── chapter-09-structs-and-interfaces.html
├── chapter-10-concurrency.html
├── chapter-11-packages.html
├── chapter-12-testing.html
├── chapter-13-core-packages.html
├── chapter-14-next-steps.html
└── index.html1 directory, 16 files
```with storage
```
$ time ./crawler -h http://golang-book.ru
2016/09/04 18:24:24 Completed! Time: 384.972384ms
0.39 real
0.11 user
0.03 sys
...
crawler=# select * from "golang-book.ru";
url | type | status | created | updated
--------------------------------------------------------------+------+--------+----------------------------+----------------------------
http://golang-book.ru | 1 | 3 | 2016-09-04 15:24:24.155024 | 2016-09-04 15:24:24.265518
http://golang-book.ru/chapter-01-gettting-started.html | 1 | 3 | 2016-09-04 15:24:24.25395 | 2016-09-04 15:24:24.321782
http://golang-book.ru/chapter-02-your-first-program.html | 1 | 3 | 2016-09-04 15:24:24.277237 | 2016-09-04 15:24:24.371623
http://golang-book.ru/chapter-03-types.html | 1 | 3 | 2016-09-04 15:24:24.280468 | 2016-09-04 15:24:24.388519
http://golang-book.ru/chapter-04-variables.html | 1 | 3 | 2016-09-04 15:24:24.281972 | 2016-09-04 15:24:24.39923
http://golang-book.ru/chapter-10-concurrency.html | 1 | 3 | 2016-09-04 15:24:24.291129 | 2016-09-04 15:24:24.423001
http://golang-book.ru/chapter-05-control-structures.html | 1 | 3 | 2016-09-04 15:24:24.283473 | 2016-09-04 15:24:24.433296
http://golang-book.ru/chapter-09-structs-and-interfaces.html | 1 | 3 | 2016-09-04 15:24:24.289616 | 2016-09-04 15:24:24.441851
http://golang-book.ru/assets/main.css | 2 | 3 | 2016-09-04 15:24:24.299716 | 2016-09-04 15:24:24.450094
http://golang-book.ru/chapter-07-functions.html | 1 | 3 | 2016-09-04 15:24:24.286585 | 2016-09-04 15:24:24.458055
http://golang-book.ru/chapter-11-packages.html | 1 | 3 | 2016-09-04 15:24:24.292562 | 2016-09-04 15:24:24.466465
http://golang-book.ru/chapter-08-pointers.html | 1 | 3 | 2016-09-04 15:24:24.288143 | 2016-09-04 15:24:24.474569
http://golang-book.ru/chapter-06-arrays-slices-maps.html | 1 | 3 | 2016-09-04 15:24:24.285088 | 2016-09-04 15:24:24.481355
http://golang-book.ru/chapter-14-next-steps.html | 1 | 3 | 2016-09-04 15:24:24.298188 | 2016-09-04 15:24:24.488203
http://golang-book.ru/chapter-12-testing.html | 1 | 3 | 2016-09-04 15:24:24.294398 | 2016-09-04 15:24:24.494819
http://golang-book.ru/chapter-13-core-packages.html | 1 | 3 | 2016-09-04 15:24:24.296463 | 2016-09-04 15:24:24.503753
http://golang-book.ru/ | 1 | 3 | 2016-09-04 15:24:24.316739 | 2016-09-04 15:24:24.511351
(17 rows)```
![](https://media.giphy.com/media/xTiTnKH3dDw1ww53R6/giphy.gif)