https://github.com/r21gh/webcrawler

useful web crawler

# webcrawler

### Supported features
#### 1) Downloading a web page
###### 1-1) Retrying downloads
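
A minimal sketch of retrying a failed download, assuming Python's standard `urllib`; the repository's own implementation may differ. The `download` function, its `num_retries` parameter, and the injectable `opener` argument are hypothetical names chosen for illustration. Only 5xx responses are retried here, on the assumption that 4xx errors will not succeed on a second attempt.

```python
import urllib.error
import urllib.request


def download(url, num_retries=2, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the download fails.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; it defaults to urllib.request.urlopen.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # Retry only server-side (5xx) errors while retries remain.
        if num_retries > 0 and 500 <= error.code < 600:
            return download(url, num_retries - 1, opener)
        return None
    except urllib.error.URLError:
        # DNS failures, refused connections, and similar errors.
        return None
```
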
###### 1-2) Setting a user agent
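
One way to set a custom user agent with the standard library is sketched below. The helper name `build_request` and the default agent string `webcrawler-example` are assumptions for illustration, not names taken from this project.

```python
import urllib.request


def build_request(url, user_agent="webcrawler-example"):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The returned object can be passed to `urllib.request.urlopen` in place of a plain URL string.
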
#### 2) ID iteration
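
ID iteration can be sketched as follows, assuming the target site exposes pages under sequential numeric IDs. The names `iter_pages`, `url_template`, and `max_errors` are hypothetical, and `download` stands for any callable that returns the page body or `None` on failure. Tolerating a few consecutive misses lets the crawl continue past deleted records.

```python
import itertools


def iter_pages(url_template, download, max_errors=5):
    """Yield (url, html) for IDs 1, 2, 3, ... until `max_errors`
    consecutive downloads have failed."""
    num_errors = 0
    for page_id in itertools.count(1):
        url = url_template.format(page_id)
        html = download(url)
        if html is None:
            num_errors += 1
            if num_errors == max_errors:
                break  # assume we have run past the last valid ID
        else:
            num_errors = 0  # a success resets the failure streak
            yield url, html
```
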
#### 3) Sitemap crawler
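
A sitemap crawler starts by extracting the `<loc>` entries from the site's sitemap. The sketch below assumes a standard XML sitemap and uses only the standard library; `sitemap_urls` is a hypothetical helper name. It matches any element whose local name is `loc`, which would also pick up `loc` elements from sitemap extensions.

```python
import xml.etree.ElementTree as ET


def sitemap_urls(sitemap_xml):
    """Return the URLs listed in <loc> elements of a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Strip the "{namespace}" prefix that ElementTree puts on tags.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```
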
#### 4) Setting user-agent
#### 5) Parsing robots.txt
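
Python's standard library ships `urllib.robotparser` for this purpose. The sketch below parses robots.txt content that has already been downloaded; `build_robot_parser` is a hypothetical wrapper name, and the agent names in the usage note are made up.

```python
from urllib import robotparser


def build_robot_parser(robots_txt):
    """Return a RobotFileParser loaded with the given robots.txt text."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser
```

Before each download, a crawler would call `parser.can_fetch("MyCrawler", url)` and skip the URL when it returns `False`.
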
#### 6) Supporting proxies
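
With `urllib`, proxy support can be added by building an opener around a `ProxyHandler`. This is a sketch under that assumption; `build_proxy_opener` is a hypothetical name, and any proxy address passed to it is a placeholder supplied by the caller.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

Calling `opener.open(url)` on the result sends the request through the proxy instead of connecting directly.
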
#### 7) Avoiding spider traps
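
A common guard against spider traps (for example, a calendar that always links to a "next" page) is to record how many links deep each URL was found and stop following links past a limit. The sketch below assumes that approach; `crawl`, `get_links`, and `max_depth` are hypothetical names, and `get_links` stands for any callable that returns the links found on a page.

```python
from collections import deque


def crawl(seed_url, get_links, max_depth=2):
    """Breadth-first crawl that returns {url: depth} and does not
    follow links from pages already at max_depth."""
    seen = {seed_url: 0}
    queue = deque([seed_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth == max_depth:
            continue  # deep enough; do not expand this page
        for link in get_links(url):
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```
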
#### 8) Throttling downloads
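
Throttling can be sketched as a small helper that remembers when each domain was last accessed and sleeps if the next request comes too soon. The `Throttle` class and its `clock` and `sleep` parameters are hypothetical; the latter two exist only so the timing logic can be checked without real waiting.

```python
import time
from urllib.parse import urlparse


class Throttle:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay          # seconds between hits on one domain
        self.clock = clock
        self.sleep = sleep
        self.last_accessed = {}     # domain -> time of the last request

    def wait(self, url):
        domain = urlparse(url).netloc
        last = self.last_accessed.get(domain)
        if self.delay > 0 and last is not None:
            remaining = self.delay - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_accessed[domain] = self.clock()
```

A crawler would create one `Throttle` and call `throttle.wait(url)` immediately before each download.
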