https://github.com/vivekg13186/easy_web_crawler

A web crawler built around Puppeteer for crawling AJAX/JavaScript-enabled pages.

# easy_web_crawler [![Gitter chat](https://badges.gitter.im/gitterHQ/gitter.png)](https://gitter.im/easy_web_crawler/Lobby)

A web crawler built around Puppeteer for crawling AJAX/JavaScript-enabled pages. Check out the example folder to see how to use it.

# Features!

- Crawls JavaScript/AJAX pages
- URL filter
- Avoids duplicate URLs
- Delay before page load
- Custom data extraction
- Built-in spider
- Stop and resume crawling
- Fast image download
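
To illustrate what the URL filter and duplicate-avoidance features mean in practice, here is a minimal sketch in plain JavaScript. It is not taken from easy_web_crawler's source: `normalizeUrl` and `createUrlFilter` are hypothetical helper names, and the normalization rules (drop the fragment, drop a trailing slash) are assumptions chosen only for the example.

```javascript
// Hypothetical helpers for illustration; not part of easy_web_crawler's API.

// Normalize a URL so trivially different spellings compare equal:
// drop the fragment and any trailing slash on a non-root path.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = "";
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

// Build a predicate that accepts only URLs on `allowedHost`
// that have not been accepted before.
function createUrlFilter(allowedHost) {
  const seen = new Set();
  return function shouldVisit(rawUrl) {
    let normalized;
    try {
      normalized = normalizeUrl(rawUrl);
    } catch (err) {
      return false; // not a valid absolute URL
    }
    if (new URL(normalized).hostname !== allowedHost) return false;
    if (seen.has(normalized)) return false;
    seen.add(normalized);
    return true;
  };
}
```

A predicate like this could plausibly be passed to the crawler's URL filter hook, assuming that hook expects a function returning a boolean; the full documentation is the place to confirm that contract.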

# Documentation
[Read full documentation here](https://vivekg13186.github.io/easy_web_crawler/1.0.5/)



### USAGE

```
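const Scraper = require("easy_web_crawler")

async function main() {
    const scraper = new Scraper()
    // Seed URL to start crawling from
    scraper.startWithURLs("start_url")
    // URL filter: replace the <> placeholder with your own matching logic
    scraper.allowIfMatches(function (url) { /* <> */ })
    scraper.enableAutoCrawler(true)
    // File used to save progress so a crawl can be stopped and resumed
    scraper.saveProgressInFile("hello.db")
    // Delay between page loads
    scraper.waitBetweenPageLoad(0)
    // Called for each loaded page: replace the <> placeholder with your data extraction
    scraper.callbackOnPageLoad(async function (page) {
        /* <> */
    })
    // Called once crawling is done
    scraper.callbackOnFinish(function (result) {
        console.log(JSON.stringify(result, null, 4))
    })
    await scraper.start()
}

main().catch(console.error)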

```
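
As a rough idea of what a page-load callback might contain, the sketch below collects the page title and all link targets. It assumes the `page` argument behaves like a Puppeteer `Page` (exposing `title()` and `$$eval()`), which seems likely given that the crawler wraps Puppeteer but is not confirmed here. `extractPageData` is a hypothetical name.

```javascript
// Assumption: `page` behaves like a Puppeteer Page object.
// Hypothetical extractor; not part of easy_web_crawler's API.
async function extractPageData(page) {
  // Title of the loaded document
  const title = await page.title();
  // Resolve every anchor with an href to its absolute URL
  const links = await page.$$eval("a[href]", (anchors) =>
    anchors.map((a) => a.href)
  );
  return { title, links };
}
```

It could then be registered with `scraper.callbackOnPageLoad(extractPageData)`. Whether the crawler collects the callback's return value into the final `result` is not stated in this README, so check the full documentation before relying on it.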

License
----

MIT