Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vivekg13186/easy_web_crawler
Web crawler around Puppeteer to crawl AJAX/JavaScript-enabled pages.
- Host: GitHub
- URL: https://github.com/vivekg13186/easy_web_crawler
- Owner: vivekg13186
- License: mit
- Created: 2018-07-03T11:19:28.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-10-11T18:29:44.000Z (about 6 years ago)
- Last Synced: 2024-11-08T15:07:53.431Z (about 2 months ago)
- Topics: crawler, spider, web
- Language: JavaScript
- Homepage:
- Size: 1.18 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# easy_web_crawler [![Gitter chat](https://badges.gitter.im/gitterHQ/gitter.png)](https://gitter.im/easy_web_crawler/Lobby)
A web crawler built around Puppeteer for crawling AJAX/JavaScript-enabled pages. Check out the example folder to see how to use it.
# Features!
- Supports crawling of JavaScript/AJAX pages
- URL filter
- Avoids duplicate URLs
- Delay before page load
- Custom data extraction
- Built-in spider
- Stop and resume the crawling
- Fast image download

# Documentation
[Read full documentation here](https://vivekg13186.github.io/easy_web_crawler/1.0.5/)
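To illustrate what the "avoid duplicate URLs" and "stop and resume the crawling" features mean, here is a minimal, library-independent sketch. It is not easy_web_crawler's internal code: the `CrawlFrontier` class and its methods are hypothetical, and it saves state as JSON rather than in the library's progress file (such as `hello.db`), whose format is not described here.

```javascript
// Hypothetical sketch, NOT easy_web_crawler's implementation.
// A crawl frontier: a FIFO queue of URLs to visit plus a set of URLs already seen.
class CrawlFrontier {
  constructor(startUrls = []) {
    this.queue = [];        // URLs waiting to be visited, in FIFO order
    this.seen = new Set();  // every URL ever enqueued, used for de-duplication
    startUrls.forEach((u) => this.add(u));
  }

  // Drop the #fragment so "page#a" and "page#b" count as the same document.
  static normalize(url) {
    const u = new URL(url);
    u.hash = "";
    return u.href;
  }

  // Enqueue a URL unless it was already seen; returns true if it was added.
  add(url) {
    const key = CrawlFrontier.normalize(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.queue.push(key);
    return true;
  }

  // Next URL to visit, or undefined when the crawl is finished.
  next() {
    return this.queue.shift();
  }

  // Called by JSON.stringify: snapshot the state so a stopped crawl can resume.
  toJSON() {
    return { queue: this.queue, seen: [...this.seen] };
  }

  // Rebuild a frontier from a saved snapshot.
  static fromJSON(state) {
    const f = new CrawlFrontier();
    f.queue = [...state.queue];
    f.seen = new Set(state.seen);
    return f;
  }
}

const frontier = new CrawlFrontier(["https://example.com/a"]);
frontier.add("https://example.com/a#top"); // returns false: same page once the fragment is dropped
const restored = CrawlFrontier.fromJSON(JSON.parse(JSON.stringify(frontier)));
console.log(restored.next()); // https://example.com/a
```

Keeping the `seen` set in the snapshot (not just the queue) is what lets a resumed crawl skip pages it already visited before it was stopped.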
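### USAGE
```javascript
const Scraper = require("easy_web_crawler");

async function main() {
  const scraper = new Scraper();
  scraper.startWithURLs("start_url");          // URL(s) the crawl starts from
  scraper.allowIfMatches(function (url) {
    /* your URL filter logic here */
  });
  scraper.enableAutoCrawler(true);             // enable the built-in spider
  scraper.saveProgressInFile("hello.db");      // progress file, used to stop and resume the crawl
  scraper.waitBetweenPageLoad(0);              // delay between page loads (0 = no delay)
  scraper.callbackOnPageLoad(async function (page) {
    // your custom data extraction logic here
  });
  scraper.callbackOnFinish(function (result) {
    console.log(JSON.stringify(result, null, 4));
  });
  await scraper.start();
}

main().catch(console.error);
```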
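The `allowIfMatches` placeholder above is where a URL filter goes. The following is a sketch of one such filter that keeps the crawl on the starting host and skips common binary downloads. It assumes (not confirmed by this README) that the library calls the function with the candidate URL as a string and crawls the URL only when it returns `true`; `makeSameSiteFilter` and its `skipExtensions` option are hypothetical names.

```javascript
// Hypothetical helper; assumes allowIfMatches() expects (url) => boolean.
function makeSameSiteFilter(startUrl, { skipExtensions = [".pdf", ".zip", ".jpg", ".png"] } = {}) {
  const allowedHost = new URL(startUrl).hostname;
  return function (url) {
    let parsed;
    try {
      parsed = new URL(url);
    } catch (err) {
      return false; // not an absolute URL: skip it rather than crash the crawl
    }
    // Reject mailto:, javascript:, ftp: and other non-web links.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
    // Stay on the starting site (note: www.example.com and example.com differ here).
    if (parsed.hostname !== allowedHost) return false;
    // Skip URLs whose path ends in one of the listed extensions (case-insensitive).
    const path = parsed.pathname.toLowerCase();
    return !skipExtensions.some((ext) => path.endsWith(ext));
  };
}

const allowUrl = makeSameSiteFilter("https://example.com/");
console.log(allowUrl("https://example.com/docs/page2")); // true
console.log(allowUrl("https://other.org/"));             // false
console.log(allowUrl("https://example.com/report.PDF")); // false
```

Under that assumption it would be wired in as `scraper.allowIfMatches(makeSameSiteFilter("https://example.com/"))`, passing your real start URL; the placeholder string `"start_url"` is not a valid absolute URL and would make `new URL()` throw.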
License
----
MIT