https://github.com/tawsbob/puppeteer-infinite-scroll

An abstraction to help scrape data from sites with infinite scroll

Host: GitHub
URL: https://github.com/tawsbob/puppeteer-infinite-scroll
Owner: tawsbob
License: mit
Created: 2018-06-06T13:28:24.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-03-03T14:50:30.000Z (over 5 years ago)
Last Synced: 2025-03-18T12:34:16.823Z (8 months ago)
Language: JavaScript
Size: 8.79 KB
Stars: 9
Watchers: 0
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# puppeter-infinite-scroll

Just a helper to scrape data from sites that use infinite scroll.

```
npm install puppeter-infinite-scroll
```

# The problem

Most of the solutions I found scroll down the page on a fixed timer and then evaluate what you need. If the network slows down and a request takes longer than the delay hard-coded in the script, the scraper simply fails.
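The alternative can be sketched without a browser: instead of sleeping for a fixed delay after each scroll, wait until the response that carries the new content actually arrives, keeping a timeout only as a safety net. This is presumably why `open()` takes an `endpoint`, but the sketch is not this package's code: `waitForEvent`, `scrollUntilDone` and the `'batch'` event are hypothetical names, and an `EventEmitter` stands in for the page's network events.

```javascript
const { EventEmitter } = require('events')

// Resolve with the payload of the next `eventName` event, or reject after timeoutMs.
// The timeout is only a safety net for a feed that stops responding.
function waitForEvent(emitter, eventName, timeoutMs) {
  return new Promise((resolve, reject) => {
    const onEvent = (payload) => {
      clearTimeout(timer)
      resolve(payload)
    }
    const timer = setTimeout(() => {
      emitter.removeListener(eventName, onEvent)
      reject(new Error(`timed out waiting for "${eventName}"`))
    }, timeoutMs)
    emitter.once(eventName, onEvent)
  })
}

// Scroll, then wait for the content response instead of a fixed delay.
// Stops when a scroll yields an empty batch (end of feed) or after maxScrolls.
async function scrollUntilDone({ emitter, scroll, timeoutMs = 30000, maxScrolls = 100 }) {
  const batches = []
  for (let i = 0; i < maxScrolls; i++) {
    // The listener is registered before scroll() runs, so a fast response is not missed
    const [batch] = await Promise.all([waitForEvent(emitter, 'batch', timeoutMs), scroll()])
    if (batch.length === 0) break
    batches.push(batch)
  }
  return batches
}

// Demo: a fake feed whose responses arrive after growing delays, like a slowing network
const feed = new EventEmitter()
const pages = [['post-1', 'post-2'], ['post-3'], []]
let served = 0
const fakeScroll = async () => {
  const batch = pages[served++] || []
  setTimeout(() => feed.emit('batch', batch), 50 * served)
}

scrollUntilDone({ emitter: feed, scroll: fakeScroll, timeoutMs: 1000 })
  .then((batches) => console.log(batches))
// → [ [ 'post-1', 'post-2' ], [ 'post-3' ] ]
```

However long each response takes, the loop only moves on once it arrives; a fixed timer would have to guess that delay in advance.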

# See it working

```
npm run test
```

# How to use

```javascript
const puppeteerInfiniteScroll = require('./src/puppeter-infinite-scroll')

;(async () => {
  try {
    const browser = new puppeteerInfiniteScroll()

    // Launch Puppeteer (wraps puppeteer.launch)
    await browser.start()

    // Open the page; `endpoint` is the URL that loads new content into it
    await browser.open({
      url: 'https://medium.com/search?q=python',
      endpoint: 'https://medium.com/search/posts?q',
      loadImages: false,
      onResponse: (res) => {
        // console.log(res)
      },
      onScroll: () => {
        console.log(`onScroll ${browser.scrollCount}`)
      }
    })
  } catch (e) {
    console.error(e)
  }
})()
```

# async browser.start() = puppeteer.launch(opts)

```javascript
// params: opts, passed to puppeteer.launch(opts)
// default: { headless: false, devtools: true }
await browser.start()
```
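Since `start()` forwards its options to `puppeteer.launch`, one plausible way to apply the documented defaults is to merge caller options over them. This is only a sketch: `buildLaunchOptions` is a hypothetical helper, and whether the package merges like this or replaces the defaults outright is an assumption worth checking in `src/`.

```javascript
// Defaults documented for browser.start()
const DEFAULT_LAUNCH_OPTIONS = { headless: false, devtools: true }

// Hypothetical helper: caller options override the documented defaults.
// Merge (rather than replace) semantics are an assumption, not confirmed by the package.
function buildLaunchOptions(opts = {}) {
  return { ...DEFAULT_LAUNCH_OPTIONS, ...opts }
}

// With Puppeteer installed, the result would go straight to launch:
//   const instance = await puppeteer.launch(buildLaunchOptions({ headless: true }))
console.log(buildLaunchOptions())
// → { headless: false, devtools: true }
console.log(buildLaunchOptions({ headless: true }))
// → { headless: true, devtools: true }
```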

# async browser.open()

This method creates a new page, sets its viewport with `setViewport({ width: 1280, height: 926 })`, and enables request interception with `setRequestInterception(true)`.

```javascript
// params: opts
// opts: { url, onResponse, onScroll, loadImages = true, endpoint }
// url        = 'https://medium.com/search?q=python' - URL of the page to be loaded
// endpoint   = 'https://medium.com/search/posts?q' - endpoint which loads content into the page
// loadImages = true - set to false to prevent images from loading
// onResponse = (response) => {} - use it if you need to do something with the response object
// onScroll   = () => {} - triggered after every scroll
await browser.open({
  url: 'https://medium.com/search?q=python',
  endpoint: 'https://medium.com/search/posts?q',
  loadImages: false,
  onResponse: (res) => {
    // console.log(res)
  },
  onScroll: () => {
    console.log(`onScroll ${browser.scrollCount}`)
  }
})
```
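With request interception enabled, Puppeteer holds every request until a handler calls `request.continue()` or `request.abort()`, which makes it a natural place to honour `loadImages: false`; `page.on('response')` is where responses from `endpoint` could be picked out for `onResponse`. The sketch below is a hypothetical `attachHandlers` helper, not this package's source, and filtering responses with `startsWith(endpoint)` is an assumption. A plain `EventEmitter` with stub request/response objects stands in for a Puppeteer page so it runs without a browser; the response URLs are made up.

```javascript
const { EventEmitter } = require('events')

// Hypothetical: wire open() options onto a Puppeteer-style page.
// Assumes page.setRequestInterception(true) has already been called.
function attachHandlers(page, { endpoint, loadImages = true, onResponse = () => {} }) {
  page.on('request', (request) => {
    // Every intercepted request must be either aborted or continued
    if (!loadImages && request.resourceType() === 'image') {
      request.abort()
    } else {
      request.continue()
    }
  })
  page.on('response', (response) => {
    // Assumed filter: only responses from the content endpoint reach onResponse
    if (response.url().startsWith(endpoint)) onResponse(response)
  })
}

// Stub request mimicking the Puppeteer methods used above
const makeRequest = (type) => ({
  resourceType: () => type,
  abort() { this.outcome = 'aborted' },
  continue() { this.outcome = 'continued' },
})

const page = new EventEmitter()
const seen = []
attachHandlers(page, {
  endpoint: 'https://medium.com/search/posts?q',
  loadImages: false,
  onResponse: (res) => seen.push(res.url()),
})

const img = makeRequest('image')
const xhr = makeRequest('xhr')
page.emit('request', img)
page.emit('request', xhr)
page.emit('response', { url: () => 'https://medium.com/search/posts?q=python&page=2' })
page.emit('response', { url: () => 'https://cdn.example.com/logo.png' })
console.log(img.outcome, xhr.outcome, seen)
// → aborted continued [ 'https://medium.com/search/posts?q=python&page=2' ]
```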
