https://github.com/shenfe/puppeteer-service
🎠 Run headless Chrome (aka Puppeteer) as a service.
- Host: GitHub
- URL: https://github.com/shenfe/puppeteer-service
- Owner: shenfe
- License: mit
- Created: 2017-10-18T02:52:51.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-03-16T12:43:11.000Z (almost 7 years ago)
- Last Synced: 2024-09-30T07:41:38.793Z (3 months ago)
- Topics: headless-chrome, puppeteer, puppeteer-service, web-crawler
- Language: JavaScript
- Homepage:
- Size: 133 KB
- Stars: 47
- Watchers: 1
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
🎠 Run [GoogleChrome/puppeteer](https://github.com/GoogleChrome/puppeteer) as a service.
## Usage
### Server
```bash
$ npm install puppeteer-service --save
```

```js
const PuppeteerService = require('puppeteer-service');

const { koaApp, server } = PuppeteerService({
    cluster: true, // default: false
    port: 3000, // default
    api: 'run', // default
    test: true, // default: false
    puppeteer: {
        // See https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerlaunchoptions
        headless: true, // default
        args: ['--no-sandbox']
    }
});
```

😯 If the `test` option is set to `true`, as above, you can visit the test page at `http://your.host:3000/test/`.
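The option comments above imply a set of defaults and the URLs a client ends up talking to. The sketch below collects them in one place. `resolveOptions` and `endpoints` are hypothetical helpers, not part of puppeteer-service; the one-level merge of the nested `puppeteer` object and the `/<api>` and `/test/` paths are assumptions inferred from the examples in this README.

```javascript
// Hypothetical sketch: not puppeteer-service's own code.
// Defaults as listed in the comments of the server example.
const DEFAULTS = {
    cluster: false,
    port: 3000,
    api: 'run',
    test: false,
    puppeteer: { headless: true },
};

// Assumption: user options override the defaults, and the nested
// `puppeteer` launch options are merged one level deep.
function resolveOptions(options = {}) {
    return {
        ...DEFAULTS,
        ...options,
        puppeteer: { ...DEFAULTS.puppeteer, ...(options.puppeteer || {}) },
    };
}

// Assumption: the API is served at `/<api>` and the test page at `/test/`,
// matching the `http://your.host:3000/run` and `/test/` URLs in this README.
function endpoints(host, options = {}) {
    const { port, api, test } = resolveOptions(options);
    const base = `http://${host}:${port}`;
    return {
        api: `${base}/${api}`,
        test: test ? `${base}/test/` : null,
    };
}

const urls = endpoints('your.host', { test: true });
console.log(urls.api);  // prints "http://your.host:3000/run"
console.log(urls.test); // prints "http://your.host:3000/test/"
```

Under this assumption, passing only `args` (as the server example does) leaves `headless` at its default, because the nested object is merged rather than replaced.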
### Client
#### 👉 Option 1: Use puppeteer-service-client
```bash
$ npm install puppeteer-service-client --save
```

Use [puppeteer-service-client](https://github.com/shenfe/puppeteer-service-client) to communicate with the server. It runs in **both the browser and Node.js**.
```js
const Run = require('puppeteer-service-client');

Run('http://your.host:3000/run', {
    /* Entry page url */
    url: 'https://target.com/',

    /* Runner function */
    run: async page => {
        const title = await page.title();
        echo({ url: page.url(), title });
        return {
            info: b(a, title)
        };
    },

    /* Options (Optional) */
    options: {
        /* Variables to inject */
        /* Identifiers and their corresponding literal values will be injected
           as variable declarations into the runner function. */
        injection: {
            a: 'Welcome to ',
            b: function (x, y) {
                return x + y;
            }
        }
    },

    /* WebSocket data handler (Optional) */
    socket: data => {
        /**/
    }
})
    .then(data => {
        /**/
    }).catch(error => {
        /**/
    });
```

**socket and echo**

The `socket` option specifies a handler for WebSocket data on the client side. Correspondingly, `echo` is a built-in function, callable inside the page runner function, whose job is to send data over the socket connection of the right client.
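One way to picture the "right client" part is a registry that maps each run to the connection of the client that started it. The sketch below is hypothetical: `EchoRouter`, `register`, `unregister`, and `echoFor` are illustrative names, not puppeteer-service's implementation, and plain callbacks stand in for real WebSocket connections so the example runs without a network.

```javascript
// Hypothetical sketch: not puppeteer-service's own implementation.
// Maps a run id to the function that sends data to that client's socket.
class EchoRouter {
    constructor() {
        this.connections = new Map();
    }

    // Intended to be called when a client opens a socket for a given run.
    register(runId, send) {
        this.connections.set(runId, send);
    }

    // Intended to be called when that client's socket closes.
    unregister(runId) {
        this.connections.delete(runId);
    }

    // Builds the `echo` function handed to one page runner: whatever the
    // runner passes in is serialized and sent only to that run's client.
    echoFor(runId) {
        return data => {
            const send = this.connections.get(runId);
            if (send) send(JSON.stringify(data));
        };
    }
}

// Usage: two clients, each with an inbox array standing in for a socket.
const router = new EchoRouter();
const inboxA = [];
const inboxB = [];
router.register('run-a', message => inboxA.push(JSON.parse(message)));
router.register('run-b', message => inboxB.push(JSON.parse(message)));

const echo = router.echoFor('run-a');
echo({ url: 'https://target.com/', title: 'Target' });
console.log(inboxA.length, inboxB.length); // prints "1 0"
```

Looking the connection up at call time, rather than capturing it inside `echoFor`, means an `echo` issued after the client disconnects is silently dropped instead of being written to a closed socket.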
#### 👉 Option 2: Send a request directly
Send a POST request to the API endpoint yourself, for example with `fetch`:
```js
const pageRunner = async page => {
    const title = await page.title();
    return {
        info: b(a, title)
    };
};

fetch('http://your.host:3000/run', {
    method: 'POST',
    /*...*/
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        // `data` is a string of JavaScript source, not JSON:
        // `${pageRunner}` inserts the runner function's source text.
        data: `{
            url: 'https://www.sogou.com',
            run: ${pageRunner},
            options: {
                injection: {
                    a: 'Welcome to ',
                    b: function (x, y) {
                        return x + y;
                    }
                }
            }
        }`
    })
})
    .then(res => {
        if (res.ok) return res.json();
        throw new Error('Response is not ok');
    })
    .then(data => {
        /**/
    }).catch(error => {
        /**/
    });
```

⚠️ This approach is lightweight, but it does not communicate with the server via WebSocket, so there is no `socket` handler to receive data.
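In both options the runner travels as source text (in Option 2, interpolating `${pageRunner}` into the template literal calls `Function.prototype.toString`), and the `injection` entries are described as becoming variable declarations in the runner function. The sketch below illustrates that idea under stated assumptions: functions are serialized with `toString()` and other values with `JSON.stringify`. `toSource` and `injectVariables` are hypothetical names; this is not the server's actual code.

```javascript
// Hypothetical sketch of variable injection; not puppeteer-service's own code.

// Turn an injected value into JavaScript source text. Assumption: functions
// are sent as their own source, everything else as a JSON literal.
function toSource(value) {
    return typeof value === 'function' ? value.toString() : JSON.stringify(value);
}

// Rebuild `runner` from its source inside a scope that declares each
// injected identifier, so the runner body can refer to them directly.
function injectVariables(runner, injection = {}) {
    const declarations = Object.entries(injection)
        .map(([name, value]) => `const ${name} = ${toSource(value)};`)
        .join('\n');
    const body = `${declarations}\nreturn (${runner.toString()});`;
    return new Function(body)();
}

// Usage, mirroring the README example. `a` and `b` are not defined here;
// they only exist inside the rebuilt runner.
const runner = injectVariables(
    async page => {
        const title = await page.title();
        return { info: b(a, title) };
    },
    {
        a: 'Welcome to ',
        b: function (x, y) {
            return x + y;
        },
    }
);

// A stand-in for a Puppeteer page with only the method the runner calls.
const fakePage = { title: async () => 'Sogou' };
runner(fakePage).then(result => console.log(result.info)); // prints "Welcome to Sogou"
```

Because the runner is re-evaluated from its source in this sketch, it cannot close over variables from the caller's scope; anything it needs has to go through `injection`. Method shorthand such as `b(x, y) { ... }` would also fail here, since its `toString()` output is not a standalone expression.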
## Development
Some commands:
```bash
npm start # start
npm start -- -p 3000 # port
npm start -- -c # cluster
npm run debug # debugging mode
npm test # test
npm test -- -u http://127.0.0.1:3000/run # api url
npm test -- -n 10 # batch number
```

## License
[MIT](http://opensource.org/licenses/MIT)
Copyright © 2018-present, [shenfe](https://github.com/shenfe)