https://github.com/zytedata/zyte-smartproxy-playwright
A wrapper over Playwright to provide Zyte Smart Proxy Manager specific functionalities.
- Host: GitHub
- URL: https://github.com/zytedata/zyte-smartproxy-playwright
- Owner: zytedata
- License: mit
- Created: 2021-12-31T16:46:32.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-02T02:43:11.000Z (about 2 years ago)
- Last Synced: 2024-09-17T03:16:26.233Z (3 months ago)
- Language: JavaScript
- Size: 53.7 KB
- Stars: 6
- Watchers: 8
- Forks: 2
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# Zyte SmartProxy Playwright
[![made-with-javascript](https://img.shields.io/badge/Made%20with-JavaScript-1f425f.svg)](https://www.javascript.com)
[![npm](https://img.shields.io/npm/v/zyte-smartproxy-playwright)](https://www.npmjs.com/package/zyte-smartproxy-playwright)

Use [Playwright](https://playwright.dev) with [Smart Proxy Manager](https://www.zyte.com/smart-proxy-manager/) easily!

A wrapper over Playwright to provide Zyte Smart Proxy Manager specific functionalities.

## QuickStart
1. **Install Zyte SmartProxy Playwright**
```
npm install zyte-smartproxy-playwright
```

2. **Create a file `sample.js` with the following content and set the empty `spm_apikey` value to your SPM API key**
``` javascript
const { chromium } = require('zyte-smartproxy-playwright'); // Or 'firefox' or 'webkit'

(async () => {
    const browser = await chromium.launch({
        spm_apikey: '',
        headless: false,
        static_bypass: false, // enable to save bandwidth (but may break some websites)
        block_ads: false, // enable to save bandwidth (but may break some websites)
    });
    console.log('Before new page');
    const page = await browser.newPage({ignoreHTTPSErrors: true});

    console.log('Opening page ...');
    try {
        await page.goto('https://toscrape.com/', {timeout: 180000});
    } catch(err) {
        console.log(err);
    }

    console.log('Taking a screenshot ...');
    await page.screenshot({path: 'screenshot.png'});
    await browser.close();
})();
```

Make sure that you can make `https` requests through Smart Proxy Manager by following the guide [Fetching HTTPS pages with Zyte Smart Proxy Manager](https://docs.zyte.com/smart-proxy-manager/next-steps/fetching-https-pages-with-smart-proxy.html).

3. **Run `sample.js` using Node**
``` bash
node sample.js
```

## API

`launch` accepts all the arguments accepted by Playwright's `launch` method for any browser type (for example `firefox.launch`), plus the additional arguments defined below:

| Argument | Default Value | Description |
|----------|---------------|-------------|
| `spm_apikey` | `undefined` | Zyte Smart Proxy Manager API key that can be found on your zyte.com account. |
| `spm_host` | `http://proxy.zyte.com:8011` | Zyte Smart Proxy Manager proxy host. |
| `static_bypass` | `true` | When `true`, Zyte SmartProxy Playwright skips the proxy for static assets matched by `static_bypass_regex`. Pass `false` to send those requests through the proxy as well. |
| `static_bypass_regex` | `/.*?\.(?:txt\|json\|css\|less\|gif\|ico\|jpe?g\|svg\|png\|webp\|mkv\|mp4\|mpe?g\|webm\|eot\|ttf\|woff2?)$/` | Regex used to match URLs for `static_bypass`. |
| `block_ads` | `true` | When `true`, Zyte SmartProxy Playwright blocks ads defined by `block_list` using the `@cliqz/adblocker-playwright` package. |
| `block_list` | `['https://secure.fanboy.co.nz/easylist.txt', 'https://secure.fanboy.co.nz/easyprivacy.txt']` | Block lists used by Zyte SmartProxy Playwright to initialize the blocker engine of `@cliqz/adblocker-playwright` and block ads. |
| `headers` | `{'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'pass', 'X-Crawlera-Cookies': 'disable'}` | List of headers to be appended to requests |

## Notes

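To check which URLs `static_bypass` would treat as static assets, the default `static_bypass_regex` from the API table can be exercised on its own. This is a minimal sketch, assuming the pattern is tested against the full request URL; `isStaticAsset` and the sample URLs are hypothetical and not part of the package.

``` javascript
// Default pattern copied from the `static_bypass_regex` row of the API table
// (the table escapes `|` as `\|`; those escapes are removed here).
const STATIC_BYPASS_REGEX =
    /.*?\.(?:txt|json|css|less|gif|ico|jpe?g|svg|png|webp|mkv|mp4|mpe?g|webm|eot|ttf|woff2?)$/;

// Hypothetical helper: returns true when a URL matches the pattern.
// Assumption: the pattern is applied to the full request URL.
function isStaticAsset(url, pattern = STATIC_BYPASS_REGEX) {
    return pattern.test(url);
}

const samples = [
    'https://toscrape.com/',
    'https://example.com/static/site.css',
    'https://example.com/fonts/body.woff2',
    'https://example.com/logo.png?v=2',
];

for (const url of samples) {
    console.log(`${isStaticAsset(url) ? 'bypass' : 'proxy '}  ${url}`);
}
```

Because the pattern is anchored with `$`, a URL that carries a query string after the file extension does not match, so under this assumption such a request would still go through the proxy. A custom `static_bypass_regex` can be checked the same way before passing it to `launch`.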
- Some websites may not work with `block_ads` and `static_bypass` enabled (the default). Try disabling them if you encounter any issues.
- When using `headless: true` mode, the values generated for some browser-specific headers are slightly different, which websites may detect. Try using [`'X-Crawlera-Profile': 'desktop'`](https://docs.zyte.com/smart-proxy-manager.html#x-crawlera-profile) in that case:
``` javascript
const browser = await chromium.launch({
    spm_apikey: '',
    headless: true,
    headers: {'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'desktop', 'X-Crawlera-Cookies': 'disable'}
});
```

- When connecting to a remote Chrome browser instance, it should be launched with these arguments:
```
--proxy-server=http://proxy.zyte.com:8011 --disable-site-isolation-trials
```

- Consider our new [zyte-smartproxy-plugin](https://github.com/zytedata/zyte-smartproxy-plugin) for the [playwright-extra](https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra) and [puppeteer-extra](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra) frameworks.
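
The `headless: true` note above repeats all three default headers when changing only `X-Crawlera-Profile`, which suggests that a custom `headers` value replaces the defaults rather than merging with them. That behavior is an assumption here, not something the README states. Under that assumption, a small helper can build the option from the documented defaults; `DEFAULT_SPM_HEADERS` and `buildSpmHeaders` are hypothetical names, not part of the package.

``` javascript
// Default headers as listed in the `headers` row of the API table.
const DEFAULT_SPM_HEADERS = {
    'X-Crawlera-No-Bancheck': '1',
    'X-Crawlera-Profile': 'pass',
    'X-Crawlera-Cookies': 'disable',
};

// Hypothetical helper: returns a new object containing the documented
// defaults with any overrides applied on top. The defaults are not mutated.
function buildSpmHeaders(overrides = {}) {
    return { ...DEFAULT_SPM_HEADERS, ...overrides };
}

// Headers for headless mode, changing only the profile.
const headlessHeaders = buildSpmHeaders({ 'X-Crawlera-Profile': 'desktop' });
console.log(headlessHeaders);
```

The result can then be passed as `headers: headlessHeaders` in the `launch` options shown in the note.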