Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zytedata/zyte-smartproxy-plugin
A plugin for playwright-extra and puppeteer-extra to provide Smart Proxy Manager specific functionalities.
https://github.com/zytedata/zyte-smartproxy-plugin
playwright playwright-extra puppeteer puppeteer-extra
Last synced: 26 days ago
JSON representation
A plugin for playwright-extra and puppeteer-extra to provide Smart Proxy Manager specific functionalities.
- Host: GitHub
- URL: https://github.com/zytedata/zyte-smartproxy-plugin
- Owner: zytedata
- License: mit
- Created: 2022-08-23T10:43:40.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2022-10-28T11:08:07.000Z (about 2 years ago)
- Last Synced: 2024-09-26T21:23:53.775Z (about 1 month ago)
- Topics: playwright, playwright-extra, puppeteer, puppeteer-extra
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/zyte-smartproxy-plugin
- Size: 11.7 KB
- Stars: 3
- Watchers: 8
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
README
# Zyte SmartProxy Plugin
[![made-with-javascript](https://img.shields.io/badge/Made%20with-JavaScript-1f425f.svg)](https://www.javascript.com)
[![npm](https://img.shields.io/npm/v/zyte-smartproxy-puppeteer)](https://www.npmjs.com/package/zyte-smartproxy-plugin)A plugin for [playwright-extra](https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra)
and [puppeteer-extra](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra)
to provide [Smart Proxy Manager](https://www.zyte.com/smart-proxy-manager/) specific functionalities.## QuickStart for playwright-extra
1. **Install Zyte SmartProxy Plugin**```
npm install playwright playwright-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth @cliqz/adblocker-playwright
```2. **Create a file `sample.js` with following content and replace `` with your SPM Apikey**
``` javascript
// playwright-extra is a drop-in replacement for playwright,
// it augments the installed playwright with plugin functionality
const { chromium } = require('playwright-extra')// add zyte-smartproxy-plugin
const SmartProxyPlugin = require('zyte-smartproxy-plugin');
chromium.use(SmartProxyPlugin({
spm_apikey: '',
static_bypass: false, // enable to save bandwidth (but may break some websites)
}));// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
chromium.use(StealthPlugin());// create adblocker to block all ads (saves bandwidth)
const { PlaywrightBlocker } = require('@cliqz/adblocker-playwright');
const fetch = require('cross-fetch');// playwright usage as normal
(async () => {
const adBlocker = await PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch);const browser = await chromium.launch({ headless: false });
const page = await browser.newPage({ignoreHTTPSErrors: true});// uncomment to enable adBlocker (saves bandwidth but may break some websites)
// adBlocker.enableBlockingInPage(page);await page.goto('https://toscrape.com', {timeout: 180000});
await page.screenshot({path: 'screenshot.png'})
await browser.close();
})();
```Make sure that you're able to make `https` requests using Smart Proxy Manager by following this guide [Fetching HTTPS pages with Zyte Smart Proxy Manager](https://docs.zyte.com/smart-proxy-manager/next-steps/fetching-https-pages-with-smart-proxy.html)
3. **Run `sample.js` using Node**
``` bash
node sample.js
```## QuickStart for puppeteer-extra
1. **Install Zyte SmartProxy Plugin**
```
npm install puppeteer puppeteer-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth puppeteer-extra-plugin-adblocker
```2. **Create a file `sample.js` with following content and replace `` with your SPM Apikey**
``` javascript
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra')// add zyte-smartproxy-plugin
const SmartProxyPlugin = require('zyte-smartproxy-plugin');
puppeteer.use(SmartProxyPlugin({
spm_apikey: '',
static_bypass: false, // enable to save bandwidth (but may break some websites)
}));// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());// uncomment to enable adblocker plugin (saves bandwidth but may break some websites)
// const AdBlockerPlugin = require('puppeteer-extra-plugin-adblocker');
// puppeteer.use(AdBlockerPlugin({blockTrackers: true}));// puppeteer usage as normal
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage({ignoreHTTPSErrors: true});await page.goto('https://toscrape.com', {timeout: 180000});
await page.screenshot({path: 'screenshot.png'})
await browser.close();
})();
```Make sure that you're able to make `https` requests using Smart Proxy Manager by following this guide [Fetching HTTPS pages with Zyte Smart Proxy Manager](https://docs.zyte.com/smart-proxy-manager/next-steps/fetching-https-pages-with-smart-proxy.html)
3. **Run `sample.js` using Node**
``` bash
node sample.js
```## Zyte SmartProxy Plugin arguments
| Argument | Default Value | Description |
|----------|---------------|-------------|
| `spm_apikey` | `undefined` | Zyte Smart Proxy Manager API key that can be found on your zyte.com account. |
| `spm_host` | `http://proxy.zyte.com:8011` | Zyte Smart Proxy Manager proxy host. |
| `static_bypass` | `true` | When `true` Zyte SmartProxy Plugin will skip proxy use (saves proxy bandwidth) for static assets defined by `static_bypass_regex` or pass `false` to use proxy. |
| `static_bypass_regex` | `/.*?\.(?:txt\|json\|css\|less\|gif\|ico\|jpe?g\|svg\|png\|webp\|mkv\|mp4\|mpe?g\|webm\|eot\|ttf\|woff2?)$/` | Regex to use filtering URLs for `static_bypass`. |
| `headers` | `{'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'pass', 'X-Crawlera-Cookies': 'disable'}` | List of headers to be appended to requests |
| `spm_session_id` | `undefined` | When specified Zyte SmartProxy Plugin will use an existing Zyte Smart Proxy Manager [session](https://docs.zyte.com/smart-proxy-manager/sessions.html), otherwise a new session will be created. |## Notes
- Some websites may not work with AdBlocker or `static_bypass` enabled (default). Try to disable them if you encounter any issues.- When using `headless: true` mode, values generated for some browser-specific headers are a bit different, which may be detected by websites. Try using ['X-Crawlera-Profile': 'desktop'](https://docs.zyte.com/smart-proxy-manager.html#x-crawlera-profile) in that case:
``` javascript
puppeteer.use(SmartProxyPlugin({spm_apikey: '', headers: {'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'desktop', 'X-Crawlera-Cookies': 'disable'}}));
```- When connecting to a remote Chrome browser instance, it should be launched with these arguments:
```
--proxy-server=http://proxy.zyte.com:8011 --disable-site-isolation-trials
```