{"id":16278801,"url":"https://github.com/themaximalist/scrape.js","last_synced_at":"2025-06-14T15:33:31.080Z","repository":{"id":164887240,"uuid":"637239039","full_name":"themaximalist/scrape.js","owner":"themaximalist","description":"Web Scraping Library for Node.js","archived":false,"fork":false,"pushed_at":"2024-02-23T08:37:14.000Z","size":143,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-03T08:32:13.129Z","etag":null,"topics":["scraping","web","web-scraping"],"latest_commit_sha":null,"homepage":"https://scrapejs.themaximalist.com/","language":"CSS","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/themaximalist.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-06T23:51:29.000Z","updated_at":"2024-08-03T09:50:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"8f2b010b-d197-417d-b016-40b83cf91830","html_url":"https://github.com/themaximalist/scrape.js","commit_stats":null,"previous_names":["themaximalist/scrape.js","themaximal1st/scrape.js"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/themaximalist/scrape.js","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themaximalist%2Fscrape.js","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themaximalist%2Fscrape.js/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themaximalist%2Fscrape.js/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themaximalist%2Fscrape.js/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/themaximalist","download_url":"https://codeload.github.com/themaximalist/scrape.js/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/themaximalist%2Fscrape.js/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259838142,"owners_count":22919526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scraping","web","web-scraping"],"created_at":"2024-10-10T19:00:03.848Z","updated_at":"2025-06-14T15:33:31.060Z","avatar_url":"https://github.com/themaximalist.png","language":"CSS","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Scrape.js\n\n\u003cimg src=\"public/logo.png\" alt=\"Scrape.js — Web Scraping Library for Node.js\" class=\"logo\" style=\"max-width: 400px\" /\u003e\n\n\u003cdiv class=\"badges\" style=\"text-align: center; margin-top: -10px;\"\u003e\n\u003ca href=\"https://github.com/themaximal1st/scrape.js\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/themaximal1st/scrape.js\"\u003e\u003c/a\u003e\n\u003ca href=\"https://www.npmjs.com/package/@themaximalist/scrape.js\"\u003e\u003cimg alt=\"NPM Downloads\" src=\"https://img.shields.io/npm/dt/%40themaximalist%2Fscrape.js\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/themaximal1st/scrape.js\"\u003e\u003cimg alt=\"GitHub code size in bytes\" src=\"https://img.shields.io/github/languages/code-size/themaximal1st/scrape.js\"\u003e\u003c/a\u003e\n\u003ca 
href=\"https://github.com/themaximal1st/scrape.js\"\u003e\u003cimg alt=\"GitHub License\" src=\"https://img.shields.io/github/license/themaximal1st/scrape.js\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003cbr /\u003e\n\n`Scrape.js` is an easy-to-use web scraping library for Node.js.\n\n```javascript\nconst data = await scrape(\"https://example.com\");\n// { url, html }\n```\n\n**Features**\n\n* Fast\n* Scrape nearly any website\n* Headless JavaScript scraping\n* Auto proxy rotation\n* ...it just works\n* MIT License\n\n\n\n## Install\n\nInstall `Scrape.js` from NPM:\n\n```bash\nnpm install @themaximalist/scrape.js\n```\n\n## Config\n\n`Scrape.js` uses [Zen Rows](https://www.zenrows.com/) for proxy rotation. To use it, acquire a Zen Rows API key and set the `ZENROWS_API_KEY` environment variable.\n\n```bash\nZENROWS_API_KEY=abcxyz123\n```\n\n`Scrape.js` can be used without proxies, but it is less effective that way.\n\n\n## Usage\n\nUsing `Scrape.js` is as simple as calling a function with a website URL.\n\n```javascript\nconst scrape = require(\"@themaximalist/scrape.js\");\nawait scrape(\"http://example.com\"); // { url, html }\n```\n\nYou can pass additional options to `scrape()` for more control:\n\n```javascript\nconst data = await scrape(\"https://example.com\", {\n    headless: true,\n    proxy: true\n});\n// { url, html }\n```\n\n## API\n\nThe `Scrape.js` API is a single function that you call with a URL and an optional config object.\n\n\n```javascript\nawait scrape(\n    url, // URL to scrape\n    {\n        headless: true, // Use JavaScript headless scraping\n        proxy: true, // Use proxy rotation\n        method: \"GET\", // HTTP Request method\n        timeout: 3000, // Scrape timeout in ms\n        userAgent: \"Mozilla/5.0...\", // User Agent\n    }\n);\n```\n\n**URL (required)**\n\n* **`url`** `\u003cstring\u003e`: URL to scrape\n\n**Options**\n\n* **`headless`** `\u003cbool\u003e`: Enable JavaScript. 
Default is `true`.\n* **`proxy`** `\u003cbool\u003e`: Use proxy with request. Default is `true`.\n* **`method`** `\u003cstring\u003e`: HTTP request method, usually `GET` or `POST`. Default is `GET`.\n* **`timeout`** `\u003cint\u003e`: Max request time in ms. Default is `3500`.\n* **`userAgent`** `\u003cstring\u003e`: User agent for request. Default is `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36`.\n\n**Response**\n\n`Scrape.js` returns an `object` containing the final `url` and `html` content.\n\n```javascript\nconst { url, html } = await scrape(\"https://example.com\");\nconsole.log(url); // https://example.com/\nconsole.log(html); // \u003chtml...\n```\n\nThe `Scrape.js` API is a simple and reliable way to scrape the HTML from any website.\n\n## Debug\n\n`Scrape.js` uses the `debug` npm module with the `scrape.js` namespace.\n\nView debug logs by setting the `DEBUG` environment variable.\n\n```bash\n\u003e DEBUG=scrape.js*\n\u003e node src/get_website_html.js\n# debug logs\n```\n\n\n## Examples\n\nView the [tests](https://github.com/themaximal1st/scrape.js/tree/main/test) for examples of how to use `Scrape.js`.\n\n\n\n## Projects\n\n`Scrape.js` is currently used in the following projects:\n\n-   [News Score](https://newsscore.com) — score the news, score the news, rewrite the headlines\n\n\n\n## License\n\nMIT\n\n\n## Author\n\nCreated by [The Maximalist](https://twitter.com/themaximal1st). See our [open-source projects](https://themaximalist.com/products).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthemaximalist%2Fscrape.js","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthemaximalist%2Fscrape.js","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthemaximalist%2Fscrape.js/lists"}