https://github.com/themaximalist/scrape.js
Web Scraping Library for Node.js
https://github.com/themaximalist/scrape.js
scraping web web-scraping
Last synced: about 1 year ago
JSON representation
Web Scraping Library for Node.js
- Host: GitHub
- URL: https://github.com/themaximalist/scrape.js
- Owner: themaximalist
- License: mit
- Created: 2023-05-06T23:51:29.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-02-23T08:37:14.000Z (over 2 years ago)
- Last Synced: 2025-06-03T08:32:13.129Z (about 1 year ago)
- Topics: scraping, web, web-scraping
- Language: CSS
- Homepage: https://scrapejs.themaximalist.com/
- Size: 140 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
## Scrape.js

`Scrape.js` is an easy to use web scraping library for Node.js.
```javascript
const data = await scrape("https://example.com");
// { url, html }
```
**Features**
* Fast
* Scrape nearly any website
* Headless JavaScript scraping
* Auto proxy rotation
* ...it just works
* MIT License
## Install
Install `Scrape.js` from NPM:
```bash
npm install @themaximalist/scrape.js
```
## Config
`Scrape.js` uses [Zen Rows](https://www.zenrows.com/) for proxy rotation. To use it acquire a Zen Rows API key and setup the environment variable.
```bash
ZENROWS_API_KEY=abcxyz123
```
`Scrape.js` can be used without proxies, but is less effective.
## Usage
Using `Scrape.js` is as simple as calling a function with a website URL.
```javascript
const scrape = require("@themaximalist/scrape.js");
await scrape("http://example.com"); // { url, html }
```
You can specify additional options to `scrape()` for more control:
```javascript
const data = await scrape("https://example.com", {
headless: true,
proxy: true
});
// { url, html }
```
## API
The `Scrape.js` API is a simple function you call with your URL, with an optional config object.
```javascript
await scrape(
url, // URL to scrape
{
headless: true, // Use JavaScript headless scraping
proxy: true, // Use proxy rotation
method: "GET", // HTTP Request method
timeout: 3000, // Scrape timeout in ms
userAgent: "Mozilla/5.0...", // User Agent
}
);
```
**URL (required)**
* **`url`** ``: URL to scrape
**Options**
* **`headless`** ``: Enable JavaScript. Default is `true`.
* **`proxy`** ``: Use proxy with request. Default is `true`.
* **`method`** ``: HTTP request method, usually `GET` or `POST`. Default is `GET`.
* **`timeout`** ``: Max request time in ms. Default is `3500`.
* **`userAgent`** ``: User agent for request. Default is `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36`.
**Response**
`Scrape.js` returns an `object` containing the final `url` and `html` content.
```javascript
const { url, html } = await scrape("https://example.com");
console.log(url); // https://example.com/
console.log(html); // DEBUG=scrape.js*
> node src/get_website_html.js
# debug logs
```
## Examples
View [tests](https://github.com/themaximal1st/scrape.js/tree/main/test) to examples on how to use `Scrape.js`.
## Projects
`Scrape.js` is currently used in the following projects:
- [News Score](https://newsscore.com) — score the news, score the news, rewrite the headlines
## License
MIT
## Author
Created by [The Maximalist](https://twitter.com/themaximal1st), see our [open-source projects](https://themaximalist.com/products).