https://github.com/matovu-farid/scrap-npm-package

Scrap NPM Package is a lightweight TypeScript library designed to simplify asynchronous web scraping. It provides AI-powered data extraction capabilities and supports both Node.js and Deno environments. With features like customizable prompts, callback support, secure webhook verification, and optional structured data extraction using Zod schemas,
https://github.com/matovu-farid/scrap-npm-package

ai scrapping

Last synced: about 1 month ago
JSON representation

Host: GitHub
URL: https://github.com/matovu-farid/scrap-npm-package
Owner: matovu-farid
License: other
Created: 2025-02-09T10:04:40.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-03-03T12:10:41.000Z (3 months ago)
Last Synced: 2025-04-03T16:49:23.451Z (about 2 months ago)
Topics: ai, scrapping
Language: TypeScript
Homepage: https://scrapai.matovu-farid.com
Size: 1010 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # Web Scraping Helper

A lightweight TypeScript library for asynchronous web scraping with customizable

prompts and callback support.

## Installation

```bash

# NPM

npm install scrap-ai

# Yarn

yarn add scrap-ai

# Deno

import { ScrapeClient } from "https://deno.land/x/scrap_ai/mod.ts";

```

Then import and use:

```typescript

// ESM/TypeScript

import { ScrapeClient } from "scrap-ai";

// CommonJS

const { ScrapeClient } = require("scrap-ai");

```

## Features

- 🤖 AI-powered data extraction

- 🔄 Asynchronous processing with callback support

- 🔒 Secure webhook verification

- 📦 TypeScript support

- 🌐 Cross-platform (Node.js and Deno)

## Usage

The library provides a `ScrapeClient` class for initiating web scraping

operations:

```typescript

import { ScrapeClient } from "scrap-ai";

// Initialize the client with your API key

const scrapeClient = new ScrapeClient(process.env.SCRAP_API_KEY);

// Basic scraping

await scrapeClient.scrape(

  "https://example.com",

  "Extract all product titles and prices",

  "https://your-api.com/webhook"

);

// Scraping with custom ID

await scrapeClient.scrape(

  "https://example.com",

  "Extract product information",

  "https://your-api.com/webhook",

  "optional-custom-id"

);

```

## API Reference

### new ScrapeClient(apiKey)

Creates a new scraping client instance.

#### Parameters

| Parameter | Type   | Description                     |

| --------- | ------ | ------------------------------- |

| apiKey    | string | Your API key for authentication |

### scrapeClient.scrape(url, prompt, callbackUrl, id?)

Initiates a scraping operation and sends results to the specified callback URL

upon completion.

#### Parameters

| Parameter   | Type   | Description                                         |

| ----------- | ------ | --------------------------------------------------- |

| url         | string | The URL of the webpage to scrape                    |

| prompt      | string | Instructions for what data to extract               |

| callbackUrl | string | URL where results will be sent via POST             |

| id?         | string | Optional custom identifier for the scraping request |

### Webhook Verification

The library provides webhook verification to ensure the authenticity of incoming

webhook requests:

```typescript

const isValid = scrapeClient.verifyWebhook({

  body: req.body,

  signature: req.headers["x-webhook-signature"],

  timestamp: req.headers["x-webhook-timestamp"],

});

```

### scrapeClient.verifyWebhook(options)

Verifies that a webhook request is authentic using timing-safe signature

comparison.

#### Parameters

| Parameter         | Type   | Description                                                 |

| ----------------- | ------ | ----------------------------------------------------------- |

| options.body      | Object | The raw request body as an object                           |

| options.signature | string | The signature from x-webhook-signature header               |

| options.timestamp | string | The timestamp from x-webhook-timestamp header               |

| options.maxAge?   | number | Maximum age of webhook in milliseconds (default: 5 minutes) |

### scrapeClient.parseWebhookBody(body)

Parses and validates the webhook body.

#### Parameters

| Parameter | Type   | Description                      |

| --------- | ------ | -------------------------------- |

| body      | string | The raw webhook body as a string |

Returns the parsed and validated webhook event.

## Example Usage with Express

Here's a complete example of how to use the scraping client with webhook

verification in an Express application:

```typescript

import { ScrapeClient } from "scrap-ai";

import express from "express";

const app = express();

const scrapeClient = new ScrapeClient(process.env.SCRAP_API_KEY);

// Webhook endpoint

app.post("/webhook", express.json(), (req, res) => {

  const isValid = scrapeClient.verifyWebhook({

    body: req.body,

    signature: req.headers["x-webhook-signature"] as string,

    timestamp: req.headers["x-webhook-timestamp"] as string,

  });

  if (!isValid) {

    return res.status(400).send("Invalid webhook signature");

  }

  const event = scrapeClient.parseWebhookBody(JSON.stringify(req.body));

  console.log("Received verified webhook:", event);

  res.status(200).send("OK");

});

// Start scraping

app.post("/start-scrape", async (req, res) => {

  try {

    const result = await scrapeClient.scrape(

      "https://example.com",

      "Extract product information",

      "https://your-api.com/webhook"

    );

    res.json(result);

  } catch (error) {

    res.status(500).json({ error: "Scraping failed" });

  }

});

```

## License

This project is licensed under the MIT License - see the LICENSE file for

details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

```

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/matovu-farid/scrap-npm-package

Awesome Lists containing this project

README