https://github.com/matovu-farid/scrap-npm-package
Scrap NPM Package is a lightweight TypeScript library designed to simplify asynchronous web scraping. It provides AI-powered data extraction and supports both Node.js and Deno environments. Features include customizable prompts, callback support, secure webhook verification, and optional structured data extraction using Zod schemas.
- Host: GitHub
- URL: https://github.com/matovu-farid/scrap-npm-package
- Owner: matovu-farid
- License: other
- Created: 2025-02-09T10:04:40.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T12:10:41.000Z (3 months ago)
- Last Synced: 2025-04-03T16:49:23.451Z (about 2 months ago)
- Topics: ai, scrapping
- Language: TypeScript
- Homepage: https://scrapai.matovu-farid.com
- Size: 1010 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Web Scraping Helper
A lightweight TypeScript library for asynchronous web scraping with customizable
prompts and callback support.

## Installation

```bash
# NPM
npm install scrap-ai

# Yarn
yarn add scrap-ai

# Deno
import { ScrapeClient } from "https://deno.land/x/scrap_ai/mod.ts";
```

Then import and use:
```typescript
// ESM/TypeScript
import { ScrapeClient } from "scrap-ai";

// CommonJS
const { ScrapeClient } = require("scrap-ai");
```

## Features
- 🤖 AI-powered data extraction
- 🔄 Asynchronous processing with callback support
- 🔒 Secure webhook verification
- 📦 TypeScript support
- 🌐 Cross-platform (Node.js and Deno)

## Usage

The library provides a `ScrapeClient` class for initiating web scraping
operations:

```typescript
import { ScrapeClient } from "scrap-ai";

// Initialize the client with your API key
const scrapeClient = new ScrapeClient(process.env.SCRAP_API_KEY);

// Basic scraping
await scrapeClient.scrape(
  "https://example.com",
  "Extract all product titles and prices",
  "https://your-api.com/webhook"
);

// Scraping with custom ID
await scrapeClient.scrape(
  "https://example.com",
  "Extract product information",
  "https://your-api.com/webhook",
  "optional-custom-id"
);
```

## API Reference
### new ScrapeClient(apiKey)
Creates a new scraping client instance.
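
In TypeScript, `process.env.SCRAP_API_KEY` has the type `string | undefined`, so it can help to check the key before constructing the client. The following is a minimal sketch for Node.js. `requireEnv` is a hypothetical helper written for this example and is not part of `scrap-ai`.

```typescript
// Hypothetical helper (not part of scrap-ai): read a required environment
// variable and fail fast with a clear message when it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage, assuming the constructor takes a plain string as documented here:
// const scrapeClient = new ScrapeClient(requireEnv("SCRAP_API_KEY"));
```

Failing at startup keeps a missing key from surfacing later as an authentication error on the first scrape request.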
#### Parameters
| Parameter | Type | Description |
| --------- | ------ | ------------------------------- |
| apiKey | string | Your API key for authentication |

### scrapeClient.scrape(url, prompt, callbackUrl, id?)

Initiates a scraping operation and sends results to the specified callback URL
upon completion.

#### Parameters
| Parameter | Type | Description |
| ----------- | ------ | --------------------------------------------------- |
| url | string | The URL of the webpage to scrape |
| prompt | string | Instructions for what data to extract |
| callbackUrl | string | URL where results will be sent via POST |
| id? | string | Optional custom identifier for the scraping request |

### Webhook Verification

The library provides webhook verification to ensure the authenticity of incoming
webhook requests:

```typescript
const isValid = scrapeClient.verifyWebhook({
  body: req.body,
  signature: req.headers["x-webhook-signature"],
  timestamp: req.headers["x-webhook-timestamp"],
});
```

### scrapeClient.verifyWebhook(options)

Verifies that a webhook request is authentic using timing-safe signature
comparison.

#### Parameters
| Parameter | Type | Description |
| ----------------- | ------ | ----------------------------------------------------------- |
| options.body | Object | The raw request body as an object |
| options.signature | string | The signature from x-webhook-signature header |
| options.timestamp | string | The timestamp from x-webhook-timestamp header |
| options.maxAge? | number | Maximum age of webhook in milliseconds (default: 5 minutes, i.e. 300,000 ms) |

### scrapeClient.parseWebhookBody(body)
Parses and validates the webhook body.
#### Parameters
| Parameter | Type | Description |
| --------- | ------ | -------------------------------- |
| body | string | The raw webhook body as a string |

Returns the parsed and validated webhook event.
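
To illustrate what a timing-safe signature check with a maximum age generally involves, here is a minimal sketch using Node's built-in `crypto` module. It is not the library's implementation: the signing scheme (hex-encoded HMAC-SHA256 over `timestamp.rawBody`), the millisecond timestamp format, and the names `signPayload` and `verifySignature` are all assumptions made for this example.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme, for illustration only:
// signature = hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`))
function signPayload(secret: string, timestamp: string, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

function verifySignature(opts: {
  secret: string;
  rawBody: string;
  signature: string;
  timestamp: string; // assumed: milliseconds since the Unix epoch
  maxAgeMs?: number; // mirrors the documented 5-minute default
  now?: number; // injectable clock, useful in tests
}): boolean {
  const { secret, rawBody, signature, timestamp } = opts;
  const maxAgeMs = opts.maxAgeMs ?? 5 * 60 * 1000;
  const now = opts.now ?? Date.now();

  // Reject malformed or stale timestamps to limit replayed requests.
  const sentAt = Number(timestamp);
  if (!Number.isFinite(sentAt) || Math.abs(now - sentAt) > maxAgeMs) {
    return false;
  }

  const expected = Buffer.from(signPayload(secret, timestamp, rawBody), "hex");
  const received = Buffer.from(signature, "hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A constant-time comparison avoids leaking how many leading characters of a guessed signature were correct. If a scheme like this signs the raw request bytes, re-serializing a parsed object may not reproduce them exactly; this README does not state which form `scrap-ai` signs.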
## Example Usage with Express
Here's a complete example of how to use the scraping client with webhook
verification in an Express application:

```typescript
import { ScrapeClient } from "scrap-ai";
import express from "express";

const app = express();
const scrapeClient = new ScrapeClient(process.env.SCRAP_API_KEY);

// Webhook endpoint
app.post("/webhook", express.json(), (req, res) => {
  const isValid = scrapeClient.verifyWebhook({
    body: req.body,
    signature: req.headers["x-webhook-signature"] as string,
    timestamp: req.headers["x-webhook-timestamp"] as string,
  });

  if (!isValid) {
    return res.status(400).send("Invalid webhook signature");
  }

  const event = scrapeClient.parseWebhookBody(JSON.stringify(req.body));
  console.log("Received verified webhook:", event);

  res.status(200).send("OK");
});

// Start scraping
app.post("/start-scrape", async (req, res) => {
  try {
    const result = await scrapeClient.scrape(
      "https://example.com",
      "Extract product information",
      "https://your-api.com/webhook"
    );
    res.json(result);
  } catch (error) {
    res.status(500).json({ error: "Scraping failed" });
  }
});
```

## License
This project is licensed under the MIT License - see the LICENSE file for
details.

## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.