https://github.com/agenty/scrapingai
Build web scraping agents using AI to auto-extract the data from websites, capture screenshot, generate pdf from URL and web crawling with Agenty
- Host: GitHub
- URL: https://github.com/agenty/scrapingai
- Owner: Agenty
- License: mit
- Created: 2023-10-09T04:07:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-08T04:58:41.000Z (about 1 year ago)
- Last Synced: 2024-11-19T19:09:23.375Z (2 months ago)
- Topics: crawler, crawling, datascraping, extract-data, scraping, webscraper, webscraping
- Language: TypeScript
- Homepage: https://agenty.com/
- Size: 209 KB
- Stars: 8
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# scrapingai
[![version](https://img.shields.io/npm/v/scrapingai.svg)](https://www.npmjs.com/package/scrapingai)
[![license](https://img.shields.io/npm/l/scrapingai.svg)](https://www.npmjs.com/package/scrapingai)

> Extract data from websites automatically with AI, or build [web scraping agents](https://agenty.com/products/scraping-agent) for bulk URL scraping.
![Auto extract website data with AI](/assets/auto-extract-api.png)
## Installation
Install it via npm:
```
npm i scrapingai
```

## Highlights
- Built-in residential proxies and CAPTCHA handling
- Smart ad and pop-up blocking for better performance
- Automatic cookie-consent acceptance to dismiss cookie banners
- Compatible with Puppeteer and Playwright for browser automation and testing
- Background jobs for bulk URL scraping, with automatic retry and error handling

## Usage
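Every call goes through a client created with your Agenty API key. Rather than hard-coding the key in source, one option is to read it from the environment and fail fast when it is missing. This is a minimal sketch: the helper `getApiKey` and the variable name `AGENTY_API_KEY` are hypothetical and are not part of the scrapingai package.

```typescript
// Hypothetical helper (not part of scrapingai): look up an API key in an
// environment-like object and fail fast if it is missing or blank.
// AGENTY_API_KEY is an assumed variable name chosen for this sketch.
function getApiKey(
  env: Record<string, string | undefined>,
  name: string = "AGENTY_API_KEY",
): string {
  // Trim so that a key padded with whitespace or newlines still works.
  const key = env[name]?.trim();
  if (!key) {
    throw new Error(`Missing Agenty API key: set the ${name} environment variable`);
  }
  return key;
}
```

In Node.js you would pass `process.env`, for example `new Agenty(getApiKey(process.env))`. Taking the environment as a parameter, instead of reading `process.env` directly, keeps the helper easy to test.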
Get your [API key here](https://cloud.agenty.com/settings/apikeys), then create a client with it:

```
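// Assumed export name; check the scrapingai package entry point before relying on it.
import { Agenty } from "scrapingai";
const API_KEY = "YOUR_API_KEY"; // placeholder: use your own key
const agenty = new Agenty(API_KEY);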
const data = await agenty.browser.extract("https://example.com");
console.log(data);
```

### Extract

To auto-extract product details, job listings, SEO metadata, schema JSON, and more from a given URL:

```
const data = await agenty.browser.extract("https://example.com");
console.log(data);
```

### Scrape

To extract data using a given CSS selector or a custom jQuery function:

```
const data = await agenty.browser.scrape("https://example.com");
console.log(data);
```

### Screenshot

To [capture a screenshot](https://agenty.com/tools/webpage-to-screenshot) of a given URL:

```
const data = await agenty.browser.screenshot("https://example.com");
console.log(data);
```
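Each call above handles one URL at a time. The Highlights describe Agenty's own background jobs for bulk URL scraping with automatic retry; the sketch below is only a client-side illustration of the same idea, not how Agenty implements it. The hypothetical `runBatch` helper calls any per-URL async function, such as `(url) => agenty.browser.screenshot(url)`, with a bounded number of requests in flight, and retries failures with exponential backoff.

```typescript
// Hypothetical client-side helper (not part of the scrapingai package):
// runs `task` for every URL with at most `concurrency` calls in flight and
// retries each failing URL up to `retries` extra times with exponential backoff.

type BatchResult<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: unknown };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  retries: number,
  baseDelayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}

async function runBatch<T>(
  urls: string[],
  task: (url: string) => Promise<T>,
  { concurrency = 3, retries = 2, baseDelayMs = 500 } = {},
): Promise<BatchResult<T>[]> {
  const results: BatchResult<T>[] = new Array(urls.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // Reading and incrementing `next` happens without an await in between,
  // so no two workers ever claim the same index.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++;
      const url = urls[i];
      try {
        const value = await withRetry(() => task(url), retries, baseDelayMs);
        results[i] = { url, ok: true, value };
      } catch (error) {
        // Record the failure instead of aborting the whole batch.
        results[i] = { url, ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results come back in input order, and a URL that still fails after its retries is recorded with `ok: false` rather than thrown, so one bad URL does not stop the rest. For example: `const shots = await runBatch(urls, (url) => agenty.browser.screenshot(url), { concurrency: 3 });`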
### PDF

To [convert a webpage into a PDF](https://agenty.com/tools/webpage-to-pdf):

```
const data = await agenty.browser.pdf("https://example.com");
console.log(data);
```

### Content

To get the HTML content of a URL:

```
const data = await agenty.browser.content("https://example.com");
console.log(data);
```

## License
**scrapingai** is a project by [Agenty](https://agenty.com), released under the [MIT](https://github.com/Agenty/scrapingai/blob/main/LICENSE) License.