https://github.com/oguzhan18/crt-scrapper

Scraper Library is a versatile tool for scraping data from web pages. It provides various events to customize and control the scraping process, allowing users to retrieve data efficiently and reliably.

Host: GitHub
URL: https://github.com/oguzhan18/crt-scrapper
Owner: oguzhan18
Created: 2024-05-10T17:29:23.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2024-05-16T18:07:25.000Z (about 2 years ago)
Last Synced: 2025-03-21T11:50:27.282Z (over 1 year ago)
Topics: javascript-library, library, nestjs, node-library, scrape, scraper, scraper-library
Language: JavaScript
Homepage: https://crt-scrapper-docs.vercel.app/#/
Size: 8.79 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: ReadMe.md

# CRT Scrapper Library (1.0.4)

## Overview

CRT Scrapper Library is a versatile and easy-to-use tool for scraping data from web pages. It provides a robust and flexible framework for performing web scraping tasks on backend technologies such as Node.js and NestJS.

## Features

- Scrapes data from web pages based on a provided URL and target class
- Supports optional configuration of the scraping process
- Provides events for handling the stages of the scraping process, including before request, after request, error handling, and more
- Retry logic for handling temporary network issues
- Timeout configuration to prevent long-running requests
- Customizable HTTP headers for requests
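
The retry and timeout features can be pictured with a small, generic sketch. This is not the library's implementation; `withRetry` is a hypothetical helper, and its `retry` and `timeout` option names are borrowed from the usage example only for familiarity. It runs an async task up to `retry` times and gives each attempt its own time limit.

```javascript
// Hypothetical helper, NOT part of crt-scrapper: an illustration of
// "retry with a per-attempt timeout" using only standard JavaScript.
async function withRetry(task, { retry = 3, timeout = 10000 } = {}) {
    let lastError;
    for (let attempt = 1; attempt <= retry; attempt++) {
        let timer;
        // Promise that rejects once the time limit for this attempt passes.
        const timeLimit = new Promise((_, reject) => {
            timer = setTimeout(
                () => reject(new Error(`Timed out after ${timeout} ms`)),
                timeout
            );
        });
        try {
            // Whichever settles first wins: the task or the time limit.
            return await Promise.race([task(attempt), timeLimit]);
        } catch (error) {
            lastError = error; // remember the failure and try again
        } finally {
            clearTimeout(timer); // never leave a timer pending
        }
    }
    throw lastError; // every attempt failed
}
```

Note that a timed-out task is only abandoned, not cancelled; a real HTTP client would also need to abort the underlying request.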

## Installation

To install CRT Scrapper Library, run the following command:

```bash
npm install crt-scrapper
```

## Usage

```javascript
const { scrapeData } = require('crt-scrapper');

async function getDataFromUrl(url, targetClass) {
    try {
        const result = await scrapeData(url, targetClass, {
            // Number of retries and request timeout in milliseconds
            retry: 3,
            timeout: 10000,
            // Custom HTTP headers sent with the request
            headers: {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
            },
            // Event handlers for each stage of the scraping process
            beforeRequest: (url) => {
                console.log(`Sending request to ${url}`);
            },
            afterRequest: (response) => {
                console.log(`Received response with status code ${response.status}`);
            },
            onError: (error) => {
                console.error('An error occurred during scraping:', error);
            },
            beforeParse: ($) => {
                console.log('Parsing HTML content...');
            },
            afterParse: (data) => {
                console.log('Scraping completed successfully:', data);
            },
        });
        console.log('Scraped data:', result.data);
    } catch (error) {
        console.error('Failed to scrape data:', error);
    }
}

// Pass the URL of the page to scrape and the selector of the HTML class
// whose data you want to retrieve.
getDataFromUrl('https://example.com', '.content');
```
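
When several pages need to be scraped, the same call can be wrapped so that one failing URL does not abort the rest. The sketch below is an illustration only: `scrapeMany` is a hypothetical helper, not part of the library, and it assumes the `scrape` argument behaves like `scrapeData` in the usage example above, that is, it accepts `(url, targetClass, options)` and returns a promise.

```javascript
// Hypothetical helper, NOT part of crt-scrapper. `scrape` is assumed to have
// the same call shape as scrapeData: (url, targetClass, options) => Promise.
async function scrapeMany(urls, targetClass, scrape, options = {}) {
    // allSettled waits for every request and never rejects itself,
    // so a single failure cannot hide the successful results.
    const settled = await Promise.allSettled(
        urls.map((url) => scrape(url, targetClass, options))
    );
    // Pair each outcome with the URL it came from.
    return settled.map((outcome, index) => ({
        url: urls[index],
        ok: outcome.status === 'fulfilled',
        result: outcome.status === 'fulfilled' ? outcome.value : undefined,
        error: outcome.status === 'rejected' ? outcome.reason : undefined,
    }));
}
```

Under these assumptions it would be called as `scrapeMany(['https://example.com'], '.content', scrapeData, { retry: 3 })`. Taking the scrape function as a parameter keeps the helper easy to test with a stub.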

## Documentation

For detailed documentation and API reference, please refer to the API Documentation file.

## License

CRT Scrapper Library is licensed under the MIT License. See the LICENSE file for details.

## Contribution

Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.
