https://github.com/oguzhan18/crt-scrapper
Scraper Library is a versatile tool for scraping data from web pages. It provides various events to customize and control the scraping process, allowing users to retrieve data efficiently and reliably.
- Host: GitHub
- URL: https://github.com/oguzhan18/crt-scrapper
- Owner: oguzhan18
- Created: 2024-05-10T17:29:23.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2024-05-16T18:07:25.000Z (8 months ago)
- Last Synced: 2024-05-17T20:27:11.796Z (8 months ago)
- Topics: javascript-library, library, nestjs, node-library, scrape, scraper, scraper-library
- Language: JavaScript
- Homepage: https://crt-scrapper-docs.vercel.app/#/
- Size: 8.79 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: ReadMe.md
README
# CRT Scrapper Library (1.0.4)
## Overview
CRT Scrapper Library is a versatile and easy-to-use tool for scraping data from web pages. It provides a robust and flexible framework for performing web scraping tasks on backend technologies such as Node.js and NestJS.
## Features
- Scrapes data from web pages given a URL and a target class
- Supports optional configuration of the scraping process
- Provides events for handling each stage of the scraping process: before request, after request, before parse, after parse, and error handling
- Retry logic for handling temporary network issues
- Timeout configuration to prevent long-running requests
- Customizable HTTP headers for requests
## Installation
To install CRT Scrapper Library, run the following command:
```bash
npm install crt-scrapper
```
## Usage
```javascript
const { scrapeData } = require('crt-scrapper');

async function getDataFromUrl(url, targetClass) {
  try {
    const result = await scrapeData(url, targetClass, {
      retry: 3,
      timeout: 10000,
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
      },
      beforeRequest: (url) => {
        console.log(`Sending request to ${url}`);
      },
      afterRequest: (response) => {
        console.log(`Received response with status code ${response.status}`);
      },
      onError: (error) => {
        console.error('An error occurred during scraping:', error);
      },
      beforeParse: ($) => {
        console.log('Parsing HTML content...');
      },
      afterParse: (data) => {
        console.log('Scraping completed successfully:', data);
      },
    });

    console.log('Scraped data:', result.data);
  } catch (error) {
    console.error('Failed to scrape data:', error);
  }
}

// Pass the URL of the page to scrape and the HTML class whose data you want to extract.
getDataFromUrl('https://example.com', '.content');
```
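The `retry` and `timeout` options in the example above follow a common pattern: bound each attempt with a time limit and try again on failure. The sketch below illustrates that general pattern only; it is not taken from the crt-scrapper source. The helpers `withTimeout` and `withRetry` are hypothetical names, and the sketch assumes that `retry` means the total number of attempts and that `timeout` is in milliseconds, neither of which this README confirms.

```javascript
// Hypothetical helpers for illustration; not part of the crt-scrapper API.

// Rejects if `promise` has not settled within `ms` milliseconds.
// Note: the underlying work is not cancelled, only the wait is abandoned.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs `task` up to `retry` times (assumed here to mean total attempts),
// applying the time limit to each attempt separately.
async function withRetry(task, { retry = 3, timeout = 10000, onError } = {}) {
  const attempts = Math.max(1, retry);
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(task(attempt), timeout);
    } catch (error) {
      lastError = error;
      if (onError) onError(error, attempt);
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}

// Demo with a fake task that fails twice and then succeeds (no network access).
let calls = 0;
const flakyTask = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`temporary failure #${calls}`);
  return 'ok';
};

withRetry(flakyTask, { retry: 3, timeout: 1000 })
  .then((value) => console.log(`Succeeded after ${calls} calls:`, value))
  .catch((error) => console.error('All attempts failed:', error.message));
```

A real scraper would typically also wait between attempts (for example with exponential backoff) and cancel the in-flight request when the time limit is reached; both are left out here to keep the sketch short.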
## Documentation
For detailed documentation and API reference, please refer to the API Documentation file.
## License
CRT Scrapper Library is licensed under the MIT License. See the LICENSE file for details.
## Contribution
Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.