https://github.com/joseortuno/sentinel-scraper
Scraper is a tool for scraping websites
javascript scraping tool
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/joseortuno/sentinel-scraper
- Owner: joseortuno
- Created: 2020-01-31T19:40:15.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-03-04T05:52:29.000Z (over 3 years ago)
- Last Synced: 2023-03-08T11:13:00.677Z (over 3 years ago)
- Topics: javascript, scraping, tool
- Language: TypeScript
- Homepage:
- Size: 4.47 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 10
Metadata Files:
- Readme: README.md
README
# Scraper is a tool for scraping web pages through a URL and selectors
v. 2.0.0
## Usage
Import the scraping tool:
```javascript
const Scraper = require('sentinel-scraper');
```
Create an instance for the URL you want to scrape:
```javascript
const scraping = new Scraper('The URL to scrape');
```
### Methods
#### 1. SELECTOR
Scrapes sections of a page through its selectors.
```javascript
scraping.select(selector, expression); // Both parameters are required.
```
Parameters:
1. selector: behaves like `.querySelectorAll()`.
2. expression (callback): currentValue, index (optional).
Run the method to scrape a page:
```javascript
const data = scraping.select('#selector', item => {
  return item.children.item(0).href;
});
/* Output:
data = [
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
]
*/
// Return an array in the format you need. For example:
const data = scraping.select('#selector', item => {
  return {
    title: item.children.item(0).textContent,
    image: item.children.item(0).src,
    url: item.children.item(0).href,
  };
});
/* Output:
data = [
  {
    title: 'lorem ipsum',
    image: 'http://www.example.com/image/image.png',
    url: 'http://www.example.com',
  },
  {
    title: 'lorem ipsum',
    image: 'http://www.example.com/image/image.png',
    url: 'http://www.example.com',
  },
  {
    title: 'lorem ipsum',
    image: 'http://www.example.com/image/image.png',
    url: 'http://www.example.com',
  }
]
*/
// Or build the data in the format you need without returning anything. For example:
const data = {};
scraping.select('#selector', (item, index) => {
  data[index] = [
    item.children.item(0).textContent,
    item.children.item(0).src,
    item.children.item(0).href,
  ];
});
/* Output:
data = {
  1: [
    'lorem ipsum',
    'http://www.example.com/image/image.png',
    'http://www.example.com',
  ],
  2: [
    'lorem ipsum',
    'http://www.example.com/image/image.png',
    'http://www.example.com',
  ],
  3: [
    'lorem ipsum',
    'http://www.example.com/image/image.png',
    'http://www.example.com',
  ]
}
*/
```
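The callback contract above (one call per matched element, with the element and an optional index, and the return values collected into an array) resembles `Array.prototype.map`. The following is a minimal sketch of that contract only, not the library's implementation: `fakeSelect` and the plain objects standing in for matched elements are hypothetical, and the stand-in counts indexes from 0, which is an assumption about the index base.

```javascript
// Hypothetical stand-in for scraping.select(); NOT the library's code.
// It imitates the documented contract: call `expression` once per matched
// element with (currentValue, index) and collect the return values.
function fakeSelect(elements, expression) {
  return Array.from(elements).map((element, index) => expression(element, index));
}

// Plain objects standing in for the elements a selector would match.
const matched = [
  { textContent: 'first', href: 'http://www.example.com/1' },
  { textContent: 'second', href: 'http://www.example.com/2' },
];

// Style 1: return a value from the callback to build the result array.
const links = fakeSelect(matched, item => item.href);

// Style 2: ignore the return value and fill your own structure instead.
const byIndex = {};
fakeSelect(matched, (item, index) => {
  byIndex[index] = item.textContent;
});

console.log(links);
console.log(byIndex);
```

Both styles walk the same matched elements; the first suits flat lists, while the second is handy when the result needs a custom shape.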
#### 2. FOR
A static method for scraping an array of urls. It acts as a factory that creates a `new Scraper()` for each url.
```javascript
Scraper.for(urls, expression); // Both parameters are required.
```
Parameters:
1. urls: array of urls.
2. expression (callback): currentValue (Scraper instance for the url), index, url.
```javascript
const urls = [
  'http://www.example.com/product/1',
  'http://www.example.com/product/2',
  'http://www.example.com/product/3',
  'http://www.example.com/product/4',
  'http://www.example.com/product/5'
];
Scraper.for(urls, (scrape, index, url) => {
  // The url parameter holds the url being scraped.
  // For example: scrape.select();
});
```
Use the static method `for` when you want to scrape at different depths.
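One reading of "different depths" is a two-level scrape: collect links from a listing page, then visit each link. The sketch below illustrates that flow under stated assumptions. `FakeScraper`, the in-memory `pages` map, and the selectors `'a.product'` and `'h1'` are all hypothetical stand-ins so the example runs without network access, and the stand-in is synchronous, which may not match the real library.

```javascript
// Hypothetical in-memory "site": url -> elements a selector would match.
const pages = {
  'http://www.example.com/products': [
    { href: 'http://www.example.com/product/1' },
    { href: 'http://www.example.com/product/2' },
  ],
  'http://www.example.com/product/1': [{ textContent: 'Product one' }],
  'http://www.example.com/product/2': [{ textContent: 'Product two' }],
};

// Stand-in imitating the documented shape of Scraper; NOT the real library.
class FakeScraper {
  constructor(url) {
    this.url = url;
  }

  // Ignores the selector and maps the callback over the fake page's elements.
  select(selector, expression) {
    return (pages[this.url] || []).map((element, index) => expression(element, index));
  }

  // Creates one instance per url and hands it to the callback.
  static for(urls, expression) {
    urls.forEach((url, index) => expression(new FakeScraper(url), index, url));
  }
}

// Depth 1: collect the product links from the listing page.
const listing = new FakeScraper('http://www.example.com/products');
const productUrls = listing.select('a.product', item => item.href);

// Depth 2: visit every collected url and read its title.
const titles = {};
FakeScraper.for(productUrls, (scrape, index, url) => {
  titles[url] = scrape.select('h1', item => item.textContent)[0];
});

console.log(titles);
```

With the real package, the same pattern would presumably use `new Scraper(url)` for the first level and `Scraper.for(urls, callback)` for the second.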