https://github.com/siristechnology/news-crawler
Config based news crawler using Google Puppeteer
- Host: GitHub
- URL: https://github.com/siristechnology/news-crawler
- Owner: siristechnology
- Created: 2020-08-19T02:23:39.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-05-15T19:49:38.000Z (over 3 years ago)
- Last Synced: 2024-11-12T08:53:52.099Z (about 1 month ago)
- Topics: chromium, javascript, news-crawler, puppeteer
- Language: JavaScript
- Homepage:
- Size: 215 KB
- Stars: 2
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# news-crawler
A config-based news crawler built on Google Puppeteer.
- Uses `puppeteer-extra-plugin-adblocker` to block ads
- Uses `puppeteer-extra-plugin-stealth` to prevent detection
- Uses `html-to-text` to convert HTML to text

## Install
yarn add news-crawler
## Sample code
const articles = await NewsCrawler(sourceConfig, { maxArticlesPerPage: 1, headless: false })
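
The sample call uses `await`, so it has to run inside an async function (or an ES module with top-level await). The following is a minimal sketch of such a wrapper, not the package's own API: `runCrawl` and `fakeCrawler` are hypothetical names, the crawler function is passed in as an argument so the example runs without a browser, and it assumes the crawler resolves to an array of article objects like the one in the sample output.

```javascript
// Hypothetical wrapper: `crawl` is the crawler function (for example the
// package's NewsCrawler export), injected so this sketch runs standalone.
async function runCrawl(crawl, sourceConfig, options = {}) {
  // Option names come from the sample call; the default values are assumptions.
  const settings = { maxArticlesPerPage: 1, headless: true, ...options };
  try {
    const articles = await crawl(sourceConfig, settings);
    // Assumption: the crawler resolves to an array of article objects.
    return Array.isArray(articles) ? articles : [];
  } catch (err) {
    // A failed crawl is logged and reported as "no articles".
    console.error(`Crawl failed: ${err.message}`);
    return [];
  }
}

// Stand-in for the real crawler: returns one stub article per source.
const fakeCrawler = async (sources) =>
  sources.map((source) => ({ source: source.name }));

runCrawl(fakeCrawler, [{ name: 'ekantipur' }], { headless: false })
  .then((articles) => console.log(articles));
```

With the real package installed, the imported `NewsCrawler` function would take the place of `fakeCrawler`. The README does not show how the function is exported, so check the package before relying on a particular `require` or `import` form.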
## Sample News Source Config
```
[
  {
    "name": "ekantipur",
    "pages": [
      {
        "url": "https://ekantipur.com",
        "category": "headlines",
        "linkSelector": "article.normal > h1 > a"
      }
    ],
    "article-detail-selectors": {
      "title": "main > article > header > h1",
      "excerpt": "article .text-wrap > h2",
      "leadImage": "#wrapper main article header figure img",
      "content": [
        "main article div.text-wrap p.description"
      ],
      "tags": "",
      "likes-count": "main > article > header div.total.shareTotal"
    }
  }
]
```

## Sample News Output JSON
```
[
  {
    source: 'ekantipur',
    category: 'sports',
    url: 'https://ekantipur.com/sports/2020/06/11/159183662731487753.html',
    title: 'बायर्न जर्मनकप फाइनलमा',
    leadImage: 'https://assets-cdn-usae.kantipurdaily.com/uploads/source/news/kantipur/2020/third-party/bayern-1162020024916-1000x0.jpg',
    content: 'म्युनिख — बायर्न म्युनिखले कप डबलको उपलब्धि जीवन्त राख्न बुधबार राति आइनट्राख्ट फ्रान्कफर्टलाई २–१ ले हरायो र जर्मनकपको फाइनल'
  }
]
```
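
Because the crawler is driven entirely by the source config, a typo in a key such as `linkSelector` or `article-detail-selectors` can be hard to spot. The sketch below checks a config against the shape of the sample above before crawling. `validateSourceConfig` is a hypothetical helper, not part of the package, and which fields it treats as mandatory is an assumption drawn from the sample rather than from the crawler's source.

```javascript
// Returns a list of human-readable problems; an empty list means the config
// matches the shape of the sample. Which fields are mandatory is an assumption.
function validateSourceConfig(sources) {
  if (!Array.isArray(sources)) return ['config must be an array of sources'];
  const problems = [];
  sources.forEach((source, i) => {
    if (!source || typeof source !== 'object') {
      problems.push(`source #${i}: must be an object`);
      return;
    }
    const label = source.name || `source #${i}`;
    if (!source.name) problems.push(`${label}: missing "name"`);

    const pages = Array.isArray(source.pages) ? source.pages : [];
    if (pages.length === 0) problems.push(`${label}: "pages" must be a non-empty array`);
    pages.forEach((page, j) => {
      for (const key of ['url', 'category', 'linkSelector']) {
        if (!page || !page[key]) problems.push(`${label}: pages[${j}] missing "${key}"`);
      }
    });

    const selectors = source['article-detail-selectors'] || {};
    if (!selectors.title) problems.push(`${label}: missing "title" selector`);
    if (!Array.isArray(selectors.content) || selectors.content.length === 0) {
      problems.push(`${label}: "content" must be a non-empty array of selectors`);
    }
  });
  return problems;
}

// Trimmed copy of the sample config from this README.
const sampleConfig = [{
  name: 'ekantipur',
  pages: [{ url: 'https://ekantipur.com', category: 'headlines', linkSelector: 'article.normal > h1 > a' }],
  'article-detail-selectors': {
    title: 'main > article > header > h1',
    content: ['main article div.text-wrap p.description'],
  },
}];

console.log(validateSourceConfig(sampleConfig));
```

Optional keys such as `excerpt`, `leadImage`, `tags`, and `likes-count` are deliberately not checked here, since the sample leaves `tags` empty.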