https://github.com/xop/news-scraper-directives
Directives for the NewScraper
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/xop/news-scraper-directives
- Owner: XOP
- License: mit
- Created: 2016-10-22T14:59:54.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-10-30T19:34:52.000Z (over 9 years ago)
- Last Synced: 2025-06-25T08:02:25.836Z (12 months ago)
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news-rus.yml
- License: LICENSE
README
# NewScraper Directives
> The directives repo for the NewScraper
> [https://github.com/XOP/news-scraper](https://github.com/XOP/news-scraper)
## Format
Directives are consumed by the [NewScraper application](https://github.com/XOP/news-scraper) and passed to the [NewScraper Core](https://github.com/XOP/news-scraper-core).
Every directive uses the same structure, so the content stays uniform across resources.
Directives can be provided in either JSON or YML format.
The latter is used here because of its robust and descriptive nature.
Each directive follows the [template](_template.yml):
```
'NAME':
  url: ''
  elem: ''
  link: ''
  author: ''
  time: ''
  image: ''
  limit: N
```
`NAME`
name of the resource, **required**
`url`
url of the resource, **required**
`elem`
CSS selector of the news item container element, **required**
`link`
CSS selector of the link (...) _inside_ of the `elem`
If the `elem` itself _is_ a link, this is not required
`author`
CSS selector of the author element _inside_ of the `elem`
`time`
CSS selector of the time element _inside_ of the `elem`
`image`
CSS selector of the image element _inside_ of the `elem`
This can be an `img` tag or any other element: NewScraper will look for the `data-src` attribute and the `background-image` CSS property to find the proper image data
`limit`
maximum number of `elem` items that will be scraped from the `url`
To pass a collection of resources, simply add an empty line between them.
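As an illustration of how the fields fit together, here is a filled-in directive. It is only a sketch: the resource name, URL, and every selector are hypothetical and do not come from this repo, so adjust them to the markup of the site you actually want to scrape.

```yaml
# Hypothetical resource: the name, URL, and selectors are made up for illustration.
'Example Blog':
  url: 'https://blog.example.com/'   # page to scrape (required)
  elem: 'article.post'               # container of a single news item (required)
  link: 'h2 a'                       # link inside elem; omit if elem itself is a link
  author: '.post-author'             # author element inside elem
  time: 'time'                       # time element inside elem
  image: '.post-cover'               # img tag or any element carrying the image data
  limit: 5                           # scrape at most 5 items from the url
```

Only `url` and `elem` are required, so a minimal directive can leave out every other key.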
See [examples](blogs.yml).
```
'NAME 1':
  url: '...'
  elem: '...'

'NAME 2':
  url: '...'
  elem: '...'

...

'NAME N':
  url: '...'
  elem: '...'
```
## [MIT License](LICENSE)