https://github.com/xop/news-scraper-directives
Directives for the NewScraper
- Host: GitHub
- URL: https://github.com/xop/news-scraper-directives
- Owner: XOP
- License: mit
- Created: 2016-10-22T14:59:54.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-10-30T19:34:52.000Z (over 8 years ago)
- Last Synced: 2025-01-07T01:50:28.387Z (4 months ago)
- Size: 4.88 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news-rus.yml
- License: LICENSE
README
# NewScraper Directives
> The directives repo for the NewScraper
> [https://github.com/XOP/news-scraper](https://github.com/XOP/news-scraper)

## Format

Directives are digested by the [NewScraper application](https://github.com/XOP/news-scraper) and passed to the [NewScraper Core](https://github.com/XOP/news-scraper-core).
If you follow those links, the uniformity of the content is pretty self-evident.

Directives can be provided in both JSON and YML format.
The latter is used here due to its robust and descriptive nature.

Given the [template](_template.yml) here:
```
'NAME':
  url: ''
  elem: ''
  link: ''
  author: ''
  time: ''
  image: ''
  limit: N
```

`NAME`
name of the resource, **required**

`url`
URL of the resource, **required**

`elem`
CSS selector of the news item container element, **required**

`link`
CSS selector of the link (...) _inside_ of the `elem`.
If the `elem` itself _is_ a link, this is not required.

`author`
CSS selector of the author element _inside_ of the `elem`

`time`
CSS selector of the time element _inside_ of the `elem`

`image`
CSS selector of the image element _inside_ of the `elem`.
This can be an `img` tag or any other element: NewScraper looks for the `data-src` attribute and the `background-image` CSS property to find the image data.

`limit`
maximum number of `elem` items to scrape from the `url`

To pass a collection of resources, simply add an empty line between them.
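
As a rough sketch of how the fields fit together, a filled-in collection of two resources might look like the following. The resource names, URLs and selectors are hypothetical, made up for illustration only; they are not taken from this repo or from any real site.

```yaml
# Hypothetical resources: names, URLs and selectors are illustrative only.
'Example Blog':
  url: 'https://blog.example.com/'
  elem: 'article.post'      # container of a single news item
  link: 'h2 a'              # looked up inside of `elem`
  author: '.post-author'
  time: 'time'
  image: '.post-cover'      # does not have to be an `img` tag
  limit: 5

# Here the container element is itself a link, so `link` is omitted.
'Example News':
  url: 'https://news.example.com/latest'
  elem: 'a.headline'
  limit: 10
```

Only the name, `url` and `elem` are required, so the second resource leaves the optional selectors out.
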
See [examples](blogs.yml).

```
'NAME 1':
  url: '...'
  elem: '...'

'NAME 2':
  url: '...'
  elem: '...'

...

'NAME N':
  url: '...'
  elem: '...'
```

## [MIT License](LICENSE)