Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/squallstar/xpath-articles-rules
A collection of xpath rules to extract relevant content from popular websites.
- Host: GitHub
- URL: https://github.com/squallstar/xpath-articles-rules
- Owner: squallstar
- Created: 2014-08-15T15:16:54.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2014-09-11T10:39:35.000Z (about 10 years ago)
- Last Synced: 2024-05-02T00:19:04.090Z (7 months ago)
- Homepage:
- Size: 156 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# XPath Articles rules
A collection of XPath rules to extract relevant content from popular websites.
Here's an example of how to use it in PHP:
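// Assumes $rules holds this repository's rules decoded into a PHP array,
// $html is the page markup, and $url is the page address.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// The "common" rules apply to every site, e.g. the lead image.
$image = $xpath->query($rules['common']['image'])->item(0)->textContent;
// Host-specific rules are keyed by the URL's host name.
$host = parse_url($url)['host'];
$article_nodes = $xpath->query($rules['hosts'][$host]['content']);
if ($article_nodes->length > 0)
{
    // Serialize the first matching node back to HTML.
    $article_node = $article_nodes->item(0);
    $article_text = $article_node->ownerDocument->saveHTML($article_node);
}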
Et voilà!
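
The snippet above assumes every lookup succeeds. As a rough sketch only, the same steps could be wrapped in a function that also copes with pages where the image is missing or the host has no rule. The `extract_article` helper, the sample `$rules` array, its XPath expressions, and the `example.com` host are all hypothetical and made up for illustration; only the `common`/`image` and `hosts`/`content` keys come from the example above.

```php
<?php
// Hypothetical rules for illustration; the real rules live in this repository.
$rules = [
    'common' => ['image' => '//meta[@property="og:image"]/@content'],
    'hosts'  => ['example.com' => ['content' => '//div[@id="article"]']],
];

// Hypothetical helper: returns the image URL and article HTML, or null for
// whichever of the two cannot be found.
function extract_article(array $rules, string $html, string $url): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = ['image' => null, 'content' => null];

    // Common rule: applies to every site.
    $image_nodes = $xpath->query($rules['common']['image']);
    if ($image_nodes !== false && $image_nodes->length > 0) {
        $result['image'] = $image_nodes->item(0)->textContent;
    }

    // Host rule: only applied when the host has an entry in the rules.
    $host = parse_url($url, PHP_URL_HOST);
    if (is_string($host) && isset($rules['hosts'][$host]['content'])) {
        $nodes = $xpath->query($rules['hosts'][$host]['content']);
        if ($nodes !== false && $nodes->length > 0) {
            $result['content'] = $doc->saveHTML($nodes->item(0));
        }
    }

    return $result;
}

$html = '<html><head><meta property="og:image" content="https://example.com/a.png"></head>'
      . '<body><div id="article"><p>Hello</p></div></body></html>';
$result = extract_article($rules, $html, 'https://example.com/post/1');
```

Returning `null` rather than raising an error lets the caller decide how to handle a site without a rule, for example by falling back to a generic extractor.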