https://github.com/squallstar/xpath-articles-rules

# XPath Articles rules

A collection of XPath rules to extract relevant content from popular websites.

Here's an example of how to use the rules in PHP:

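    <?php
    // Assumes $rules already holds the decoded rules,
    // $html the page markup and $url the page address.
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $xpath = new DOMXPath($doc);

    // Rule shared by every site: the article's lead image.
    $image = $xpath->query($rules['common']['image'])->item(0)->textContent;

    // Host-specific rule: the node wrapping the article body.
    $host = parse_url($url)['host'];
    $article_nodes = $xpath->query($rules['hosts'][$host]['content']);

    if ($article_nodes->length > 0)
    {
        $article_node = $article_nodes->item(0);
        // Serialize the matched node (and its children) back to HTML.
        $article_text = $article_node->ownerDocument->saveHTML($article_node);
    }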

Et voilà!
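To try the lookup without fetching a real page, the snippet below runs the same steps end to end on inline data. It is only a sketch under assumptions: the `$rules` array mirrors the `common` / `hosts` shape used above, but the XPath expressions, the `example.com` host and the sample markup are made up for illustration and are not taken from this repository's rule set. It also adds guards for missing nodes and unknown hosts, which the short example skips.

```php
<?php
// Hypothetical rules, shaped like the lookups in the example above.
// The XPath expressions and the host name are illustrative assumptions,
// not entries copied from this repository.
$rules = [
    'common' => [
        'image' => '//meta[@property="og:image"]/@content',
    ],
    'hosts' => [
        'example.com' => [
            'content' => '//div[@class="article-body"]',
        ],
    ],
];

// Stand-in for a downloaded page.
$url  = 'https://example.com/news/some-story';
$html = '<html><head><meta property="og:image" content="https://example.com/cover.jpg"></head>'
      . '<body><div class="article-body"><p>Hello world</p></div></body></html>';

// Collect parser warnings internally instead of printing them;
// real-world markup is rarely valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Guard against a missing node before reading textContent.
$image_nodes = $xpath->query($rules['common']['image']);
$image = ($image_nodes !== false && $image_nodes->length > 0)
    ? $image_nodes->item(0)->textContent
    : null;

// Only run the host-specific rule when one exists for this host.
$host = parse_url($url, PHP_URL_HOST);
$article_text = null;
if (isset($rules['hosts'][$host]['content'])) {
    $article_nodes = $xpath->query($rules['hosts'][$host]['content']);
    if ($article_nodes !== false && $article_nodes->length > 0) {
        $article_text = $doc->saveHTML($article_nodes->item(0));
    }
}

echo $image, PHP_EOL;
echo $article_text, PHP_EOL;
```

When no rule matches, `$image` and `$article_text` stay `null`, so a caller can fall back to another extraction strategy instead of hitting an error on a missing node.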