Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/redapple/parslepy

Python implementation of the Parsley language for extracting structured data from web pages
https://github.com/redapple/parslepy

Last synced: 3 months ago
JSON representation

Python implementation of the Parsley language for extracting structured data from web pages

Awesome Lists containing this project

README

        

parslepy
========

[![Build Status](https://travis-ci.org/redapple/parslepy.png?branch=master)](https://travis-ci.org/redapple/parslepy)

*parslepy* (pronounced *"parsley-pie"*, */ˈpɑːslipaɪ/*) is a Python implementation
(built on top of [lxml](http://lxml.de) and [cssselect](https://github.com/SimonSapin/cssselect)) of the
[Parsley DSL](https://github.com/fizx/parsley)
for extracting structured data from web pages, as defined by Kyle Maxwell and Andrew Cantino
(see [Parsley's wiki](https://github.com/fizx/parsley/wiki) for more details and original C implementation).

Kudos to Kyle Maxwell (@fizx) for coming up with this smart and easy syntax to define extracting rules.

> Please note that this *Parsley DSL* is **NOT** the same as the Parsley parsing library at https://pypi.python.org/pypi/Parsley

Check out the [official docs](http://pythonhosted.org/parslepy) for more information on how to install
and use *parslepy*. There is also some useful information at the [parslepy Wiki](https://github.com/redapple/parslepy/wiki)

Here is an example of a parselet script that extracts the questions from StackOverflow first page:

{
"first_page_questions(//div[contains(@class,'question-summary')])": [{
"title": ".//h3/a",
"tags": "div.tags",
"votes": "div.votes div.mini-counts",
"views": "div.views div.mini-counts",
"answers": "div.status div.mini-counts"
}]
}

### Install

Install via pip with:

sudo pip install parslepy

Alternatively, you can install from the latest source code:

git clone https://github.com/redapple/parslepy.git
sudo python setup.py install

### Online Resources ###

* [Official Documentation](http://pythonhosted.org/parslepy)
* [Wiki with examples and tutorials](https://github.com/redapple/parslepy/wiki)
* [Parsley DSL](https://github.com/fizx/parsley)
* [JSON Structure details -- Parsley wiki](https://github.com/fizx/parsley/wiki/JSON-Structure)
* [Example Scrapy Spider using Parsley](http://snipplr.com/view/67016/parsley-spider/)
* [Parsley DSL on Hacker News](https://news.ycombinator.com/item?id=1585301)