https://github.com/uyar/piculet

Extract data from XML or HTML documents using XPath.
html scraping xml xpath

Host: GitHub
URL: https://github.com/uyar/piculet
Owner: uyar
License: lgpl-3.0
Created: 2018-05-29T18:58:29.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-06-25T19:32:40.000Z (over 1 year ago)
Last Synced: 2024-10-07T05:47:38.232Z (4 months ago)
Topics: html, scraping, xml, xpath
Language: Python
Homepage:
Size: 790 KB
Stars: 5
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE.txt

Piculet
=======

Piculet is a module for extracting data from XML or HTML documents
using XPath queries.
It consists of a `single source file`_ with no dependencies
other than the standard library.
If available, it will make use of the lxml package
for improved performance and better XPath support.
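
As a rough illustration of the two ideas above (XPath-based extraction
and an optional lxml dependency), the sketch below prefers lxml when it is
installed and otherwise falls back to the standard library.
It does not use Piculet's own API and is not taken from its source code;
the sample document and the ``extract`` helper are made up for this example.

```python
# Optional-dependency pattern: prefer lxml when it is installed,
# otherwise fall back to the standard library parser.
# Illustration only; this is NOT Piculet's API or implementation.
try:
    from lxml import etree as ET  # fuller XPath support, if installed
except ImportError:
    import xml.etree.ElementTree as ET  # supports a limited XPath subset

# Hypothetical sample document, made up for this example.
DOCUMENT = """
<movie>
  <title>The Shining</title>
  <year>1980</year>
  <director>Stanley Kubrick</director>
</movie>
"""


def extract(document):
    """Collect a few fields from the document using simple path queries."""
    root = ET.fromstring(document.strip())
    # These path expressions are in the subset that both parsers accept.
    return {
        "title": root.findtext("./title"),
        "year": int(root.findtext("./year")),
        "director": root.findtext("./director"),
    }


movie = extract(DOCUMENT)
print(movie)
```

The queries are deliberately kept to simple child paths so that the same code
runs whether or not lxml is present.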

Piculet is used for the parsers
of the Cinemagoer project.

.. _single source file: https://github.com/uyar/piculet/blob/master/piculet.py

Getting started
---------------

Piculet works with Python 3.8 and later versions.
You can install it using ``pip``::

    pip install piculet

Installing Piculet creates a script named ``piculet``
which can be used to invoke the command line interface::

    $ piculet -h
    usage: piculet [-h] [--version] [--html] -s SPEC [document]

For example, say you want to extract some data from the file `shining.html`_.
An example specification is given in `movie.json`_.
Download both of these files and run the command::

    $ piculet -s movie.json shining.html

.. _shining.html: https://github.com/uyar/piculet/blob/master/examples/shining.html
.. _movie.json: https://github.com/uyar/piculet/blob/master/examples/movie.json
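
If you want to run the same command from a Python program,
one possible approach is to call the script through ``subprocess``.
The sketch below only assumes the ``-s SPEC document`` form shown above;
the helper names ``build_command`` and ``run_piculet`` are hypothetical,
and no assumption is made about the format of the command's output.

```python
import shutil
import subprocess


def build_command(spec, document):
    """Build the argument list for: piculet -s SPEC document"""
    return ["piculet", "-s", spec, document]


def run_piculet(spec, document):
    """Run the script and return its standard output as text.

    Returns None when the ``piculet`` script is not found on PATH.
    """
    if shutil.which("piculet") is None:
        return None
    result = subprocess.run(
        build_command(spec, document),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


command = build_command("movie.json", "shining.html")
print(" ".join(command))
```

Calling ``run_piculet("movie.json", "shining.html")`` in the directory
that holds the two downloaded files would then run the command shown above.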

Getting help
------------

The documentation is available at: https://piculet.readthedocs.io/

The source code can be obtained from: https://github.com/uyar/piculet

License
-------

Piculet is released under the LGPL license, version 3 or later.
Read the included `LICENSE.txt`_ file for details.

.. _LICENSE.txt: https://github.com/uyar/piculet/blob/master/LICENSE.txt