https://github.com/uyar/piculet
Extract data from XML or HTML documents using XPath.
- Host: GitHub
- URL: https://github.com/uyar/piculet
- Owner: uyar
- License: lgpl-3.0
- Created: 2018-05-29T18:58:29.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-06-25T19:32:40.000Z (over 1 year ago)
- Last Synced: 2024-10-07T05:47:38.232Z (4 months ago)
- Topics: html, scraping, xml, xpath
- Language: Python
- Homepage:
- Size: 790 KB
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE.txt
Piculet
=======

Piculet is a module for extracting data from XML or HTML documents
using XPath queries.
It consists of a `single source file`_ with no dependencies
other than the standard library.
If available, it will make use of the lxml package
for improved performance and better XPath support.

Piculet is used for the parsers of the Cinemagoer project.

.. _single source file: https://github.com/uyar/piculet/blob/master/piculet.py

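
The optional use of lxml follows a common import-fallback pattern.
The sketch below illustrates that pattern; it is not Piculet's actual code.
It prefers ``lxml.etree`` when it is installed
and otherwise falls back to ``xml.etree.ElementTree``.
The inline document and the variable names are made up for the example,
and the queries stay within the limited XPath subset
that both libraries accept through ``findall`` and ``findtext``::

   try:
       # lxml is optional; it provides full XPath 1.0 support.
       from lxml import etree as ET
   except ImportError:
       # The standard library understands only a limited XPath subset.
       import xml.etree.ElementTree as ET

   # Hypothetical sample document, used only for this illustration.
   DOCUMENT = """
   <movie>
     <title>The Shining</title>
     <year>1980</year>
     <cast>
       <actor>Jack Nicholson</actor>
       <actor>Shelley Duvall</actor>
     </cast>
   </movie>
   """

   root = ET.fromstring(DOCUMENT)

   # Text of the first matching element.
   title = root.findtext("./title")
   year = int(root.findtext("./year"))

   # Text of every matching element, in document order.
   actors = [node.text for node in root.findall("./cast/actor")]

   print(title, year, actors)

With lxml installed, the ``root`` element additionally offers
an ``xpath()`` method that accepts full XPath 1.0 expressions;
the standard library element does not have this method.
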
Getting started
---------------

Piculet works with Python 3.8 and later versions.
You can install it using ``pip``::

   pip install piculet

Installing Piculet creates a script named ``piculet``
which can be used to invoke the command line interface::

   $ piculet -h
   usage: piculet [-h] [--version] [--html] -s SPEC [document]

For example, say you want to extract some data from the file `shining.html`_.
An example specification is given in `movie.json`_.
Download both of these files and run the command::

   $ piculet -s movie.json shining.html

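
To give a rough idea of what specification-driven extraction means,
the sketch below applies a small JSON mapping of keys to paths
against an inline document using only the standard library.
The mapping layout, the ``extract`` function and the sample document
are all hypothetical: this is not Piculet's specification format or API
(see the ``movie.json`` example and the documentation for those)::

   import json
   import xml.etree.ElementTree as ET

   # Hypothetical specification: each key maps to a path written in the
   # limited XPath subset of ElementTree. This is NOT Piculet's format.
   SPEC_JSON = """
   {
     "title": "./head/title",
     "director": ".//span[@class='director']"
   }
   """

   # Hypothetical, well-formed sample document.
   DOCUMENT = """
   <html>
     <head><title>The Shining</title></head>
     <body>
       <p>Directed by <span class="director">Stanley Kubrick</span></p>
     </body>
   </html>
   """


   def extract(document, spec):
       """Apply every path in the spec and collect the matched text."""
       root = ET.fromstring(document)
       data = {}
       for key, path in spec.items():
           text = root.findtext(path)
           if text is not None:  # skip keys whose path matches nothing
               data[key] = text.strip()
       return data


   result = extract(DOCUMENT, json.loads(SPEC_JSON))
   print(json.dumps(result, indent=2))

Keeping the paths in a separate JSON file means the extraction rules
can change without touching the code that applies them.
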
.. _shining.html: https://github.com/uyar/piculet/blob/master/examples/shining.html
.. _movie.json: https://github.com/uyar/piculet/blob/master/examples/movie.json

Getting help
------------

The documentation is available on: https://piculet.readthedocs.io/

The source code can be obtained from: https://github.com/uyar/piculet

License
-------

Copyright (C) 2014-2023 H. Turgut Uyar

Piculet is released under the LGPL license, version 3 or later.
Read the included `LICENSE.txt`_ file for details.

.. _LICENSE.txt: https://github.com/uyar/piculet/blob/master/LICENSE.txt