https://github.com/scrapehero/selectorlib

A library to read a YAML file of XPath or CSS selectors and use them to extract data from HTML pages

python scraping selectors web-scraping xpath

Host: GitHub
URL: https://github.com/scrapehero/selectorlib
Owner: scrapehero
License: mit
Created: 2019-05-21T05:42:10.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2023-01-30T04:38:57.000Z (over 2 years ago)
Last Synced: 2024-03-25T20:54:05.128Z (over 1 year ago)
Topics: python, scraping, selectors, web-scraping, xpath
Language: HTML
Homepage:
Size: 341 KB
Stars: 62
Watchers: 4
Forks: 11
Open Issues: 6
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE

README

===========
selectorlib
===========

.. image:: https://img.shields.io/pypi/v/selectorlib.svg
        :target: https://pypi.python.org/pypi/selectorlib

.. image:: https://img.shields.io/travis/scrapehero/selectorlib.svg
        :target: https://travis-ci.org/scrapehero/selectorlib

.. image:: https://readthedocs.org/projects/selectorlib/badge/?version=latest
        :target: https://selectorlib.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/scrapehero/selectorlib/shield.svg
     :target: https://pyup.io/repos/github/scrapehero/selectorlib/
     :alt: Updates

A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages.

* Free software: MIT license

* Documentation: https://selectorlib.readthedocs.io.

Example
-------

>>> from selectorlib import Extractor
>>> yaml_string = """
... title:
...     css: "h1"
...     type: Text
... link:
...     css: "h2 a"
...     type: Link
... """
>>> extractor = Extractor.from_yaml_string(yaml_string)
>>> html = """
... <h1>Title</h1>
... <h2>Usage <a href="http://test">¶</a></h2>
... """
>>> extractor.extract(html)
{'title': 'Title', 'link': 'http://test'}

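To make the mechanism concrete, here is a toy, standard-library-only sketch of what the example above does: a mapping of field names to selectors is applied to an HTML fragment, and each match is turned into a value according to its type. This is not selectorlib's implementation or API. It assumes a plain ``dict`` in place of the YAML file, ElementTree's limited XPath subset in place of CSS selectors, and well-formed markup; ``SELECTORS`` and ``extract`` are hypothetical names.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Hypothetical selector config (a stand-in for the YAML file): field name ->
    # ElementTree path expression plus an output type, mirroring the example.
    SELECTORS = {
        "title": {"path": ".//h1", "type": "Text"},
        "link": {"path": ".//h2/a", "type": "Link"},
    }


    def extract(html, selectors):
        """Apply each configured path to the fragment and collect the values."""
        # ElementTree is an XML parser, so wrap the fragment in a single root
        # element; this only works for well-formed markup.
        root = ET.fromstring("<root>" + html + "</root>")
        result = {}
        for name, config in selectors.items():
            element = root.find(config["path"])  # first match, or None
            if element is None:
                result[name] = None  # in this sketch, missing fields map to None
            elif config["type"] == "Link":
                result[name] = element.get("href")  # Link: the href attribute
            else:
                # Text: all text inside the element, whitespace-stripped
                result[name] = "".join(element.itertext()).strip()
        return result


    html = """
    <h1>Title</h1>
    <h2>Usage <a href="http://test">¶</a></h2>
    """
    print(extract(html, SELECTORS))  # {'title': 'Title', 'link': 'http://test'}

Real pages are rarely well-formed XML, so a library built on a forgiving HTML parser is the practical choice; the sketch only illustrates how a selector file maps onto extracted values.
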