Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scrapehero/selectorlib
A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them
python scraping selectors web-scraping xpath
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/scrapehero/selectorlib
- Owner: scrapehero
- License: mit
- Created: 2019-05-21T05:42:10.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-30T04:38:57.000Z (almost 2 years ago)
- Last Synced: 2024-03-25T20:54:05.128Z (9 months ago)
- Topics: python, scraping, selectors, web-scraping, xpath
- Language: HTML
- Homepage:
- Size: 341 KB
- Stars: 62
- Watchers: 4
- Forks: 11
- Open Issues: 6
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
README
===========
selectorlib
===========

.. image:: https://img.shields.io/pypi/v/selectorlib.svg
   :target: https://pypi.python.org/pypi/selectorlib

.. image:: https://img.shields.io/travis/scrapehero/selectorlib.svg
   :target: https://travis-ci.org/scrapehero/selectorlib

.. image:: https://readthedocs.org/projects/selectorlib/badge/?version=latest
   :target: https://selectorlib.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://pyup.io/repos/github/scrapehero/selectorlib/shield.svg
   :target: https://pyup.io/repos/github/scrapehero/selectorlib/
   :alt: Updates

A library that reads XPath or CSS selectors from a YAML file and uses them to extract data from HTML pages.

* Free software: MIT license
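* Documentation: https://selectorlib.readthedocs.io.

Example
-------

Define the fields to extract in YAML, build an ``Extractor`` from that string, and call ``extract()`` on an HTML document:

.. code-block:: python

    >>> from selectorlib import Extractor
    >>> yaml_string = """
    ... title:
    ...     css: "h1"
    ...     type: Text
    ... link:
    ...     css: "h2 a"
    ...     type: Link
    ... """
    >>> extractor = Extractor.from_yaml_string(yaml_string)
    >>> html = """
    ... <h1>Title</h1>
    ... <h2>Usage <a href="http://test">¶</a></h2>
    ... """
    >>> extractor.extract(html)
    {'title': 'Title', 'link': 'http://test'}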
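To show the idea behind the example (each field name maps to a selector plus a type that decides which value is returned) without depending on selectorlib itself, here is a minimal, hypothetical sketch using only the Python standard library. It is not selectorlib's implementation: the ``CONFIG`` dict stands in for the YAML file, ``extract()`` is an illustrative helper, and it uses the limited XPath subset of ``xml.etree.ElementTree``, so the markup must be well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical config mirroring the YAML layout: field -> selector + type.
# ElementTree supports only a subset of XPath, evaluated relative to the root.
CONFIG = {
    "title": {"xpath": ".//h1", "type": "Text"},
    "link": {"xpath": ".//h2/a", "type": "Link"},
}


def extract(config, markup):
    """Return {field: value} for every field in config (None if no match)."""
    # Wrap the fragment in one root element so several top-level tags parse.
    root = ET.fromstring(f"<root>{markup}</root>")
    result = {}
    for name, rule in config.items():
        element = root.find(rule["xpath"])  # first matching element or None
        if element is None:
            result[name] = None
        elif rule["type"] == "Link":
            # "Link": return the href attribute of the matched element.
            result[name] = element.get("href")
        else:
            # "Text": join all text nested inside the element and trim it.
            result[name] = "".join(element.itertext()).strip()
    return result


html = """
<h1>Title</h1>
<h2>Usage <a href="http://test">¶</a></h2>
"""
print(extract(CONFIG, html))  # {'title': 'Title', 'link': 'http://test'}
```

Real-world HTML is often not well-formed XML, so in practice a dedicated HTML parser is used instead of ``ElementTree``; the sketch only illustrates the field, selector and type mapping.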