https://github.com/spumer/fastsoup

FastSoup
========

.. image:: https://travis-ci.org/spumer/FastSoup.svg
   :target: https://travis-ci.org/spumer/FastSoup
   :alt: Build Status

.. image:: https://coveralls.io/repos/github/spumer/FastSoup/badge.svg
   :target: https://coveralls.io/github/spumer/FastSoup

=====================================================================================================================================================

BeautifulSoup interface for lxml

Key features
============

* **FAST** search in the tree
* **FAST** serialization to ``str``
* BeautifulSoup4 interface for interacting with objects:

  * Search: ``find``, ``find_all``, ``find_next``, ``find_next_sibling``
  * Text: ``get_text``, ``string``
  * Tag: ``name``, ``get``, ``clear``, ``__getitem__``, ``__str__``, ``__repr__``, ``append``, ``new_tag``, ``extract``, ``replace_with``

Install
-------

.. code-block:: bash

   pip install fast-soup==1.1.0

How to use
----------

.. code-block:: python

   from fast_soup import FastSoup

   content = ...  # read some HTML content
   soup = FastSoup(content)

   # interact with it like a BS4 object
   result = soup.find('a', id='my_link')

   # interact with it like an lxml object
   el = result.unwrap()

FAQ
===

**Q:** BS4 already implements an lxml parser. Why should I use FastSoup?

**A:** Yes, BS4 implements the **parser**, but the parser only builds the tree. Everything after that, such as searching and serialization, runs at "Python speed".
FastSoup uses lxml internally for these operations as well, so they run at "C speed".

**Q:** How does the FastSoup speedup work?

**A:** FastSoup simply builds **XPath** expressions and executes them. An LRU cache is used to avoid rebuilding the same expressions.
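
The idea of building an XPath expression once and caching it can be sketched roughly as follows. This is only an illustration, not FastSoup's actual code: ``build_xpath`` and ``find`` are hypothetical names, and the standard library's ``xml.etree.ElementTree`` stands in for lxml so the snippet runs without extra dependencies.

.. code-block:: python

   from functools import lru_cache
   import xml.etree.ElementTree as ET


   @lru_cache(maxsize=128)
   def build_xpath(name, attrs):
       """Build a descendant-search XPath; `attrs` is a tuple of (key, value) pairs."""
       # Assumption: attribute values contain no single quotes (no escaping here).
       predicates = "".join("[@{}='{}']".format(key, value) for key, value in attrs)
       return ".//{}{}".format(name, predicates)


   def find(root, name, **attrs):
       """Return the first matching descendant of `root`, or None."""
       # Sort the attributes so equal queries always hit the same cache entry.
       xpath = build_xpath(name, tuple(sorted(attrs.items())))
       return root.find(xpath)


   root = ET.fromstring("<div><a id='my_link' href='/x'>text</a><a id='other'/></div>")
   link = find(root, "a", id="my_link")
   find(root, "a", id="my_link")  # the second call reuses the cached expression
   print(link.get("href"), build_xpath.cache_info().hits)

The cache key is the tag name plus the sorted attribute pairs, so repeated searches with the same arguments skip the string building entirely.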

**Q:** Why don't you support the whole interface? Will it be supported soon?

**A:** I wrote the functions that speed up parsing in my own projects. Just create an issue or pull request, and I think we will find a solution ;)

Miscellaneous
-------------

You can get the power of BeautifulSoup by wrapping your own lxml objects, e.g.:

.. code-block:: python

   import io

   import lxml.etree
   from fast_soup import Tag

   content = ...  # some bytes ready to parse
   context = lxml.etree.iterparse(
       io.BytesIO(content), ...
   )
   for event, elem in context:
       # wrap each parsed lxml element to get the BS4-style interface
       tag = Tag(elem)

       tag_text = tag.get_text()
       tag_attr = tag['attribute']
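
To make the wrapping idea concrete, here is a small self-contained sketch of what such a wrapper could look like. It is an assumption-laden illustration, not FastSoup's ``Tag``: ``MiniTag`` is a hypothetical class, and it wraps elements from the standard library's ``xml.etree.ElementTree`` rather than lxml.

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET


   class MiniTag:
       """Hypothetical BS4-style wrapper around one element (not FastSoup's Tag)."""

       def __init__(self, elem):
           self._elem = elem

       @property
       def name(self):
           return self._elem.tag

       def __getitem__(self, key):
           # A missing attribute raises KeyError.
           return self._elem.attrib[key]

       def get_text(self):
           # Concatenate the text of this element and all its descendants.
           return "".join(self._elem.itertext())

       def unwrap(self):
           # Hand back the underlying element for direct tree access.
           return self._elem


   content = b"<items><item code='a1'>first <b>entry</b></item></items>"
   texts = {}
   for event, elem in ET.iterparse(io.BytesIO(content), events=("end",)):
       if elem.tag == "item":
           tag = MiniTag(elem)
           texts[tag["code"]] = tag.get_text()
   print(texts)

Because the wrapper only holds a reference to the element, it can be created on the fly inside an ``iterparse`` loop without copying the tree.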