https://github.com/spumer/fastsoup
BeautifulSoup interface for lxml
beautifulsoup fast html lxml parser python3
- Host: GitHub
- URL: https://github.com/spumer/fastsoup
- Owner: spumer
- License: gpl-3.0
- Created: 2017-02-23T17:16:48.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-11-28T06:42:06.000Z (about 1 year ago)
- Last Synced: 2025-07-13T02:03:05.537Z (7 months ago)
- Topics: beautifulsoup, fast, html, lxml, parser, python3
- Language: Python
- Homepage:
- Size: 101 KB
- Stars: 4
- Watchers: 3
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.rst
- License: LICENSE
README
FastSoup
========

.. image:: https://travis-ci.org/spumer/FastSoup.svg
   :target: https://travis-ci.org/spumer/FastSoup
   :alt: Build Status

.. image:: https://coveralls.io/repos/github/spumer/FastSoup/badge.svg
   :target: https://coveralls.io/github/spumer/FastSoup

=====================================================================================================================================================

BeautifulSoup interface for lxml
Key features
============

* **Fast** search in the tree
* **Fast** serialization to ``str``
* BeautifulSoup4-style interface for interacting with objects:

  * Search: ``find``, ``find_all``, ``find_next``, ``find_next_sibling``
  * Text: ``get_text``, ``string``
  * Tag: ``name``, ``get``, ``clear``, ``__getitem__``, ``__str__``, ``__repr__``, ``append``, ``new_tag``, ``extract``, ``replace_with``

Install
-------

.. code-block:: bash

   pip install fast-soup==1.1.0

How to use
----------

.. code-block:: python

   from fast_soup import FastSoup

   content = ...  # read some HTML content

   soup = FastSoup(content)

   # interact with it like a BS4 object
   result = soup.find('a', id='my_link')

   # interact with it like an lxml object
   el = result.unwrap()

FAQ
===
**Q:** BS4 already implements an lxml parser. Why should I use FastSoup?

**A:** Yes, BS4 implements a **parser**, but the parser only builds the tree. Every later interaction, such as searching and serialization, proceeds at "Python speed".
FastSoup uses lxml internally and guarantees "C speed".

**Q:** How does the FastSoup speedup work?

**A:** FastSoup simply builds **XPath** expressions and executes them. An LRU cache is used to avoid rebuilding the same expressions.
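
As a rough illustration of that idea (this is not FastSoup's actual code), the sketch below builds an XPath string from a tag name and attribute filters and memoizes it with ``functools.lru_cache``. The helper names ``build_xpath`` and ``find_all`` are hypothetical. The standard library's ``xml.etree.ElementTree`` stands in for lxml so the snippet runs without extra dependencies; ``lxml.etree`` elements expose a compatible ``findall``.

.. code-block:: python

   from functools import lru_cache
   import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


   @lru_cache(maxsize=128)
   def build_xpath(name, attrs):
       """Build an XPath string from a tag name and a tuple of (attribute, value) pairs.

       Hypothetical helper: values are not escaped, so they must not contain quotes.
       """
       predicates = "".join("[@{}='{}']".format(key, value) for key, value in attrs)
       return ".//{}{}".format(name or "*", predicates)


   def find_all(root, name=None, **attrs):
       """Hypothetical BS4-style search: build (or reuse) the XPath, then execute it."""
       # Sort the attributes so that equal queries map to the same cache key.
       xpath = build_xpath(name, tuple(sorted(attrs.items())))
       return root.findall(xpath)


   root = ET.fromstring(
       "<html><body><a id='my_link' href='/x'>first</a><a href='/y'>second</a></body></html>"
   )

   links = find_all(root, "a", id="my_link")
   print([link.text for link in links])

   # Repeating the same query reuses the cached XPath string instead of rebuilding it.
   find_all(root, "a", id="my_link")
   print(build_xpath.cache_info())

The cache key is the pair of tag name and sorted attribute filters, so only the string construction is memoized here; the tree search itself still runs on every call.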

**Q:** Why don't you support the whole interface? Will it be supported soon?

**A:** I wrote the functions that speed up parsing in my own projects. Just create an issue or a pull request and I think we will find a solution ;)

Miscellaneous
-------------
You can get the power of BeautifulSoup by wrapping your lxml objects, e.g.:

.. code-block:: python

   import io

   import lxml.etree
   from fast_soup import Tag

   content = ...  # some bytes ready to parse

   context = lxml.etree.iterparse(
       io.BytesIO(content), ...
   )

   for event, elem in context:
       # wrap the lxml element to get the BS4-style interface
       tag = Tag(elem)
       tag_text = tag.get_text()
       tag_attr = tag['attribute']