https://github.com/spumer/fastsoup

BeautifulSoup interface for lxml



Host: GitHub
URL: https://github.com/spumer/fastsoup
Owner: spumer
License: gpl-3.0
Created: 2017-02-23T17:16:48.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2024-11-28T06:42:06.000Z (about 1 year ago)
Last Synced: 2025-07-13T02:03:05.537Z (7 months ago)
Topics: beautifulsoup, fast, html, lxml, parser, python3
Language: Python
Homepage:
Size: 101 KB
Stars: 4
Watchers: 3
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.rst
- License: LICENSE


          
FastSoup
========

.. image:: https://travis-ci.org/spumer/FastSoup.svg
    :target: https://travis-ci.org/spumer/FastSoup
    :alt: Build Status

.. image:: https://coveralls.io/repos/github/spumer/FastSoup/badge.svg
    :target: https://coveralls.io/github/spumer/FastSoup


BeautifulSoup interface for lxml

Key features
============

* **FAST** tree search

* **FAST** serialization to ``str``

* BeautifulSoup4-style interface for working with objects:

  * Search: ``find``, ``find_all``, ``find_next``, ``find_next_sibling``

  * Text: ``get_text``, ``string``

  * Tag: ``name``, ``get``, ``clear``, ``__getitem__``, ``__str__``, ``__repr__``, ``append``, ``new_tag``, ``extract``, ``replace_with``

Install
-------

.. code-block:: bash

   pip install fast-soup==1.1.0

How to use
----------

.. code-block:: python

   from fast_soup import FastSoup

   content = ...  # read some HTML content

   soup = FastSoup(content)

   # interact with it like a BS4 object
   result = soup.find('a', id='my_link')

   # interact with it like an lxml object
   el = result.unwrap()

FAQ
===

**Q:** BS4 already supports the lxml parser. Why should I use FastSoup?

**A:** Yes, BS4 can use lxml as its **parser**, but the parser only builds the tree. Every later interaction, such as searching and serialization, runs at "Python speed".
FastSoup uses lxml internally for those operations as well and guarantees "C speed".

**Q:** How does the FastSoup speedup work?

**A:** FastSoup simply builds **XPath** expressions and executes them. An LRU cache is used to avoid rebuilding them.
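
As a rough illustration of that idea (a sketch, not FastSoup's actual code), the snippet below builds an XPath string from a tag name and attributes and memoizes it with ``functools.lru_cache``. The names ``build_xpath`` and ``find_all`` are hypothetical, and the standard library's ``xml.etree.ElementTree`` stands in for lxml only so the example runs without extra dependencies:

.. code-block:: python

   from functools import lru_cache
   import xml.etree.ElementTree as ET


   @lru_cache(maxsize=128)
   def build_xpath(name, attrs=()):
       """Build a descendant XPath for a tag name and (key, value) pairs.

       ``attrs`` must be hashable (a tuple of pairs) so lru_cache can key on it.
       Values containing a single quote are not escaped in this sketch.
       """
       predicates = "".join(f"[@{key}='{value}']" for key, value in attrs)
       return f".//{name}{predicates}"


   def find_all(root, name, **attrs):
       # Sort the pairs so keyword order does not create duplicate cache entries.
       expr = build_xpath(name, tuple(sorted(attrs.items())))
       return root.findall(expr)


   root = ET.fromstring(
       "<div><a id='my_link' href='/x'>first</a><a id='other'>second</a></div>"
   )
   links = find_all(root, "a", id="my_link")
   find_all(root, "a", id="my_link")  # second call reuses the cached expression

Repeated searches with the same arguments hit the cache instead of rebuilding the expression, which is the effect described in the answer above.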

**Q:** Why don't you support the whole interface? Will that come soon?

**A:** I wrote the functions that speed up parsing in my own projects. Just create an issue or pull request and I think we will find a solution ;)

Miscellaneous
-------------

You can get the power of BeautifulSoup by wrapping your own lxml objects, e.g.:

.. code-block:: python

   import io

   import lxml.etree
   from fast_soup import Tag

   content = ...  # some bytes ready to parse

   context = lxml.etree.iterparse(
       io.BytesIO(content),  ...
   )

   for event, elem in context:
       tag = Tag(elem)  # wrap the lxml element in a BS4-style Tag
       tag_text = tag.get_text()
       tag_attr = tag['attribute']
