https://github.com/django-components/djc-core-html-parser
HTML parser used by django-components written in Rust
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/django-components/djc-core-html-parser
- Owner: django-components
- License: mit
- Created: 2025-01-23T18:07:42.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-10-01T22:25:24.000Z (about 1 month ago)
- Last Synced: 2025-10-02T15:26:11.892Z (about 1 month ago)
- Language: Rust
- Size: 53.7 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# djc-core-html-parser
[PyPI - Version](https://pypi.org/project/djc-core-html-parser/) [PyPI - Python Version](https://pypi.org/project/djc-core-html-parser/) [License](https://github.com/django-components/djc-core-html-parser/blob/master/LICENSE/) [Downloads](https://pypistats.org/packages/djc-core-html-parser) [Tests](https://github.com/django-components/djc-core-html-parser/actions/workflows/tests.yml)
HTML parser used by [django-components](https://github.com/django-components/django-components). Written in Rust, exposed as a Python package with [maturin](https://www.maturin.rs/).
This implementation was found to be 40-50x faster than our Python implementation, taking ~90ms to parse 5 MB of HTML.
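For scale: ~90 ms for 5 MB works out to roughly 55 MB/s, and a 40-50x speedup implies the Python implementation needed roughly 3.6-4.5 s for the same input. To take a comparable measurement on your own templates, a minimal timing harness could look like the sketch below. `measure_throughput` is a hypothetical helper, and the synthetic input and the stand-in `str.upper` "parser" are placeholders, not the benchmark behind the numbers above.

```python
import time


def measure_throughput(parse, html, repeats=5):
    """Return (best_seconds, megabytes_per_second) for calling parse(html).

    Uses the best of several runs to reduce noise. "MB" here means 10**6
    bytes of UTF-8, which may differ from how the figure above was measured.
    """
    size_mb = len(html.encode("utf-8")) / 1_000_000
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(html)
        best = min(best, time.perf_counter() - start)
    throughput = size_mb / best if best > 0 else float("inf")
    return best, throughput


if __name__ == "__main__":
    # Synthetic input of roughly 5 MB; str.upper is only a stand-in parser.
    chunk = '<div class="row"><p>Hello</p></div>\n'
    html = chunk * (5_000_000 // len(chunk))
    seconds, throughput = measure_throughput(str.upper, html)
    print(f"{seconds * 1000:.1f} ms, {throughput:.1f} MB/s")
```

With the package installed, you could pass something like `lambda s: set_html_attributes(s, root_attributes=['data-root-id'], all_attributes=['data-v-123'])` as `parse` instead.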
## Installation
```sh
pip install djc-core-html-parser
```
## Usage
```python
from djc_core_html_parser import set_html_attributes
html = '<div><p>Hello</p></div>'
result, _ = set_html_attributes(
html,
# Add attributes to the root elements
root_attributes=['data-root-id'],
# Add attributes to all elements
all_attributes=['data-v-123'],
)
```
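The difference between `root_attributes` and `all_attributes` can be modelled in pure Python. The sketch below is NOT the package's implementation (that one is written in Rust); `sketch_set_html_attributes` is a hypothetical name, and it uses the standard-library `html.parser.HTMLParser` only to locate start tags. Assumptions: added attributes are written without values, root-element attributes come before all-element attributes (the order in the `captured` example further down), and every non-void element is explicitly closed, since nesting depth is tracked from end tags.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class _StartTagCollector(HTMLParser):
    """Collects every start tag with its position and whether it is a root."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tags = []  # (line, offset, start_tag_text, is_root)

    def _record(self):
        line, offset = self.getpos()  # position of the tag's "<"
        self.tags.append((line, offset, self.get_starttag_text(), self.depth == 0))

    def handle_starttag(self, tag, attrs):
        self._record()
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not open a new nesting level.
        self._record()

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth = max(0, self.depth - 1)


def sketch_set_html_attributes(html, root_attributes, all_attributes):
    """Hypothetical pure-Python model: add valueless attributes to start tags."""
    parser = _StartTagCollector()
    parser.feed(html)
    parser.close()

    # getpos() reports (1-based line, 0-based column); map lines to string indices.
    line_starts = [0] + [i + 1 for i, ch in enumerate(html) if ch == "\n"]

    pieces, cursor = [], 0
    for line, offset, text, is_root in parser.tags:
        names = (list(root_attributes) if is_root else []) + list(all_attributes)
        if not names:
            continue
        end = line_starts[line - 1] + offset + len(text)
        # Insert just before "/>" for self-closing tags, otherwise before ">".
        insert_at = end - 2 if text.endswith("/>") else end - 1
        pieces.append(html[cursor:insert_at])
        pieces.append("".join(" " + name for name in names))
        cursor = insert_at
    pieces.append(html[cursor:])
    return "".join(pieces)


print(sketch_set_html_attributes(
    "<div><p>Hello</p></div>",
    root_attributes=["data-root-id"],
    all_attributes=["data-v-123"],
))
# <div data-root-id data-v-123><p data-v-123>Hello</p></div>
```

Splicing the new attributes into the original string, rather than re-serializing parsed tokens, keeps text, comments, and whitespace byte-for-byte unchanged. The printed output is what this sketch produces; the package's exact output formatting may differ.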
To save ourselves from re-parsing the HTML, `set_html_attributes` returns not just the transformed HTML but also, as the second item, a dictionary.
This dictionary records which HTML attributes were written to which elements.
To populate this dictionary, set `watch_on_attribute` to an attribute name.
Then, during the HTML transformation, we check each element for this attribute. If the element has it, we:
1. Get the value of that attribute.
2. Record the attributes that were added to the element, using the value of the watched attribute as the key.
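Steps 1 and 2 amount to a small bookkeeping loop. A minimal pure-Python sketch, assuming each element has already been parsed into a dict of its existing attributes plus the list of attribute names added to it; `record_added_attributes` and that input shape are hypothetical, not part of the package. Elements without the watched attribute are skipped, and if two elements share a watched value this sketch simply keeps the last one (the package's behavior in that case isn't described here).

```python
def record_added_attributes(elements, watch_on_attribute):
    """Build the {watched value: [added attribute names]} mapping.

    `elements` is an iterable of (existing_attrs, added) pairs, where
    `existing_attrs` is a dict of the element's original attributes and
    `added` is the list of attribute names written to it.
    """
    captured = {}
    for existing_attrs, added in elements:
        # Step 1: read the watched attribute; elements without it are not recorded.
        if watch_on_attribute not in existing_attrs:
            continue
        key = existing_attrs[watch_on_attribute]
        # Step 2: record what was added, keyed by the watched value.
        # (Hypothetical choice: a repeated key overwrites the earlier entry.)
        captured[key] = list(added)
    return captured


elements = [
    ({"data-watch-id": "123"}, ["data-root-id", "data-djc-tag"]),
    ({"data-watch-id": "456"}, ["data-djc-tag"]),
    ({"class": "note"}, ["data-djc-tag"]),  # not watched, so not recorded
]
print(record_added_attributes(elements, "data-watch-id"))
# {'123': ['data-root-id', 'data-djc-tag'], '456': ['data-djc-tag']}
```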
```python
from djc_core_html_parser import set_html_attributes
html = """
<div data-watch-id="123">
  <p data-watch-id="456">Hello</p>
</div>
"""
result, captured = set_html_attributes(
html,
# Add attributes to the root elements
root_attributes=['data-root-id'],
# Add attributes to all elements
all_attributes=['data-djc-tag'],
# Watch for this attribute on elements
watch_on_attribute='data-watch-id',
)
print(captured)
# {
# '123': ['data-root-id', 'data-djc-tag'],
# '456': ['data-djc-tag'],
# }
```
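Because `captured` is a plain dictionary, questions such as "which watched elements ended up as root elements?" can be answered from it without parsing the output again. A minimal sketch; `ids_with_attribute` is a hypothetical helper that relies only on the shape of `captured` shown in the example above (watched value mapped to a list of added attribute names).

```python
def ids_with_attribute(captured, attribute):
    """Return the watched IDs whose element received `attribute`, sorted."""
    return sorted(key for key, added in captured.items() if attribute in added)


# The mapping printed in the example above.
captured = {
    "123": ["data-root-id", "data-djc-tag"],
    "456": ["data-djc-tag"],
}
print(ids_with_attribute(captured, "data-root-id"))  # ['123']
print(ids_with_attribute(captured, "data-djc-tag"))  # ['123', '456']
```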
## Development
1. Set up a Python environment
```sh
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
2. Install dependencies
```sh
pip install -r requirements-dev.txt
```
The dev requirements also include `maturin`, which is used to package a Rust project
as a Python package.
3. Install Rust
See https://www.rust-lang.org/tools/install
4. Run Rust tests
```sh
cargo test
```
5. Build the Python package
```sh
maturin develop
```
To build the production-optimized package, use `maturin develop --release`.
6. Run Python tests
```sh
pytest
```
> NOTE: The Python tests run against the compiled extension, so run `maturin develop` before `pytest`, and again after changing the Rust code.
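Steps 4-6 have an ordering constraint: the Python tests import the compiled extension, so `maturin develop` has to run before `pytest`. A small helper script (hypothetically `dev_check.py`; it is not part of the repository) could run them in order and stop at the first failure. As a safety default it only prints the commands unless `--run` is passed, and `--release` adds the flag to `maturin develop`.

```python
import shlex
import subprocess
import sys

# Steps 4-6 above, in order. `maturin develop` must run before `pytest`,
# because the Python tests import the compiled extension module.
STEPS = [
    ["cargo", "test"],
    ["maturin", "develop"],
    ["pytest"],
]


def run_steps(steps=STEPS, release=False, dry_run=False):
    """Run each command in order, stopping at the first failure.

    Returns the commands as shell strings. With dry_run=True nothing is
    executed; the commands are only printed.
    """
    executed = []
    for cmd in steps:
        if release and cmd[:2] == ["maturin", "develop"]:
            cmd = [*cmd, "--release"]
        line = shlex.join(cmd)
        print(f"$ {line}")
        if not dry_run:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(cmd, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    run_steps(release="--release" in sys.argv, dry_run="--run" not in sys.argv)
```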
## Deployment
Deployment is done automatically via GitHub Actions.
To publish a new version of the package, you need to:
1. Bump the version in `pyproject.toml` and `Cargo.toml`.
2. Open a PR and merge it into `main`.
3. Create a new tag on the `main` branch with the new version number (e.g. `v1.0.0`), or create a new release in the GitHub UI.