https://github.com/spider-rs/spider_transformations
HTML transformation library for Rust
- Host: GitHub
- URL: https://github.com/spider-rs/spider_transformations
- Owner: spider-rs
- License: mit
- Created: 2025-06-07T19:35:10.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-06-07T20:12:54.000Z (8 months ago)
- Last Synced: 2025-06-07T20:27:39.142Z (8 months ago)
- Topics: html-to-markdown, html-transformation, transform-data
- Language: Rust
- Homepage: https://crates.io/crates/spider_transformations
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# spider_transformations
A high-performance transformation library for Rust, used by [Spider Cloud](https://spider.cloud) for AI-powered content cleaning across multiple locales.
This project depends on the `spider` crate.
## Usage
```toml
[dependencies]
spider_transformations = "2"
```
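```rust
use spider_transformations::transformation::content;

fn main() {
    // `page` is not defined in this snippet: it comes from the spider
    // object when streaming a crawl.
    let mut conf = content::TransformConfig::default();
    // Select Markdown as the output format.
    conf.return_format = content::ReturnFormat::Markdown;
    // The two trailing `&None` arguments leave the optional settings unset.
    let markdown = content::transform_content(&page, &conf, &None, &None);
}
```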
### Transform types
1. Markdown
1. Commonmark
1. Text
1. Markdown (Text Map) or HTML2Text
1. WIP: HTML2XML
#### Enhancements
1. Readability
1. Encoding
## Chunking
The `transformation` module includes several chunking utilities.
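To illustrate what chunking means here, the following is a minimal, standard-library-only sketch that splits transformed text into fixed-size groups of words. It is not the crate's API: the function name `chunk_by_words` and its `max_words` parameter are hypothetical, chosen only for this example, and the crate's own utilities may split on different boundaries.

```rust
/// Hypothetical helper (not part of spider_transformations): split `text`
/// into chunks of at most `max_words` whitespace-separated words.
fn chunk_by_words(text: &str, max_words: usize) -> Vec<String> {
    // `slice::chunks` panics on a size of zero, so return early instead.
    if max_words == 0 {
        return Vec::new();
    }
    // Collect the words first so they can be grouped by count.
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_words)
        .map(|group| group.join(" "))
        .collect()
}

fn main() {
    // Stand-in for the output of a transform; any string works.
    let text = "Transformed pages are often split before being sent to a model";
    for (i, chunk) in chunk_by_words(text, 4).iter().enumerate() {
        println!("chunk {i}: {chunk}");
    }
}
```

Splitting on whitespace keeps words intact but ignores sentence and paragraph boundaries, so a real pipeline would likely prefer a boundary-aware strategy.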
This project includes rewrites and forks of `html2md` and `html2text`, made for performance and bug fixes.
## License
MIT