https://github.com/spider-rs/readability
The readability library for LLMs
clean-data data-cleaning llm-training readability rust safari-reader
- Host: GitHub
- URL: https://github.com/spider-rs/readability
- Owner: spider-rs
- License: MIT
- Created: 2024-08-28T13:58:15.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-25T13:23:36.000Z (11 months ago)
- Last Synced: 2025-03-21T05:42:59.687Z (7 months ago)
- Topics: clean-data, data-cleaning, llm-training, readability, rust, safari-reader
- Language: Rust
- Homepage: https://crates.io/crates/llm_readability
- Size: 42 KB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# llm_readability
The Rust readability library built for performance, AI, and multiple locales.
The library is used on [Spider Cloud](https://spider.cloud) for data cleaning.

## Usage

```toml
[dependencies]
llm_readability = "0"
```

```rust
use llm_readability::extractor;

fn main() {
    // "..." stands in for the HTML source of the page.
    match extractor::extract(&mut "...".as_bytes(), "https://example.com", None) {
        Ok(product) => {
            // Extracted content as HTML.
            println!("------- html ------");
            println!("{}", product.content);
            // The same content as plain text.
            println!("---- plain text ---");
            println!("{}", product.text);
        }
        Err(_) => println!("error occurred"),
    }
}
```

This project is a rewrite of `readability-rs` focused on performance and bug fixes.
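
The `&mut "...".as_bytes()` argument in the usage example works because a byte slice implements `std::io::Read`. This suggests, though the README does not state it, that `extract` accepts any reader. The sketch below uses only the standard library to show the same pattern. `read_html` is a hypothetical helper for illustration and is not part of `llm_readability`.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of llm_readability): drains any reader
/// into a String, mirroring how a reader-based API consumes its input.
fn read_html<R: Read>(input: &mut R) -> io::Result<String> {
    let mut html = String::new();
    input.read_to_string(&mut html)?;
    Ok(html)
}

fn main() -> io::Result<()> {
    // `&[u8]` implements `Read`, so an in-memory document can be passed
    // as `&mut "...".as_bytes()`, exactly as in the usage example.
    let mut in_memory = "<html><body><p>Hello</p></body></html>".as_bytes();
    let html = read_html(&mut in_memory)?;
    println!("{} bytes read", html.len());
    Ok(())
}
```

If the assumption holds, other `Read` sources such as `std::fs::File` or `std::io::Cursor` could be passed to the extractor in the same way, without first loading the document into a `String`.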