https://github.com/spider-rs/readability

The readability library for LLMs

# llm_readability

The Rust readability library built for performance, AI, and multiple locales.
The library is used on [Spider Cloud](https://spider.cloud) for data cleaning.

## Usage

```toml
[dependencies]
llm_readability = "0"
```

```rust
use llm_readability::extractor;

fn main() {
    // The first argument is the HTML input as a byte reader, the second is the page URL.
    match extractor::extract(&mut "...".as_bytes(), "https://example.com", None) {
        Ok(product) => {
            println!("------- html ------");
            println!("{}", product.content);
            println!("---- plain text ---");
            println!("{}", product.text);
        }
        Err(_) => println!("error occurred"),
    }
}
```
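
The call above passes `&mut "...".as_bytes()`, which suggests the extractor reads its input through a reader rather than a `String`. The sketch below assumes that input only needs to implement `std::io::Read`; it uses the standard library alone, and `read_html` is a hypothetical helper that is not part of `llm_readability`. It shows two ways to produce such a reader from HTML already held in memory.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical helper (not part of llm_readability): drains any reader into a String.
fn read_html<R: Read>(input: &mut R) -> io::Result<String> {
    let mut html = String::new();
    input.read_to_string(&mut html)?;
    Ok(html)
}

fn main() -> io::Result<()> {
    // A byte slice (`&[u8]`) implements `Read`, so a string literal works directly.
    let from_str = read_html(&mut "<p>Hello</p>".as_bytes())?;

    // An owned buffer, such as a downloaded response body, can be wrapped in a `Cursor`.
    let mut cursor = Cursor::new(b"<p>Hello</p>".to_vec());
    let from_cursor = read_html(&mut cursor)?;

    // Both readers yield the same bytes.
    assert_eq!(from_str, from_cursor);
    println!("{from_str}");
    Ok(())
}
```

If that assumption holds, either reader could be handed to `extractor::extract` in place of the literal used in the usage example.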

This project is a rewrite of `readability-rs` for performance and bug fixes.