https://github.com/ikenox/h2s-rs
A declarative and ergonomic HTML parser library in Rust
- Host: GitHub
- URL: https://github.com/ikenox/h2s-rs
- Owner: ikenox
- License: mit
- Created: 2022-05-04T16:13:03.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2023-08-16T12:50:57.000Z (over 2 years ago)
- Last Synced: 2025-01-29T04:48:21.036Z (about 1 year ago)
- Topics: html, parser, rust
- Language: Rust
- Homepage:
- Size: 240 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# h2s
A declarative HTML parser library for Rust that works like a deserializer from HTML to structs.
## Example
```rust
use h2s::FromHtml;
use h2s::extraction_method::ExtractNthText;

#[derive(FromHtml, Debug, Eq, PartialEq)]
pub struct Page {
    // Attribute of the root element.
    #[h2s(attr = "lang")]
    lang: String,
    #[h2s(select = "div > h1.blog-title")]
    blog_title: String,
    // Each matched element is parsed into an `Article`.
    #[h2s(select = ".articles > div")]
    articles: Vec<Article>,
    #[h2s(select = "body", extractor = ExtractNthText(1))]
    footer2: String,
}

#[derive(FromHtml, Debug, Eq, PartialEq)]
pub struct Article {
    #[h2s(select = "h2 > a")]
    title: String,
    #[h2s(select = "div > span")]
    view_count: usize,
    #[h2s(select = "h2 > a", attr = "href")]
    url: String,
    #[h2s(select = "ul > li")]
    tags: Vec<String>,
    #[h2s(select = "ul > li:nth-child(1)")]
    first_tag: Option<String>,
}

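// Sample input. The markup is a reconstruction that is consistent with the
// selectors and assertions in this example; `footer1` and `footer2` are
// assumed to be direct child text nodes of `body`.
let html = r#"
<html lang="en">
<body>footer1<div>
    <h1 class="blog-title">My tech blog</h1>
    <div class="articles">
        <div>
            <h2><a href="https://example.com/1">article1</a></h2>
            <div><span>901</span> Views</div>
            <ul><li>Tag1</li><li>Tag2</li></ul>
        </div>
        <div>
            <h2><a href="https://example.com/2">article2</a></h2>
            <div><span>849</span> Views</div>
            <ul></ul>
        </div>
        <div>
            <h2><a href="https://example.com/3">article3</a></h2>
            <div><span>103</span> Views</div>
            <ul><li>Tag3</li></ul>
        </div>
    </div>
</div>footer2</body></html>"#;
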
let page = h2s::parse::<Page>(html).unwrap();
assert_eq!(page, Page {
    lang: "en".to_string(),
    blog_title: "My tech blog".to_string(),
    articles: vec![
        Article {
            title: "article1".to_string(),
            url: "https://example.com/1".to_string(),
            view_count: 901,
            tags: vec!["Tag1".to_string(), "Tag2".to_string()],
            first_tag: Some("Tag1".to_string()),
        },
        Article {
            title: "article2".to_string(),
            url: "https://example.com/2".to_string(),
            view_count: 849,
            tags: vec![],
            first_tag: None,
        },
        Article {
            title: "article3".to_string(),
            url: "https://example.com/3".to_string(),
            view_count: 103,
            tags: vec!["Tag3".to_string()],
            first_tag: Some("Tag3".to_string()),
        },
    ],
    footer2: "footer2".to_string(),
});

// When the input HTML does not match the expected structure,
// `h2s::parse` returns an error with a detailed reason.
// Here the third article's `<a>` element is removed from the input.
let invalid_html = html.replace(r#"<a href="https://example.com/3">article3</a>"#, "");
let err = h2s::parse::<Page>(invalid_html).unwrap_err();
assert_eq!(
    err.to_string(),
    "articles: [2]: title: mismatched number of selected elements by \"h2 > a\": expected exactly one element, but no elements found"
);
```
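
The error string in the example reads as a path from the root struct down to the failing field (`articles`, then index `2`, then `title`), followed by the reason. The following is a minimal sketch of how such a message could be assembled. `PathError` and its variants are hypothetical names used only for illustration; this is not h2s's actual error type.

```rust
use std::fmt;

/// Hypothetical error type (not part of h2s): either a leaf message, or an
/// inner error wrapped with the field name or index where it occurred.
#[derive(Debug)]
pub enum PathError {
    Leaf(String),
    Field(&'static str, Box<PathError>),
    Index(usize, Box<PathError>),
}

impl fmt::Display for PathError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The innermost reason is printed as-is.
            PathError::Leaf(msg) => write!(f, "{}", msg),
            // Each wrapper prefixes its own path segment.
            PathError::Field(name, inner) => write!(f, "{}: {}", name, inner),
            PathError::Index(i, inner) => write!(f, "[{}]: {}", i, inner),
        }
    }
}
```

Wrapping a leaf in `Field("title", ..)`, then `Index(2, ..)`, then `Field("articles", ..)` yields a message with the prefix `articles: [2]: title: `, the same shape as the error asserted above.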
## Supported types
You can use the following types for the fields of a struct that derives `FromHtml`.
### Basic types
- `String`
- Numeric types ( `usize`, `i64`, `NonZeroU32`, ... )
- More built-in types ([List](./core/src/parseable.rs))
- Any other type, if you implement the parsing for it yourself ([Example](./examples/custom_field_value.rs))
### Container types (where `T` is a basic type)
- `[T;N]`
- `Option<T>`
- `Vec<T>`
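
The field type also determines how many matched elements are acceptable. In the example, a plain `String` field fails with "expected exactly one element", an `Option<String>` becomes `None` when nothing matches, and a `Vec<String>` becomes empty. The sketch below models that rule with a hypothetical `Expected` enum that is not part of h2s. The cases for `Option<T>` with more than one match and for `[T; N]` (exactly `N` elements) are assumptions, not behavior confirmed by this README.

```rust
/// Hypothetical model (not h2s's API) of how many elements a selector
/// may match for each kind of field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expected {
    /// A plain `T`: exactly one element.
    ExactlyOne,
    /// `Option<T>`: zero or one element (the upper bound is an assumption).
    AtMostOne,
    /// `Vec<T>`: any number of elements, including zero.
    Any,
    /// `[T; N]`: exactly `N` elements (an assumption).
    Exactly(usize),
}

impl Expected {
    /// Returns `true` if `found` matched elements satisfy this expectation.
    pub fn accepts(self, found: usize) -> bool {
        match self {
            Expected::ExactlyOne => found == 1,
            Expected::AtMostOne => found <= 1,
            Expected::Any => true,
            Expected::Exactly(n) => found == n,
        }
    }
}
```

Under this model, the failing case in the example corresponds to `Expected::ExactlyOne.accepts(0)` returning `false` for the `title` field.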
## License
MIT