https://github.com/ikenox/h2s-rs

A declarative and ergonomic HTML parser library in Rust

[![Check](https://github.com/ikenox/h2s/actions/workflows/check.yml/badge.svg?branch=main)](https://github.com/ikenox/h2s/actions/workflows/check.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) ![Rustc Version 1.65+](https://img.shields.io/badge/rustc-1.65+-bc71d0.svg)

# h2s

A declarative HTML parser library in Rust, which works like a deserializer from HTML to struct.

## Example

```rust
use h2s::FromHtml;
use h2s::extraction_method::ExtractNthText;

#[derive(FromHtml, Debug, Eq, PartialEq)]
pub struct Page {
    #[h2s(attr = "lang")]
    lang: String,
    #[h2s(select = "div > h1.blog-title")]
    blog_title: String,
    #[h2s(select = ".articles > div")]
    articles: Vec<Article>,
    #[h2s(select = "body", extractor = ExtractNthText(1))]
    footer2: String,
}

#[derive(FromHtml, Debug, Eq, PartialEq)]
pub struct Article {
    #[h2s(select = "h2 > a")]
    title: String,
    #[h2s(select = "div > span")]
    view_count: usize,
    #[h2s(select = "h2 > a", attr = "href")]
    url: String,
    #[h2s(select = "ul > li")]
    tags: Vec<String>,
    #[h2s(select = "ul > li:nth-child(1)")]
    first_tag: Option<String>,
}

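let html = r#"
<html lang="en">
<body>
<div>
    <h1 class="blog-title">My tech blog</h1>
    <div class="articles">
        <div>
            <h2><a href="https://example.com/1">article1</a></h2>
            <div><span>901</span> Views</div>
            <ul><li>Tag1</li><li>Tag2</li></ul>
        </div>
        <div>
            <h2><a href="https://example.com/2">article2</a></h2>
            <div><span>849</span> Views</div>
            <ul></ul>
        </div>
        <div>
            <h2><a href="https://example.com/3">article3</a></h2>
            <div><span>103</span> Views</div>
            <ul><li>Tag3</li></ul>
        </div>
    </div>
</div>
footer1
<br>
footer2
</body>
</html>
"#;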

let page = h2s::parse::<Page>(html).unwrap();

assert_eq!(page, Page {
    lang: "en".to_string(),
    blog_title: "My tech blog".to_string(),
    articles: vec![
        Article {
            title: "article1".to_string(),
            url: "https://example.com/1".to_string(),
            view_count: 901,
            tags: vec!["Tag1".to_string(), "Tag2".to_string()],
            first_tag: Some("Tag1".to_string()),
        },
        Article {
            title: "article2".to_string(),
            url: "https://example.com/2".to_string(),
            view_count: 849,
            tags: vec![],
            first_tag: None,
        },
        Article {
            title: "article3".to_string(),
            url: "https://example.com/3".to_string(),
            view_count: 103,
            tags: vec!["Tag3".to_string()],
            first_tag: Some("Tag3".to_string()),
        },
    ],
    footer2: "footer2".to_string(),
});

// When the structure of the input HTML document does not match what the struct expects,
// `h2s::parse` returns an error with a detailed reason.
let invalid_html = html.replace(r#"<a href="https://example.com/3">article3</a>"#, "");
let err = h2s::parse::<Page>(invalid_html).unwrap_err();
assert_eq!(
    err.to_string(),
    "articles: [2]: title: mismatched number of selected elements by \"h2 > a\": expected exactly one element, but no elements found"
);
```

## Supported types

You can use the following types for the fields of the struct you parse into.

### Basic types

- `String`
- Numeric types ( `usize`, `i64`, `NonZeroU32`, ... )
- And more built-in supported types ([List](./core/src/parseable.rs))
- Or you can use any type by implementing the conversion yourself ([Example](./examples/custom_field_value.rs))

### Container types (where `T` is a basic type)

- `[T;N]`
- `Option<T>`
- `Vec<T>`
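
The container type of a field determines how many selected elements it accepts. The example above shows three of these cases: a plain `String` fails when no element is found, `Option<String>` becomes `None`, and `Vec<String>` becomes an empty vector. The std-only sketch below illustrates that rule on a slice of already-selected values. It is not h2s's implementation: the function names are hypothetical, the `[T; N]` behavior (exactly `N` elements) is an assumption, and only the first error message is taken from the example above, the others being invented wording.

```rust
// Illustrative sketch only: these helpers are hypothetical and not part of h2s.

/// Plain `T`: exactly one selected element is required.
fn exactly_one<T: Clone>(matched: &[T]) -> Result<T, String> {
    match matched {
        [only] => Ok(only.clone()),
        [] => Err("expected exactly one element, but no elements found".to_string()),
        many => Err(format!(
            "expected exactly one element, but {} elements found",
            many.len()
        )),
    }
}

/// `Option<T>`: zero or one selected element.
fn at_most_one<T: Clone>(matched: &[T]) -> Result<Option<T>, String> {
    match matched {
        [] => Ok(None),
        [only] => Ok(Some(only.clone())),
        many => Err(format!(
            "expected at most one element, but {} elements found",
            many.len()
        )),
    }
}

/// `Vec<T>`: any number of selected elements, including none.
fn any_number<T: Clone>(matched: &[T]) -> Vec<T> {
    matched.to_vec()
}

/// `[T; N]`: assumed here to require exactly `N` selected elements.
fn exactly_n<T: Clone, const N: usize>(matched: &[T]) -> Result<[T; N], String> {
    // `TryFrom<Vec<T>> for [T; N]` hands the vector back on a length mismatch.
    <[T; N]>::try_from(matched.to_vec()).map_err(|v| {
        format!("expected exactly {} elements, but {} elements found", N, v.len())
    })
}
```

With the two tags of `article1` as input, `any_number` returns both and `exactly_n::<String, 2>` succeeds, while `exactly_one` and `at_most_one` report an error. With the empty tag list of `article2`, `at_most_one` returns `None` and `exactly_one` fails.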

## License

MIT