https://github.com/easonzero/sthe
A library to provide an easy way to extract data from HTML.
- Host: GitHub
- URL: https://github.com/easonzero/sthe
- Owner: Easonzero
- Created: 2022-10-30T16:39:01.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-11-11T13:03:59.000Z (over 2 years ago)
- Last Synced: 2025-01-24T11:31:54.047Z (4 months ago)
- Language: Rust
- Size: 192 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# STHE
A library to provide an easy way to extract data from HTML.

## Example
```rust
// Build the extraction options from TOML.
let opt: ExtractOpt = toml::from_str(r#"
target = "href"
selector = "a"
"#).unwrap();

// Extract from an HTML fragment using the compiled options.
let extract = extract_fragment("", &opt.compile().unwrap());

// Serialize the result and compare it with the expected value.
let extract_value = toml::Value::try_from(extract).unwrap();
let expect_value: toml::Value = toml::from_str("text = \"www.xxx.com\"").unwrap();

assert_eq!(extract_value, expect_value);
```

See also [examples/crawler.rs](examples/crawler.rs), which can be run with `cargo run --example crawler -- -c examples/opt.toml`.
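Conceptually, an option with `selector = "a"` and `target = "href"` asks for the `href` attribute of each matching `a` element. The sketch below illustrates that idea using only the Rust standard library. It is not STHE's implementation or API: `extract_attr` is a hypothetical helper, the sample HTML is an assumed input, and the naive string scanning only handles double-quoted attributes without `>` inside their values.

```rust
/// Hypothetical helper (not part of STHE): collect the values of `attr`
/// from every `<tag ...>` opening tag found in `html`.
fn extract_attr(html: &str, tag: &str, attr: &str) -> Vec<String> {
    let open = format!("<{}", tag);
    // Leading space so that `href` does not match inside `data-href`.
    let needle = format!(" {}=\"", attr);
    let mut out = Vec::new();
    let mut rest = html;

    while let Some(start) = rest.find(&open) {
        let after = &rest[start + open.len()..];
        // The tag name must end here, so `<a` does not match `<abbr`.
        let is_tag = after
            .chars()
            .next()
            .map_or(false, |c| c.is_whitespace() || c == '>' || c == '/');
        // Find the end of the opening tag; stop if the markup is truncated.
        let end = match after.find('>') {
            Some(end) => end,
            None => break,
        };
        if is_tag {
            let attrs = &after[..end];
            if let Some(pos) = attrs.find(&needle) {
                let value = &attrs[pos + needle.len()..];
                if let Some(close) = value.find('"') {
                    out.push(value[..close].to_string());
                }
            }
        }
        rest = &after[end + 1..];
    }
    out
}

fn main() {
    // Assumed sample input, chosen for illustration only.
    let html = "<a href=\"www.xxx.com\">link</a>";
    println!("{:?}", extract_attr(html, "a", "href")); // prints ["www.xxx.com"]
}
```

A real extractor should rely on a proper HTML parser and CSS selector engine rather than string scanning; the sketch only shows what the selector and target pair is meant to select.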