{"id":35298255,"url":"https://github.com/niklak/dom_finder","last_synced_at":"2026-04-11T05:03:05.929Z","repository":{"id":216480335,"uuid":"741462313","full_name":"niklak/dom_finder","owner":"niklak","description":"HTML parsing with CSS seletors","archived":false,"fork":false,"pushed_at":"2026-03-14T10:38:03.000Z","size":140,"stargazers_count":2,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-14T21:11:00.105Z","etag":null,"topics":["css","css-selectors","html","html5ever","parser","scraping"],"latest_commit_sha":null,"homepage":"https://docs.rs/dom_finder","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/niklak.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-01-10T12:55:26.000Z","updated_at":"2026-03-14T10:37:57.000Z","dependencies_parsed_at":"2024-04-09T09:27:43.756Z","dependency_job_id":"072f22a5-7217-4b4c-9682-8cb810418aa0","html_url":"https://github.com/niklak/dom_finder","commit_stats":null,"previous_names":["niklak/dom_finder"],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/niklak/dom_finder","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklak%2Fdom_finder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklak%2Fdom_finder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklak%2Fdom_finder/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklak%2Fdom_finder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/niklak","download_url":"https://codeload.github.com/niklak/dom_finder/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklak%2Fdom_finder/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31669117,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-10T17:19:37.612Z","status":"online","status_checked_at":"2026-04-11T02:00:05.776Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["css","css-selectors","html","html5ever","parser","scraping"],"created_at":"2025-12-30T16:40:27.079Z","updated_at":"2026-04-11T05:03:05.924Z","avatar_url":"https://github.com/niklak.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# DOM_FINDER\n\n[![Crates.io version](https://img.shields.io/crates/v/dom_finder.svg?style=flat)](https://crates.io/crates/dom_finder)\n[![Download](https://img.shields.io/crates/d/dom_finder.svg?style=flat)](https://crates.io/crates/dom_finder)\n[![docs.rs 
docs](https://img.shields.io/badge/docs-latest-blue.svg?style=flat)](https://docs.rs/dom_finder)\n[![ci](https://github.com/niklak/dom_finder/actions/workflows/rust.yml/badge.svg)](https://github.com/niklak/dom_finder/actions/workflows/rust.yml)\n[![codecov](https://codecov.io/gh/niklak/dom_finder/graph/badge.svg?token=MFMJPTYVWT)](https://codecov.io/gh/niklak/dom_finder)\n\n`dom_finder` is a Rust crate for finding elements in the Document Object Model (DOM) of HTML documents.\nIt lets you locate specific elements with CSS selectors.\nWith `dom_finder`, you can extract data from HTML documents and transform it before getting the result.\n\nCurrently, the functionality relies on YAML configuration.\n\n\n## Examples\n\n#### General\n\n```rust\n\nuse dom_finder::{Config, Finder, Value};\n\nconst CFG_YAML: \u0026str = r\"\nname: root\nbase_path: html\nchildren:\n  - name: results\n    base_path: div.serp__results div.result\n    many: true\n    children:\n      - name: url\n        base_path: h2.result__title \u003e a[href]\n        extract: href\n      - name: title\n        base_path: h2.result__title\n        extract: text\n      - name: snippet\n        base_path: a.result__snippet\n        extract: html\n        sanitize_policy: highlight\n        pipeline: [ [ normalize_spaces ] ]\n\";\n\nconst HTML_DOC: \u0026str = include_str!(\"../test_data/page_0.html\");\n\n\nfn main() {\n    // Load the config from a YAML string; the \u0026str can come from a file or a buffer.\n    let cfg = Config::from_yaml(CFG_YAML).unwrap();\n    // Create a new Finder instance.\n    let finder = Finder::new(\u0026cfg).unwrap();\n\n    // Or in one line:\n    // let finder: Finder = Config::from_yaml(CFG_YAML).unwrap().try_into().unwrap();\n\n    // Parse the HTML string (a \u0026str) and get the result as a `Value`.\n    // The `Value` returned by `parse` is always a `Value::Object` with exactly one key (a String).\n    
let results: Value = finder.parse(HTML_DOC);\n\n    // From the `Value` we can navigate to a descendant (nested) value by path,\n    // similar to `gjson`, although the path syntax for `Value` is much simpler.\n    // For more examples, please check out the `tests/` folder.\n\n    // Get the number of results with the `from_path` method.\n    // We know that `results` is a `Value::Array`\n    // because the config sets `many: true` for `results`.\n    // If the `Value` is an `Array` (backed by a vector), it can be queried with `#` or a non-negative index.\n    let count_opt: Option\u003ci64\u003e = results.from_path(\"root.results.#\").and_then(|v| v.into());\n    assert_eq!(count_opt.unwrap(), 21);\n\n\n    // Get a specific `Value` and convert it to a concrete type.\n    // In the same way we can retrieve all urls inside the `results` array\n    // by specifying the path as `root.results.#.url`.\n    // If there is no `url` key, or its value is not a `Value::String`,\n    // the result is None, otherwise Some.\n    let url: String = results.from_path(\"root.results.0.url\")\n    .and_then(|v| v.into()).unwrap();\n    assert_eq!(url, \"https://ethereum.org/en/\");\n\n    // The `Value` instance can also be serialized with a serde serializer\n    // (such as json or any other available one).\n    // This is useful if you just need to send parsed data in an http response\n    // or store it in a database.\n    let serialized = serde_json::to_string(\u0026results).unwrap();\n}\n```\n\n#### Remove selection\n\n```rust\nuse dom_finder::{Config, Finder};\nuse dom_query::Document;\n\n\nconst HTML_DOC: \u0026str = include_str!(\"../test_data/page_0.html\");\n\n\nfn main() {\n\n  // Create a finder, as in the previous example\n  let cfg_yaml = r\"\n  name: root\n  base_path: html\n  children:\n  - name: feedback\n    base_path: div#links.results div.feedback-btn\n    extract: html\n    remove_selection: true\n    pipeline: [ [ trim_space ] ]\n  \";\n  let cfg = Config::from_yaml(cfg_yaml).unwrap();\n 
 let finder = Finder::new(\u0026cfg).unwrap();\n\n  // Create a dom_query::Document\n  let doc = Document::from(HTML_DOC);\n\n  // Parse the document.\n  // Because `remove_selection` is set, the matched selection is removed from the document,\n  // but its value is still available in the result.\n  let res = finder.parse_document(\u0026doc);\n  let feedback_caption: Option\u003cString\u003e = res.from_path(\"root.feedback\").and_then(|v| v.into());\n  assert_eq!(feedback_caption.unwrap(), \"Feedback\");\n\n  let html = doc.html();\n  // The html document no longer contains the feedback button.\n  assert!(!html.contains(\"feedback-btn\"));\n}\n\n```\n\n#### More examples\n\n- [examples/multithread_scope.rs](./examples/multithread_scope.rs)\n- [examples/multithread.rs](./examples/multithread.rs)\n\n\n## Features\n\n- `json_cfg` -- optional, allows loading the config from a JSON string.\n\n## License\n\nThis project is licensed under either of\n\n * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or\n   https://www.apache.org/licenses/LICENSE-2.0)\n * MIT license ([LICENSE-MIT](LICENSE-MIT) or\n   https://opensource.org/licenses/MIT)\n\nat your option.\n\n Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniklak%2Fdom_finder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fniklak%2Fdom_finder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniklak%2Fdom_finder/lists"}