{"id":43416277,"url":"https://github.com/James-LG/Skyscraper","last_synced_at":"2026-02-13T21:01:18.264Z","repository":{"id":40404599,"uuid":"368698794","full_name":"James-LG/Skyscraper","owner":"James-LG","description":"Rust library for scraping HTML using XPath expressions","archived":false,"fork":false,"pushed_at":"2024-12-07T02:57:25.000Z","size":812,"stargazers_count":31,"open_issues_count":8,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-07T03:25:11.479Z","etag":null,"topics":["html","rust","scraper","xpath"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/James-LG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-19T00:16:18.000Z","updated_at":"2024-09-08T23:18:29.000Z","dependencies_parsed_at":"2023-10-16T15:12:57.524Z","dependency_job_id":"b719897a-9de6-4b28-a19c-beafe23edf0f","html_url":"https://github.com/James-LG/Skyscraper","commit_stats":{"total_commits":50,"total_committers":2,"mean_commits":25.0,"dds":"0.040000000000000036","last_synced_commit":"3967080e3d5438bff21fb72b5a0a59fdadde9770"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/James-LG/Skyscraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/James-LG%2FSkyscraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/James-LG%2FSkyscraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/James-LG%2FSkyscraper/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/James-LG%2FSkyscraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/James-LG","download_url":"https://codeload.github.com/James-LG/Skyscraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/James-LG%2FSkyscraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29417706,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-13T06:24:03.484Z","status":"ssl_error","status_checked_at":"2026-02-13T06:23:12.830Z","response_time":78,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","rust","scraper","xpath"],"created_at":"2026-02-02T18:00:32.444Z","updated_at":"2026-02-13T21:01:18.235Z","avatar_url":"https://github.com/James-LG.png","language":"Rust","funding_links":[],"categories":["Libraries"],"sub_categories":["Web"],"readme":"# Skyscraper - HTML scraping with XPath\r\n\r\n[![Dependency Status](https://deps.rs/repo/github/James-LG/Skyscraper/status.svg)](https://deps.rs/repo/github/James-LG/Skyscraper)\r\n[![License 
MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/James-LG/Skyscraper/blob/master/LICENSE)\r\n[![Crates.io](https://img.shields.io/crates/v/skyscraper.svg)](https://crates.io/crates/skyscraper)\r\n[![doc.rs](https://docs.rs/skyscraper/badge.svg)](https://docs.rs/skyscraper)\r\n\r\nRust library to scrape HTML documents with XPath expressions.\r\n\r\n\u003e This library is major-version 0 because there are still `todo!` calls for many xpath features.\r\n\u003eIf you encounter one that you feel should be prioritized, open an issue on [GitHub](https://github.com/James-LG/Skyscraper/issues).\r\n\u003e\r\n\u003e See the [Supported XPath Features](#supported-xpath-features) section for details.\r\n\r\n## HTML Parsing\r\n\r\nSkyscraper has its own HTML parser implementation. The parser outputs a\r\ntree structure that can be traversed manually with parent/child relationships.\r\n\r\n### Example: Simple HTML Parsing\r\n\r\n```rust\r\nuse skyscraper::html::{self, parse::ParseError};\r\nlet html_text = r##\"\r\n\u003chtml\u003e\r\n    \u003cbody\u003e\r\n        \u003cdiv\u003eHello world\u003c/div\u003e\r\n    \u003c/body\u003e\r\n\u003c/html\u003e\"##;\r\n \r\nlet document = html::parse(html_text)?;\r\n```\r\n\r\n### Example: Traversing Parent/Child Relationships\r\n\r\n```rust\r\n// Parse the HTML text into a document\r\nlet text = r#\"\u003cparent\u003e\u003cchild/\u003e\u003cchild/\u003e\u003c/parent\u003e\"#;\r\nlet document = html::parse(text)?;\r\n \r\n// Get the children of the root node\r\nlet parent_node: DocumentNode = document.root_node;\r\nlet children: Vec\u003cDocumentNode\u003e = parent_node.children(\u0026document).collect();\r\nassert_eq!(2, children.len());\r\n \r\n// Get the parent of both child nodes\r\nlet parent_of_child0: DocumentNode = children[0].parent(\u0026document).expect(\"parent of child 0 missing\");\r\nlet parent_of_child1: DocumentNode = children[1].parent(\u0026document).expect(\"parent of child 1 missing\");\r\n 
\r\nassert_eq!(parent_node, parent_of_child0);\r\nassert_eq!(parent_node, parent_of_child1);\r\n```\r\n\r\n## XPath Expressions\r\n\r\nSkyscraper is capable of parsing XPath strings and applying them to HTML documents.\r\n\r\nBelow is a basic xpath example. Please see the [docs](https://docs.rs/skyscraper/latest/skyscraper/xpath/index.html) for more examples.\r\n\r\n```rust\r\nuse skyscraper::html;\r\nuse skyscraper::xpath::{self, XpathItemTree, grammar::{XpathItemTreeNodeData, data_model::{Node, XpathItem}}};\r\nuse std::error::Error;\r\n\r\nfn main() -\u003e Result\u003c(), Box\u003cdyn Error\u003e\u003e {\r\n    let html_text = r##\"\r\n    \u003chtml\u003e\r\n        \u003cbody\u003e\r\n            \u003cdiv\u003eHello world\u003c/div\u003e\r\n        \u003c/body\u003e\r\n    \u003c/html\u003e\"##;\r\n\r\n    let document = html::parse(html_text)?;\r\n    let xpath_item_tree = XpathItemTree::from(\u0026document);\r\n    let xpath = xpath::parse(\"//div\")?;\r\n   \r\n    let item_set = xpath.apply(\u0026xpath_item_tree)?;\r\n   \r\n    assert_eq!(item_set.len(), 1);\r\n   \r\n    let mut items = item_set.into_iter();\r\n   \r\n    let item = items\r\n        .next()\r\n        .unwrap();\r\n\r\n    let element = item\r\n        .as_node()?\r\n        .as_tree_node()?\r\n        .data\r\n        .as_element_node()?;\r\n\r\n    assert_eq!(element.name, \"div\");\r\n    Ok(())\r\n}\r\n```\r\n\r\n### Supported XPath Features\r\n\r\nBelow is a non-exhaustive list of all the features that are currently supported.\r\n\r\n1. Basic xpath steps: `/html/body/div`, `//div/table//span`\r\n1. Attribute selection: `//div/@class`\r\n1. Text selection: `//div/text()`\r\n1. Wildcard node selection: `//body/*`\r\n1. Predicates:\r\n    1. Attributes: `//div[@class='hi']`\r\n    1. Indexing: `//div[1]`\r\n1. Functions:\r\n    1. `fn:root()`\r\n    1. `contains(haystack, needle)`\r\n1. Forward axes:\r\n    1. Child: `child::*`\r\n    1. Descendant: `descendant::*`\r\n    1. 
Attribute: `attribute::*`\r\n    1. DescendantOrSelf: `descendant-or-self::*`\r\n    1. (more coming soon)\r\n1. Reverse axes:\r\n    1. Parent: `parent::*`\r\n    1. (more coming soon)\r\n1. Treat expressions: `/html treat as node()`\r\n\r\nThis should cover most XPath use cases.\r\nIf your use case requires an unimplemented feature,\r\nplease open an issue on [GitHub](https://github.com/James-LG/Skyscraper/issues).\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJames-LG%2FSkyscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJames-LG%2FSkyscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJames-LG%2FSkyscraper/lists"}