{"id":19276496,"url":"https://github.com/orottier/webpage-rs","last_synced_at":"2025-04-07T17:09:13.327Z","repository":{"id":40267644,"uuid":"138497445","full_name":"orottier/webpage-rs","owner":"orottier","description":"Small Rust library to fetch info about a web page: title, description, language, HTTP info, RSS feeds, Opengraph, Schema.org, and more","archived":false,"fork":false,"pushed_at":"2024-09-16T20:37:27.000Z","size":64,"stargazers_count":57,"open_issues_count":4,"forks_count":11,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-31T13:19:15.127Z","etag":null,"topics":["html","html-parser","json-ld","opengraph","rust"],"latest_commit_sha":null,"homepage":"https://docs.rs/webpage","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/orottier.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-24T16:24:59.000Z","updated_at":"2025-03-23T22:40:01.000Z","dependencies_parsed_at":"2023-02-06T08:46:40.968Z","dependency_job_id":"ac74271c-7e70-493d-af75-4eac9a3e1951","html_url":"https://github.com/orottier/webpage-rs","commit_stats":{"total_commits":49,"total_committers":8,"mean_commits":6.125,"dds":"0.36734693877551017","last_synced_commit":"a73992673f44d3dbb18abcccf8f3a6454ffb95e8"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orottier%2Fwebpage-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orottier%2Fwebpage-rs/tags","releases_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/orottier%2Fwebpage-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orottier%2Fwebpage-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/orottier","download_url":"https://codeload.github.com/orottier/webpage-rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247694876,"owners_count":20980733,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","html-parser","json-ld","opengraph","rust"],"created_at":"2024-11-09T20:54:43.465Z","updated_at":"2025-04-07T17:09:13.297Z","avatar_url":"https://github.com/orottier.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# Webpage.rs\n\n[![crates.io](https://img.shields.io/crates/v/webpage.svg)](https://crates.io/crates/webpage)\n[![docs.rs](https://img.shields.io/docsrs/webpage)](https://docs.rs/webpage)\n\n_Small library to fetch info about a web page: title, description, language,\nHTTP info, links, RSS feeds, Opengraph, Schema.org, and more_\n\n## Usage\n\n```rust\nuse webpage::{Webpage, WebpageOptions};\n\nlet info = Webpage::from_url(\"http://www.rust-lang.org/en-US/\", WebpageOptions::default())\n    .expect(\"Could not read from URL\");\n\n// the HTTP transfer info\nlet http = info.http;\n\nassert_eq!(http.ip, \"54.192.129.71\".to_string());\nassert!(http.headers[0].starts_with(\"HTTP\"));\nassert!(http.body.starts_with(\"\u003c!DOCTYPE html\u003e\"));\nassert_eq!(http.url, 
\"https://www.rust-lang.org/en-US/\".to_string()); // followed redirects (HTTPS)\nassert_eq!(http.content_type, \"text/html\".to_string());\n\n// the parsed HTML info\nlet html = info.html;\n\nassert_eq!(html.title, Some(\"The Rust Programming Language\".to_string()));\nassert_eq!(html.description, Some(\"A systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.\".to_string()));\nassert_eq!(html.opengraph.og_type, \"website\".to_string());\n```\n\nYou can also get HTML info about local data:\n\n```rust\nuse webpage::HTML;\nlet html = HTML::from_file(\"index.html\", None);\n// or let html = HTML::from_string(input, None);\n```\n\n## Features\n\n### Serialization\n\nIf you need to serialize the data provided by the library using\n[serde](https://serde.rs/), you can specify the `serde` *feature* when\ndeclaring your dependencies in `Cargo.toml`:\n\n```toml\nwebpage = { version = \"2.0\", features = [\"serde\"] }\n```\n\n### No curl dependency\n\nThe `curl` feature is enabled by default but is optional. 
Disabling it is useful if you\ndo not need an HTTP client and already have the HTML data at hand.\n\n## All fields\n\n```rust\npub struct Webpage {\n    pub http: HTTP, // info about the HTTP transfer\n    pub html: HTML, // info from the parsed HTML doc\n}\n\npub struct HTTP {\n    pub ip: String,\n    pub transfer_time: Duration,\n    pub redirect_count: u32,\n    pub content_type: String,\n    pub response_code: u32,\n    pub headers: Vec\u003cString\u003e, // raw headers from final request\n    pub url: String, // effective url\n    pub body: String,\n}\n\npub struct HTML {\n    pub title: Option\u003cString\u003e,\n    pub description: Option\u003cString\u003e,\n\n    pub url: Option\u003cString\u003e, // canonical url\n    pub feed: Option\u003cString\u003e, // RSS feed typically\n\n    pub language: Option\u003cString\u003e, // as specified, not detected\n    pub text_content: String, // all tags stripped from body\n    pub links: Vec\u003cLink\u003e, // all links in the document\n\n    pub meta: HashMap\u003cString, String\u003e, // flattened down list of meta properties\n\n    pub opengraph: Opengraph,\n    pub schema_org: Vec\u003cSchemaOrg\u003e,\n}\n\npub struct Link {\n    pub url: String, // resolved url of the link\n    pub text: String, // anchor text\n}\n\npub struct Opengraph {\n    pub og_type: String,\n    pub properties: HashMap\u003cString, String\u003e,\n\n    pub images: Vec\u003cOpengraphObject\u003e,\n    pub videos: Vec\u003cOpengraphObject\u003e,\n    pub audios: Vec\u003cOpengraphObject\u003e,\n}\n\n// Facebook's Opengraph structured data\npub struct OpengraphObject {\n    pub url: String,\n    pub properties: HashMap\u003cString, String\u003e,\n}\n\n// Google's schema.org structured data\npub struct SchemaOrg {\n    pub schema_type: String,\n    pub value: serde_json::Value,\n}\n```\n\n## Options\n\nThe following HTTP configuration options are available:\n\n```rust\npub struct WebpageOptions {\n    allow_insecure: false,\n    follow_location: true,\n    max_redirections: 5,\n   
 timeout: Duration::from_secs(10),\n    useragent: \"Webpage - Rust crate - https://crates.io/crates/webpage\".to_string(),\n    headers: vec![\"X-My-Header: 1234\".to_string()],\n}\n\n// usage\nlet mut options = WebpageOptions::default();\noptions.allow_insecure = true;\nlet info = Webpage::from_url(\u0026url, options).expect(\"Halp, could not fetch\");\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forottier%2Fwebpage-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Forottier%2Fwebpage-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forottier%2Fwebpage-rs/lists"}