{"id":15688167,"url":"https://github.com/ticky/webarchive","last_synced_at":"2026-02-17T12:32:03.458Z","repository":{"id":43268054,"uuid":"268147349","full_name":"ticky/webarchive","owner":"ticky","description":"📑 Rust utilities for working with Apple's Web Archive file format","archived":false,"fork":false,"pushed_at":"2022-03-11T05:56:00.000Z","size":518,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"develop","last_synced_at":"2025-08-21T04:32:28.045Z","etag":null,"topics":["rust-crate","rust-lang","safari","webarchive"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ticky.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"License-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-30T19:36:00.000Z","updated_at":"2024-10-25T18:51:45.000Z","dependencies_parsed_at":"2022-08-30T20:01:22.190Z","dependency_job_id":null,"html_url":"https://github.com/ticky/webarchive","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ticky/webarchive","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ticky%2Fwebarchive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ticky%2Fwebarchive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ticky%2Fwebarchive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ticky%2Fwebarchive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ticky","download_url":"https://codeload.github.com/ticky/webarchive/tar.gz/refs/heads/develop","sbom_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ticky%2Fwebarchive/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29543905,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T12:21:34.159Z","status":"ssl_error","status_checked_at":"2026-02-17T12:21:02.057Z","response_time":100,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["rust-crate","rust-lang","safari","webarchive"],"created_at":"2024-10-03T17:55:54.547Z","updated_at":"2026-02-17T12:32:03.437Z","avatar_url":"https://github.com/ticky.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# webarchive\n\n[![crates.io](https://img.shields.io/crates/v/webarchive.svg)](https://crates.io/crates/webarchive) [![docs.rs](https://img.shields.io/docsrs/webarchive)](https://docs.rs/webarchive/) [![Rust](https://github.com/ticky/webarchive/actions/workflows/rust.yml/badge.svg)](https://github.com/ticky/webarchive/actions/workflows/rust.yml)\n\nRust utilities for working with Apple's Web Archive file format,\nas produced by Safari 2 or later on macOS, Safari 4 or later on Windows,\nor Safari 13 or later on iOS and iPadOS.\n\n## Why Web Archive?\n\nWeb Archive files have been around since 2005, and are a way to save an\nentire web page, and all associated resources involved in displaying 
it,\nas a single file which can be saved to disk, reviewed or shared regardless\nof changes, removals or the state of the server which originally served it.\n\nWhile not well supported outside of Apple platforms, and not supported by\niOS until iOS 13 in 2019, the Web Archive is one of few formats designed\nfor a user to simply open a page and expect it to work as the original did.\nOne which came closest is [MHTML](https://en.wikipedia.org/wiki/MHTML),\nsupported in older versions of Microsoft's Internet Explorer and with a\nsimilar approach to Web Archive, representing a web page in its entirety.\n\nAlternatives aimed at professional or semi-professional archival work, such\nas [WARC](https://en.wikipedia.org/wiki/Web_ARChive), instead represent an\nentire browsing session and associated subresources, but require\nspecialised software to view, and do not have the concept of a \"main\" page\nor resource. Web Archives, by contrast, open in a normal web browser and\ndo not require the user to know which URL to select.\n\n## Okay, so what's the goal?\n\nI aim for this to be an ergonomic API for reading, creating, and converting\nWeb Archive files, and to expand the included command-line utility to allow\nbi-directional conversion between common formats and Web Archives.\n\n## Usage\n\n### Command-line usage\n\nA command-line utility is provided, which can be installed by running:\n\n```shell\ncargo install webarchive\n```\n\nThis utility can extract or inspect the contents of webarchive files.\n\nList the contents with `inspect`:\n\n```shell\n$ webarchive inspect fixtures/psxdatacenter.webarchive\nWebArchive of \"http://psxdatacenter.com/ntsc-j_list.html\": 0 subresource(s)\nWebArchive of \"http://psxdatacenter.com/banner.html\": 2 subresource(s)\n  - \"http://psxdatacenter.com/images/texgrey.jpg\"\n  - \"http://psxdatacenter.com/images/logo.jpg\"\nWebArchive of \"http://psxdatacenter.com/nav.html\": 16 subresource(s)\n  - 
\"http://psxdatacenter.com/images/texgrey.jpg\"\n  - \"http://psxdatacenter.com/buttons/news1.gif\"\n  - \"http://psxdatacenter.com/buttons/inf1.gif\"\n  - \"http://psxdatacenter.com/buttons/emul1.gif\"\n...\n```\n\nOr extract them to disk with `extract`:\n\n```shell\n$ webarchive extract fixtures/psxdatacenter.webarchive\nSaving main resource...\nWriting file \"fixtures/psxdatacenter.com/ntsc-j_list.html\"...\nSaving subframe archives...\nSaving main resource...\nWriting file \"fixtures/psxdatacenter.com/banner.html\"...\nSaving subresources...\nWriting file \"fixtures/psxdatacenter.com/images/texgrey.jpg\"...\nWriting file \"fixtures/psxdatacenter.com/images/logo.jpg\"...\n...\n```\n\n### Reading a webarchive\n\n```rust\nuse webarchive::WebArchive;\n\nlet archive: WebArchive = webarchive::from_file(\"fixtures/psxdatacenter.webarchive\")?;\n\n/// main_resource is the resource which is opened by default\nassert_eq!(\n    archive.main_resource.url,\n    \"http://psxdatacenter.com/ntsc-j_list.html\"\n);\nassert_eq!(archive.main_resource.mime_type, \"text/html\");\nassert_eq!(\n    archive.main_resource.text_encoding_name,\n    Some(\"UTF-8\".to_string())\n);\nassert_eq!(archive.main_resource.data.len(), 2171);\nassert!(archive.subresources.is_none());\n\n/// subframe_archives contains additional WebArchives for frames\nassert!(archive.subframe_archives.is_some());\nlet subframe_archives = archive.subframe_archives.unwrap();\nassert_eq!(subframe_archives.len(), 4);\n\nassert_eq!(\n    subframe_archives[0].main_resource.url,\n    \"http://psxdatacenter.com/banner.html\"\n);\nassert_eq!(subframe_archives[0].main_resource.mime_type, \"text/html\");\nassert_eq!(\n    subframe_archives[0].main_resource.text_encoding_name,\n    Some(\"UTF-8\".to_string())\n);\nassert_eq!(subframe_archives[0].main_resource.data.len(), 782);\n\n/// subresources are the files referenced by a given frame\nassert!(subframe_archives[0].subresources.is_some());\nlet subresources = 
subframe_archives[0].subresources.as_ref().unwrap();\nassert_eq!(subresources.len(), 2);\n\nassert_eq!(\n    subresources[0].url,\n    \"http://psxdatacenter.com/images/texgrey.jpg\"\n);\nassert_eq!(subresources[0].mime_type, \"image/jpeg\");\nassert!(subresources[0].text_encoding_name.is_none());\nassert_eq!(subresources[0].data.len(), 107128);\n```\n\n### Creating a webarchive\n\n```rust\nuse webarchive::{WebArchive, WebResource};\n\nlet resource = WebResource {\n    url: \"about:hello\".to_string(),\n    data: \"hello world\".as_bytes().to_vec(),\n    mime_type: \"text/plain\".to_string(),\n    text_encoding_name: Some(\"utf-8\".to_string()),\n    frame_name: None,\n    response: None,\n};\n\nlet archive = WebArchive {\n    main_resource: resource,\n    subresources: None,\n    subframe_archives: None,\n};\n\nlet mut buf: Vec\u003cu8\u003e = Vec::new();\n\nwebarchive::to_writer_xml(\u0026mut buf, \u0026archive)?;\n\nassert_eq!(\n    String::from_utf8(buf)?,\n    r#\"\u003c?xml version=\"1.0\" encoding=\"UTF-8\"?\u003e\n\u003c!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\"\u003e\n\u003cplist 
version=\"1.0\"\u003e\n\u003cdict\u003e\n\t\u003ckey\u003eWebMainResource\u003c/key\u003e\n\t\u003cdict\u003e\n\t\t\u003ckey\u003eWebResourceData\u003c/key\u003e\n\t\t\u003cdata\u003e\n\t\taGVsbG8gd29ybGQ=\n\t\t\u003c/data\u003e\n\t\t\u003ckey\u003eWebResourceURL\u003c/key\u003e\n\t\t\u003cstring\u003eabout:hello\u003c/string\u003e\n\t\t\u003ckey\u003eWebResourceMIMEType\u003c/key\u003e\n\t\t\u003cstring\u003etext/plain\u003c/string\u003e\n\t\t\u003ckey\u003eWebResourceTextEncodingName\u003c/key\u003e\n\t\t\u003cstring\u003eutf-8\u003c/string\u003e\n\t\u003c/dict\u003e\n\u003c/dict\u003e\n\u003c/plist\u003e\"#\n);\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fticky%2Fwebarchive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fticky%2Fwebarchive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fticky%2Fwebarchive/lists"}