{"id":16266960,"url":"https://github.com/wezm/xhtmlchardet","last_synced_at":"2026-03-07T04:02:57.661Z","repository":{"id":33022002,"uuid":"36656802","full_name":"wezm/xhtmlchardet","owner":"wezm","description":"Encoding detection for XML and HTML in Rust","archived":false,"fork":false,"pushed_at":"2022-01-26T00:09:00.000Z","size":332,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-06T01:43:59.959Z","etag":null,"topics":["character-set","detection","html","rust","xml"],"latest_commit_sha":null,"homepage":"http://docs.rs/xhtmlchardet/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wezm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-06-01T11:36:14.000Z","updated_at":"2023-09-08T16:58:13.000Z","dependencies_parsed_at":"2022-08-31T05:40:11.814Z","dependency_job_id":null,"html_url":"https://github.com/wezm/xhtmlchardet","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/wezm/xhtmlchardet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wezm%2Fxhtmlchardet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wezm%2Fxhtmlchardet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wezm%2Fxhtmlchardet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wezm%2Fxhtmlchardet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wezm","download_url":"https://codeload.github.com/wezm/xhtmlchardet/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wezm%2Fxhtmlchardet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30207393,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T03:24:23.086Z","status":"ssl_error","status_checked_at":"2026-03-07T03:23:11.444Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["character-set","detection","html","rust","xml"],"created_at":"2024-10-10T17:43:30.928Z","updated_at":"2026-03-07T04:02:52.649Z","avatar_url":"https://github.com/wezm.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# xhtmlchardet\n\nBasic character set detection for XML and HTML in Rust.\n\n[![Build Status](https://api.cirrus-ci.com/github/wezm/xhtmlchardet.svg)](https://cirrus-ci.com/github/wezm/xhtmlchardet)\n[![Documentation](https://docs.rs/xhtmlchardet/badge.svg)](https://docs.rs/xhtmlchardet)\n[![Latest Version](https://img.shields.io/crates/v/xhtmlchardet.svg)](https://crates.io/crates/xhtmlchardet)\n\n**Minimum Supported Rust Version:** 1.24.0\n\n## Example\n\n```rust\nuse std::io::Cursor;\nextern crate xhtmlchardet;\n\nlet text = b\"\u003c?xml version=\\\"1.0\\\" encoding=\\\"ISO-8859-1\\\"?\u003e\u003cchannel\u003e\u003ctitle\u003eExample\u003c/title\u003e\u003c/channel\u003e\";\nlet mut text_cursor = Cursor::new(text.to_vec());\nlet detected_charsets: Vec\u003cString\u003e = xhtmlchardet::detect(\u0026mut text_cursor, None).unwrap();\nassert_eq!(detected_charsets, vec![\"iso-8859-1\".to_string()]);\n```\n\n## Rationale\n\nI wrote a feed crawler that needed to determine the character set of fetched\ncontent so that it could be normalised to UTF-8. Initially I used the\n[uchardet] crate but I encountered some situations where it misdetected the\ncharset. I collected all these edge cases together and built a test suite. Then\nI implemented this crate, which passes all of those tests. It uses a fairly\nnaïve approach derived from [section F of the XML specification][xmlspec].\n\n[uchardet]: https://crates.io/crates/uchardet\n[xmlspec]: http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwezm%2Fxhtmlchardet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwezm%2Fxhtmlchardet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwezm%2Fxhtmlchardet/lists"}