{"id":13439265,"url":"https://github.com/lifthrasiir/rust-encoding","last_synced_at":"2025-05-15T06:02:30.844Z","repository":{"id":9774125,"uuid":"11745467","full_name":"lifthrasiir/rust-encoding","owner":"lifthrasiir","description":"Character encoding support for Rust","archived":false,"fork":false,"pushed_at":"2024-03-31T17:43:21.000Z","size":4227,"stargazers_count":285,"open_issues_count":36,"forks_count":58,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-05-15T02:34:15.711Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lifthrasiir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.txt","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-07-29T17:35:09.000Z","updated_at":"2025-05-07T08:43:40.000Z","dependencies_parsed_at":"2024-06-18T15:36:54.465Z","dependency_job_id":"74d6426a-890e-4cde-904f-86715384f77d","html_url":"https://github.com/lifthrasiir/rust-encoding","commit_stats":{"total_commits":270,"total_committers":37,"mean_commits":7.297297297297297,"dds":"0.30000000000000004","last_synced_commit":"eb3d3c307df864f6a25e2ca16d49703e5d963ec5"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifthrasiir%2Frust-encoding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifthrasiir%2Frust-encoding/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifthrasiir%2Frust-encoding/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifthrasiir%2Frust-encoding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lifthrasiir","download_url":"https://codeload.github.com/lifthrasiir/rust-encoding/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254283336,"owners_count":22045140,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:12.494Z","updated_at":"2025-05-15T06:02:30.811Z","avatar_url":"https://github.com/lifthrasiir.png","language":"Rust","readme":"[Encoding][doc] 0.3.0-dev\n=========================\n\n[![Encoding on Travis CI][travis-image]][travis]\n\n[travis-image]: https://travis-ci.org/lifthrasiir/rust-encoding.png\n[travis]: https://travis-ci.org/lifthrasiir/rust-encoding\n\nCharacter encoding support for Rust. 
(also known as `rust-encoding`)\nIt is based on [WHATWG Encoding Standard](http://encoding.spec.whatwg.org/),\nand also provides an advanced interface for error detection and recovery.\n\n*This documentation is for the development version (0.3).\nPlease see the [stable documentation][doc] for 0.2.x versions.*\n\n[Complete Documentation][doc] (stable)\n\n[doc]: https://lifthrasiir.github.io/rust-encoding/\n\n## Usage\n\nPut this in your `Cargo.toml`:\n\n```toml\n[dependencies]\nencoding = \"0.3\"\n```\n\nThen put this in your crate root:\n\n```rust\nextern crate encoding;\n```\n\n### Data Table\n\nBy default, Encoding comes with ~480 KB of data table (\"indices\").\nThis allows Encoding to encode and decode legacy encodings efficiently,\nbut this might not be desirable for some applications.\n\nEncoding provides the `no-optimized-legacy-encoding` Cargo feature\nto reduce the size of encoding tables (to ~185 KB)\nat the expense of encoding performance (typically 5x to 20x slower).\nThe decoding performance remains identical.\n**This feature is strongly intended for end users.\nDo not try to enable this feature from library crates, ever.**\n\nFor finer-tuned optimization, see `src/index/gen_index.py` for\ncustom table generation.\n\n## Overview\n\nTo encode a string:\n\n```rust\nuse encoding::{Encoding, EncoderTrap};\nuse encoding::all::ISO_8859_1;\n\nassert_eq!(ISO_8859_1.encode(\"caf\\u{e9}\", EncoderTrap::Strict),\n           Ok(vec![99,97,102,233]));\n```\n\nTo encode a string with unrepresentable characters:\n\n```rust\nuse encoding::{Encoding, EncoderTrap};\nuse encoding::all::ISO_8859_2;\n\nassert!(ISO_8859_2.encode(\"Acme\\u{a9}\", EncoderTrap::Strict).is_err());\nassert_eq!(ISO_8859_2.encode(\"Acme\\u{a9}\", EncoderTrap::Replace),\n           Ok(vec![65,99,109,101,63]));\nassert_eq!(ISO_8859_2.encode(\"Acme\\u{a9}\", EncoderTrap::Ignore),\n           Ok(vec![65,99,109,101]));\nassert_eq!(ISO_8859_2.encode(\"Acme\\u{a9}\", EncoderTrap::NcrEscape),\n           
Ok(vec![65,99,109,101,38,35,49,54,57,59]));\n```\n\nTo decode a byte sequence:\n\n```rust\nuse encoding::{Encoding, DecoderTrap};\nuse encoding::all::ISO_8859_1;\n\nassert_eq!(ISO_8859_1.decode(\u0026[99,97,102,233], DecoderTrap::Strict),\n           Ok(\"caf\\u{e9}\".to_string()));\n```\n\nTo decode a byte sequence with invalid sequences:\n\n```rust\nuse encoding::{Encoding, DecoderTrap};\nuse encoding::all::ISO_8859_6;\n\nassert!(ISO_8859_6.decode(\u0026[65,99,109,101,169], DecoderTrap::Strict).is_err());\nassert_eq!(ISO_8859_6.decode(\u0026[65,99,109,101,169], DecoderTrap::Replace),\n           Ok(\"Acme\\u{fffd}\".to_string()));\nassert_eq!(ISO_8859_6.decode(\u0026[65,99,109,101,169], DecoderTrap::Ignore),\n           Ok(\"Acme\".to_string()));\n```\n\nTo encode or decode the input into the already allocated buffer:\n\n```rust\nuse encoding::{Encoding, EncoderTrap, DecoderTrap};\nuse encoding::all::{ISO_8859_2, ISO_8859_6};\n\nlet mut bytes = Vec::new();\nlet mut chars = String::new();\n\nassert!(ISO_8859_2.encode_to(\"Acme\\u{a9}\", EncoderTrap::Ignore, \u0026mut bytes).is_ok());\nassert!(ISO_8859_6.decode_to(\u0026[65,99,109,101,169], DecoderTrap::Replace, \u0026mut chars).is_ok());\n\nassert_eq!(bytes, [65,99,109,101]);\nassert_eq!(chars, \"Acme\\u{fffd}\");\n```\n\nA practical example of custom encoder traps:\n\n```rust\nuse encoding::{Encoding, ByteWriter, EncoderTrap, DecoderTrap};\nuse encoding::types::RawEncoder;\nuse encoding::all::ASCII;\n\n// hexadecimal numeric character reference replacement\nfn hex_ncr_escape(_encoder: \u0026mut RawEncoder, input: \u0026str, output: \u0026mut ByteWriter) -\u003e bool {\n    let escapes: Vec\u003cString\u003e =\n        input.chars().map(|ch| format!(\"\u0026#x{:x};\", ch as isize)).collect();\n    let escapes = escapes.concat();\n    output.write_bytes(escapes.as_bytes());\n    true\n}\nstatic HEX_NCR_ESCAPE: EncoderTrap = EncoderTrap::Call(hex_ncr_escape);\n\nlet orig = \"Hello, 世界!\".to_string();\nlet encoded = 
ASCII.encode(\u0026orig, HEX_NCR_ESCAPE).unwrap();\nassert_eq!(ASCII.decode(\u0026encoded, DecoderTrap::Strict),\n           Ok(\"Hello, \u0026#x4e16;\u0026#x754c;!\".to_string()));\n```\n\nTo get an encoding from its string label, as specified in the WHATWG Encoding Standard:\n\n```rust\nuse encoding::{Encoding, DecoderTrap};\nuse encoding::label::encoding_from_whatwg_label;\nuse encoding::all::WINDOWS_949;\n\nlet euckr = encoding_from_whatwg_label(\"euc-kr\").unwrap();\nassert_eq!(euckr.name(), \"windows-949\");\nassert_eq!(euckr.whatwg_name(), Some(\"euc-kr\")); // for the sake of compatibility\nlet broken = \u0026[0xbf, 0xec, 0xbf, 0xcd, 0xff, 0xbe, 0xd3];\nassert_eq!(euckr.decode(broken, DecoderTrap::Replace),\n           Ok(\"\\u{c6b0}\\u{c640}\\u{fffd}\\u{c559}\".to_string()));\n\n// corresponding Encoding native API:\nassert_eq!(WINDOWS_949.decode(broken, DecoderTrap::Replace),\n           Ok(\"\\u{c6b0}\\u{c640}\\u{fffd}\\u{c559}\".to_string()));\n```\n\n## Types and Stuffs\n\nThere are three main entry points to Encoding.\n\n**`Encoding`** is a single character encoding.\nIt contains `encode` and `decode` methods for converting `String` to `Vec\u003cu8\u003e` and vice versa.\nFor error handling, they receive **traps** (`EncoderTrap` and `DecoderTrap` respectively)\nwhich replace any error with some string (e.g. `U+FFFD`) or sequence (e.g. 
`?`).\nYou can also use `EncoderTrap::Strict` and `DecoderTrap::Strict` traps to stop on an error.\n\nThere are two ways to get `Encoding`:\n\n* `encoding::all` has static items for every supported encoding.\n  You should use them when the encoding would not change or only a handful of them are required.\n  Combined with link-time optimization, any unused encoding would be discarded from the binary.\n* `encoding::label` has functions to dynamically get an encoding from a given string (\"label\").\n  They will return a static reference to the encoding,\n  whose type is also known as `EncodingRef`.\n  It is useful when a list of required encodings is not available in advance,\n  but it will result in a larger binary and missed optimization opportunities.\n\n**`RawEncoder`** is an experimental incremental encoder.\nAt each step of `raw_feed`, it receives a string slice\nand emits any encoded bytes to a generic `ByteWriter` (normally `Vec\u003cu8\u003e`).\nIt will stop at the first error if any, and returns a `CodecError` struct in that case.\nThe caller is responsible for calling `raw_finish` at the end of the encoding process.\n\n**`RawDecoder`** is an experimental incremental decoder.\nAt each step of `raw_feed`, it receives a byte slice\nand emits any decoded characters to a generic `StringWriter` (normally `String`).\nOtherwise it is identical to `RawEncoder`.\n\nOne should prefer `Encoding::{encode,decode}` as a primary interface.\n`RawEncoder` and `RawDecoder` are experimental and can change substantially.\nSee the additional documentation in the `encoding::types` module for more information on them.\n\n## Supported Encodings\n\nEncoding covers all encodings specified by the WHATWG Encoding Standard and some more:\n\n* 7-bit strict ASCII (`ascii`)\n* UTF-8 (`utf-8`)\n* UTF-16 in little endian (`utf-16` or `utf-16le`) and big endian (`utf-16be`)\n* All single byte encodings in the WHATWG Encoding Standard:\n    * IBM code page 866\n    * ISO 
8859-{2,3,4,5,6,7,8,10,13,14,15,16}\n    * KOI8-R, KOI8-U\n    * MacRoman (`macintosh`), Macintosh Cyrillic encoding (`x-mac-cyrillic`)\n    * Windows code pages 874, 1250, 1251, 1252 (instead of ISO 8859-1), 1253,\n      1254 (instead of ISO 8859-9), 1255, 1256, 1257, 1258\n* All multi byte encodings in the WHATWG Encoding Standard:\n    * Windows code page 949 (`euc-kr`, since the strict EUC-KR is hardly used)\n    * EUC-JP and Windows code page 932 (`shift_jis`,\n      since it's the most widespread extension to Shift_JIS)\n    * ISO-2022-JP with asymmetric JIS X 0212 support\n      (Note: this is not yet up to date with the current standard)\n    * GBK\n    * GB 18030\n    * Big5-2003 with HKSCS-2008 extensions\n* Encodings that were originally specified by the WHATWG Encoding Standard:\n    * HZ\n* ISO 8859-1 (distinct from Windows code page 1252)\n\nParenthesized names refer to the encoding's primary name assigned by the WHATWG Encoding Standard.\n\nMany legacy character encodings lack a proper specification,\nand even those that have a specification are highly dependent on the actual implementation.\nConsequently one should be careful when picking a desired character encoding.\nThe only standards reliable in this regard are the WHATWG Encoding Standard and\n[vendor-provided mappings from the Unicode Consortium](http://www.unicode.org/Public/MAPPINGS/).\nWhen in doubt, look at the source code and specifications for detailed explanations.\n\n","funding_links":[],"categories":["Libraries","代码","库 Libraries","Rust","库"],"sub_categories":["Encoding","编码","编码 Encoding","编码(Encoding)","加密 Encoding"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flifthrasiir%2Frust-encoding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flifthrasiir%2Frust-encoding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flifthrasiir%2Frust-encoding/lists"}