{"id":13623805,"url":"https://github.com/nicolas-grekas/Patchwork-UTF8","last_synced_at":"2025-04-15T20:32:29.333Z","repository":{"id":23004439,"uuid":"26355127","full_name":"nicolas-grekas/Patchwork-UTF8","owner":"nicolas-grekas","description":"Extensive, portable and performant handling of UTF-8 and grapheme clusters for PHP","archived":false,"fork":true,"pushed_at":"2022-07-12T11:49:23.000Z","size":5157,"stargazers_count":79,"open_issues_count":1,"forks_count":12,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-08-01T21:57:45.094Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"tchwork/utf8","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nicolas-grekas.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-11-08T08:56:07.000Z","updated_at":"2024-08-01T21:57:45.094Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nicolas-grekas/Patchwork-UTF8","commit_stats":null,"previous_names":[],"tags_count":42,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolas-grekas%2FPatchwork-UTF8","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolas-grekas%2FPatchwork-UTF8/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolas-grekas%2FPatchwork-UTF8/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolas-grekas%2FPatchwork-UTF8/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nicolas-grekas","download_url":"https://codeload.github.co
m/nicolas-grekas/Patchwork-UTF8/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223684806,"owners_count":17185714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T21:01:35.875Z","updated_at":"2024-11-08T12:30:21.860Z","avatar_url":"https://github.com/nicolas-grekas.png","language":"PHP","readme":"Patchwork UTF-8 for PHP\n=======================\n\n[![Latest Stable Version](https://poser.pugx.org/patchwork/utf8/v/stable.png)](https://packagist.org/packages/patchwork/utf8)\n[![Total Downloads](https://poser.pugx.org/patchwork/utf8/downloads.png)](https://packagist.org/packages/patchwork/utf8)\n[![Build Status](https://secure.travis-ci.org/tchwork/utf8.png?branch=master)](http://travis-ci.org/tchwork/utf8)\n[![SensioLabsInsight](https://insight.sensiolabs.com/projects/666c8ae7-0997-4d27-883a-6089ce3cc76b/mini.png)](https://insight.sensiolabs.com/projects/666c8ae7-0997-4d27-883a-6089ce3cc76b)\n\nPatchwork UTF-8 gives PHP developers extensive, portable and performant\nhandling of UTF-8 and [grapheme clusters](http://unicode.org/reports/tr29/).\n\nIt provides both:\n\n- a portability layer for `mbstring`, `iconv`, and intl `Normalizer` and\n  `grapheme_*` functions,\n- a UTF-8 grapheme cluster aware replica of native string functions.\n\nIt can also serve as a documentation source referencing the practical problems\nthat arise when handling UTF-8 in PHP: Unicode concepts, related algorithms,\nbugs in PHP core, workarounds, etc.\n\nVersion 1.2 adds best-fit mappings for UTF-8 to *Code 
Page* approximations.\nIt also adds Unicode filesystem access under Windows, preferably using\n[wfio](https://github.com/kenjiuno/php-wfio) or, failing that, a COM-based fallback.\n\nPortability\n-----------\n\nUnicode handling in PHP is best performed using a combination of `mbstring`, `iconv`,\n`intl` and `pcre` with the `u` flag enabled. But when an application is expected\nto run on many servers, you should be aware that these 4 extensions are not\nalways enabled.\n\nPatchwork UTF-8 provides pure PHP implementations for 3 of those 4 extensions.\n`pcre` compiled with Unicode support is required but is widely available.\nThe following set of portability fallbacks allows an application to run on a\nserver even if one or more of those extensions are not enabled:\n\n- *utf8_encode, utf8_decode*,\n- `mbstring`: *mb_check_encoding, mb_convert_case, mb_convert_encoding,\n  mb_decode_mimeheader, mb_detect_encoding, mb_detect_order,\n  mb_encode_mimeheader, mb_encoding_aliases, mb_get_info, mb_http_input,\n  mb_http_output, mb_internal_encoding, mb_language, mb_list_encodings,\n  mb_output_handler, mb_strlen, mb_strpos, mb_strrpos, mb_strtolower,\n  mb_strtoupper, mb_stripos, mb_stristr, mb_strrchr, mb_strrichr, mb_strripos,\n  mb_strstr, mb_strwidth, mb_substitute_character, mb_substr, mb_substr_count*,\n- `iconv`: *iconv, iconv_mime_decode, iconv_mime_decode_headers,\n  iconv_get_encoding, iconv_set_encoding, iconv_mime_encode, ob_iconv_handler,\n  iconv_strlen, iconv_strpos, iconv_strrpos, iconv_substr*,\n- `intl`: *Normalizer, grapheme_extract, grapheme_stripos, grapheme_stristr,\n  grapheme_strlen, grapheme_strpos, grapheme_strripos, grapheme_strrpos,\n  grapheme_strstr, grapheme_substr, normalizer_is_normalized,\n  normalizer_normalize*.\n\nPatchwork\\Utf8\n--------------\n\n[Grapheme clusters](http://unicode.org/reports/tr29/) should always be\nconsidered when working with generic Unicode strings. 
The `Patchwork\\Utf8`\nclass implements a nearly complete set of the native string functions that need\nUTF-8 grapheme cluster awareness. Function names, arguments and behavior\ncarefully replicate native PHP string functions.\n\nSome more functions are also provided to help handle UTF-8 strings:\n\n- *filter()*: normalizes to UTF-8 NFC, converting from [CP-1252](http://wikipedia.org/wiki/CP-1252) when needed,\n- *isUtf8()*: checks if a string contains well-formed UTF-8 data,\n- *toAscii()*: generic UTF-8 to ASCII transliteration,\n- *strtocasefold()*: Unicode transformation for caseless matching,\n- *strtonatfold()*: generic case-sensitive transformation for collation matching,\n- *strwidth()*: computes the width of a string when printed on a terminal,\n- *wrapPath()*: Unicode filesystem access under Windows and other OSes.\n\nMirrored string functions are:\n*strlen, substr, strpos, stripos, strrpos, strripos, strstr, stristr, strrchr,\nstrrichr, strtolower, strtoupper, wordwrap, chr, count_chars, ltrim, ord, rtrim,\ntrim, str_ireplace, str_pad, str_shuffle, str_split, str_word_count, strcmp,\nstrnatcmp, strcasecmp, strnatcasecmp, strncasecmp, strncmp, strcspn, strpbrk,\nstrrev, strspn, strtr, substr_compare, substr_count, substr_replace, ucfirst,\nlcfirst, ucwords, number_format, utf8_encode, utf8_decode, json_decode,\nfilter_input, filter_input_array*.\n\nNotably missing (but hard to replicate) are *printf*-family functions.\n\nThe implementation favors performance over full edge-case handling.\nIt generally works on UTF-8 normalized strings and provides filters to get them.\n\nAs the Turkish locale requires special care, a `Patchwork\\TurkishUtf8` class\nis provided for working with this locale. It clones all the features of\n`Patchwork\\Utf8` but knows about the Turkish specifics.\n\nUsage\n-----\n\nThe recommended way to install Patchwork UTF-8 is [through\ncomposer](http://getcomposer.org). 
Just create a `composer.json` file and run\nthe `php composer.phar install` command to install it:\n\n    {\n        \"require\": {\n            \"patchwork/utf8\": \"~1.2\"\n        }\n    }\n\nThen, early in your bootstrap sequence, you have to configure your environment:\n\n```php\n\\Patchwork\\Utf8\\Bootup::initAll(); // Enables the portability layer and configures PHP for UTF-8\n\\Patchwork\\Utf8\\Bootup::filterRequestUri(); // Redirects to a UTF-8 encoded URL if it's not already the case\n\\Patchwork\\Utf8\\Bootup::filterRequestInputs(); // Normalizes HTTP inputs to UTF-8 NFC\n```\n\nRun `phpunit` to see the code in action.\n\nMake sure that you are confident about using UTF-8 by reading\n[Character Sets / Character Encoding Issues](http://www.phpwact.org/php/i18n/charsets)\nand [Handling UTF-8 with PHP](http://www.phpwact.org/php/i18n/utf-8),\nor [PHP et UTF-8](http://julp.lescigales.org/articles/3-php-et-utf-8.html) for French readers.\n\nYou should also get familiar with the concepts of\n[Unicode Normalization](http://en.wikipedia.org/wiki/Unicode_equivalence) and\n[Grapheme Clusters](http://unicode.org/reports/tr29/).\n\nDo not blindly replace all uses of PHP's string functions. Most of the time you\nwill not need to, and you would be introducing a significant performance overhead\nto your application.\n\nScreen your input on the *outer perimeter* so that only well-formed UTF-8 passes\nthrough. When dealing with badly formed UTF-8, you should not try to fix it\n(see [Unicode Security Considerations](http://www.unicode.org/reports/tr36/#Deletion_of_Noncharacters)).\nInstead, treat it as [CP-1252](http://wikipedia.org/wiki/CP-1252) and use\n`Patchwork\\Utf8::utf8_encode()` to get a UTF-8 string. Also remember to\nchoose one Unicode normalization form and stick to it. NFC is now the de facto\nstandard. 
`Patchwork\\Utf8::filter()` implements this behavior: it converts from\nCP-1252 and normalizes to NFC.\n\nThis library is orthogonal to `mbstring.func_overload` and will not work if that\nphp.ini setting is enabled.\n\nLicensing\n---------\n\nPatchwork\\Utf8 is free software; you can redistribute it and/or modify it under\nthe terms of either of the following licenses, at your option:\n- [Apache License v2.0](http://apache.org/licenses/LICENSE-2.0.txt), or\n- [GNU General Public License v2.0](http://gnu.org/licenses/gpl-2.0.txt).\n\nUnicode handling requires tedious work to be implemented and maintained over the\nlong run. As such, contributions such as unit tests, bug reports, comments or\npatches licensed under both licenses are very welcome.\n\nI hope many projects will adopt this code and together help solve Unicode\nhandling for PHP.\n","funding_links":[],"categories":["字符串","Table of Contents","目录","PHP","Text and Numbers","字符串( Strings )","Strings","字符串 Strings"],"sub_categories":["Strings","字符串 Strings"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicolas-grekas%2FPatchwork-UTF8","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnicolas-grekas%2FPatchwork-UTF8","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicolas-grekas%2FPatchwork-UTF8/lists"}