{"id":15664848,"url":"https://github.com/facelessuser/uniprops","last_synced_at":"2025-08-01T05:14:39.204Z","repository":{"id":54961029,"uuid":"330021859","full_name":"facelessuser/uniprops","owner":"facelessuser","description":"Provide Unicode character group strings for regular expressions","archived":false,"fork":false,"pushed_at":"2021-01-19T14:49:15.000Z","size":3988,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-24T21:46:40.936Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facelessuser.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"facelessuser","custom":["https://www.paypal.me/facelessuser"]}},"created_at":"2021-01-15T21:03:20.000Z","updated_at":"2024-12-19T05:24:57.000Z","dependencies_parsed_at":"2022-08-14T07:30:54.671Z","dependency_job_id":null,"html_url":"https://github.com/facelessuser/uniprops","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/facelessuser/uniprops","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facelessuser%2Funiprops","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facelessuser%2Funiprops/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facelessuser%2Funiprops/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facelessuser%2Funiprops/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/facelessuser","download_url":"https://codeload.github.com/facelessuser/uniprops/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facelessuser%2Funiprops/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268171959,"owners_count":24207437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T13:44:21.173Z","updated_at":"2025-08-01T05:14:39.182Z","avatar_url":"https://github.com/facelessuser.png","language":"Python","funding_links":["https://github.com/sponsors/facelessuser","https://www.paypal.me/facelessuser"],"categories":[],"sub_categories":[],"readme":"[![Donate via PayPal][donate-image]][donate-link]\n[![Discord][discord-image]][discord-link]\n[![Build][github-ci-image]][github-ci-link]\n[![Coverage Status][codecov-image]][codecov-link]\n[![PyPI Version][pypi-image]][pypi-link]\n[![PyPI - Python Version][python-image]][pypi-link]\n![License][license-image-mit]\n\n# UniProps\n\nThe main purpose of this library is simply to provide Unicode Property strings for regular expression character groups.\nThis is done simply by specifying the Unicode Property and parameters (if relevant). 
The strings are formatted in such\na way that they can easily be inserted into a regular expression character group with appropriate characters escaped.\n\nUniProps was originally written for and shipped as part of `backrefs`, a wrapper around regular expressions that can\nprovide features such as Unicode Properties in Python's Re. The logic was broken out into this separate package as we\ndeveloped a second package (`wcmatch`: an alternative `fnmatch` and `glob` library) which also required the ability to\nretrieve POSIX regular expression strings for certain character groups. This library feeds both of the aforementioned\npackages with the appropriate Unicode Properties.\n\nWhile UniProps is specifically designed for the previously mentioned packages, others are free to use it if they find it\nhelpful, but it is very opinionated and tailored specifically to work with how we are using it in `wcmatch` and\n`backrefs`.\n\n# API\n\nIt is important to note that the library is built for how it is used. Often the input mirrors how Unicode properties are\nspecified in regular expressions.\n\nNames of properties and values must be lowercase with spaces, underscores, and hyphens stripped out. For instance, the\n`General Category` property name is fed in as `generalcategory` (or the alias `gc`). The library doesn't currently\nprocess and format the inputs for the user in this manner, so the user must do this before passing them in.\n\nRegardless of whether the strings are needed for a byte string or Unicode string, the properties are always returned as\nUnicode. This is because the libraries it was designed for are assembling the regular expressions together as Unicode\nstrings and then converting them to byte strings, when needed, by encoding them using `latin-1`.\n\n## Get POSIX property\n\n```py\ndef get_posix_property(value, mode=POSIX):\n    \"\"\"Retrieve the posix category.\"\"\"\n```\n\nThis retrieves a property by its POSIX name. 
This does not use the locale and instead either returns the property in the\nC locale or as a Unicode property only. Valid properties are: `alnum`, `alpha`, `ascii`, `blank`, `cntrl`, `digit`,\n`graph`, `lower`, `print`, `punct`, `space`, `upper`, and `xdigit`.\n\nA property can be inverted by adding the prefix `^`, in which case it will return all characters not within the\nspecified property.\n\nAvailable modes are:\n\n1. `POSIX`: acquire the POSIX property returning characters based on the POSIX definition (which are ASCII characters)\n   specified within the Unicode range. For example, `alnum` is defined as `[a-zA-Z0-9]`. Because this returns strings in\n   the Unicode range, the inverse of `alnum` `(^alnum)` would extend all the way to `\\U0010ffff`.\n\n    ```py\n    \u003e\u003e\u003e uniprops.get_posix_property(\"alnum\", mode=uniprops.POSIX)\n    '0-9A-Za-z'\n    \u003e\u003e\u003e uniprops.get_posix_property(\"^alnum\", mode=uniprops.POSIX)\n    '\\x00-/:-@\\\\[-`{-\\U0010ffff'\n    ```\n\n2. `POSIX_ASCII`: acquire the POSIX property returning characters based on the POSIX definition (which are ASCII\n   characters) specified within the range of byte strings. For example, `alnum` is defined as `[a-zA-Z0-9]`.\n   Because this returns strings in the range of byte strings, the inverse of `alnum` `(^alnum)` would only extend to\n   `\\xff`. It is recommended to convert these to byte strings by encoding them as `latin-1`.\n\n    ```py\n    \u003e\u003e\u003e uniprops.get_posix_property(\"alnum\", mode=uniprops.POSIX_BYTES).encode('latin-1')\n    b'0-9A-Za-z'\n    \u003e\u003e\u003e uniprops.get_posix_property(\"^alnum\", mode=uniprops.POSIX_BYTES).encode('latin-1')\n    b'\\x00-/:-@\\\\[-`{-\\xff'\n    ```\n\n3. `POSIX_UNICODE`: acquire the POSIX property returning characters based on Unicode definition which are Unicode\n   characters specified in the Unicode range. 
For instance, `alnum` is equivalent to the regular expression of\n   `\\p{L\u0026}\\p{Nd}` (in regular expression engines that support Unicode properties).\n\n## Get Unicode Property\n\n```py\ndef get_unicode_property(prop, value=None, limit_ascii=False):\n    \"\"\"Retrieve the Unicode category from the table.\"\"\"\n```\n\nThis retrieves a Unicode property by its name and its value. If the result string is desired to be within a byte\nstring's range, set `limit_ascii` to `True`. Remember, to format for a byte string, simply encode as `latin-1`.\n\nTo use, simply specify the Unicode property and its value.\n\n```py\nuniprops.get_unicode_property('gc', 'l')\n```\n\nSetting the `prop` with a prefix of `^` will get the inverse result:\n\n```py\nuniprops.get_unicode_property('^gc', 'l')\n```\n\nBinary properties don't need a `value`: the value is implied to be \"true\" unless the property is prefixed with `^`, in\nwhich case it is implied to be \"false\". But you can optionally specify `true`, `yes`, `y`, or `t` for an explicit \"true\" or `false`,\n`no`, `f`, or `n` for a \"false\".\n\nThe following are equivalent:\n\n```py\nuniprops.get_unicode_property('alphabetic')\nuniprops.get_unicode_property('alphabetic', 'true')\n```\n\nThere are a few exceptions to the rules above. Like in Perl regular expressions, you can pass script extension and\nblock property value names, and they will be detected appropriately. The same goes for values under the general category\nproperty. 
Properties that provide only a single parameter like this are evaluated in the following order: general category,\nbinary, scripts, and blocks properties.\n\nAs an example, these are equivalent:\n\n```py\nuniprops.get_unicode_property('gc', 'l')\nuniprops.get_unicode_property('l')\n```\n\nWhen passing a value as the property name, you can invert the result by placing the `^` directly on the value.\n\n```py\nuniprops.get_unicode_property('^gc', 'l')\nuniprops.get_unicode_property('^l')\n```\n\nThese are also equivalent:\n\n```py\nuniprops.get_unicode_property('blk', 'basiclatin')\nuniprops.get_unicode_property('basiclatin')\n```\n\nThe last exception is properties specified with the `Is` or `In` prefix. We model the 3rd party\n[regex](https://bitbucket.org/mrabarnett/mrab-regex/src/hg/) library for Python with this support. `Is` will assume the\nproperty is a script extension or binary property while `In` will assume a `block` property. Generally this usage is\ndiscouraged as future Unicode versions may break such behavior with naming conflicts, but currently we support this.\n\nThese are all equivalent:\n\n```py\nuniprops.get_unicode_property('inbasiclatin')\nuniprops.get_unicode_property('blk', 'basiclatin')\nuniprops.get_unicode_property('basiclatin')\n```\n\n# License\n\nReleased under the MIT license.\n\n[github-ci-image]: https://github.com/facelessuser/uniprops/workflows/build/badge.svg?branch=main\u0026event=push\n[github-ci-link]: https://github.com/facelessuser/uniprops/actions?query=workflow%3Abuild+branch%3Amain\n[discord-image]: https://img.shields.io/discord/678289859768745989?logo=discord\u0026logoColor=aaaaaa\u0026color=mediumpurple\u0026labelColor=333333\n[discord-link]:https://discord.gg/TWs8Tgr\n[codecov-image]: https://img.shields.io/codecov/c/github/facelessuser/uniprops/main.svg?logo=codecov\u0026logoColor=aaaaaa\u0026labelColor=333333\n[codecov-link]: https://codecov.io/github/facelessuser/uniprops\n[pypi-image]: 
https://img.shields.io/pypi/v/uniprops.svg?logo=pypi\u0026logoColor=aaaaaa\u0026labelColor=333333\n[pypi-link]: https://pypi.python.org/pypi/uniprops\n[python-image]: https://img.shields.io/pypi/pyversions/uniprops?logo=python\u0026logoColor=aaaaaa\u0026labelColor=333333\n[license-image-mit]: https://img.shields.io/badge/license-MIT-blue.svg?labelColor=333333\n[donate-image]: https://img.shields.io/badge/Donate-PayPal-3fabd1?logo=paypal\n[donate-link]: https://www.paypal.me/facelessuser\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacelessuser%2Funiprops","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacelessuser%2Funiprops","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacelessuser%2Funiprops/lists"}