{"id":20632534,"url":"https://github.com/dylan-profiler/tangled-up-in-unicode","last_synced_at":"2025-04-15T19:03:10.324Z","repository":{"id":46752317,"uuid":"210143820","full_name":"dylan-profiler/tangled-up-in-unicode","owner":"dylan-profiler","description":"Access to the Unicode Character Database (UCD)","archived":false,"fork":false,"pushed_at":"2022-11-08T17:51:33.000Z","size":7549,"stargazers_count":3,"open_issues_count":3,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-15T19:01:53.270Z","etag":null,"topics":["data-analysis","data-quality","exploration","linguistic-analysis","linguistics","python","unicode"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dylan-profiler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-22T12:30:34.000Z","updated_at":"2023-05-04T02:08:32.000Z","dependencies_parsed_at":"2022-09-26T18:00:55.912Z","dependency_job_id":null,"html_url":"https://github.com/dylan-profiler/tangled-up-in-unicode","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylan-profiler%2Ftangled-up-in-unicode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylan-profiler%2Ftangled-up-in-unicode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylan-profiler%2Ftangled-up-in-unicode/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylan-profiler%2Ftangled-up-in-unicode/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dylan-profiler","download_url":"https://codeload.github.com/dylan-profiler/tangled-up-in-unicode/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249135822,"owners_count":21218365,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-quality","exploration","linguistic-analysis","linguistics","python","unicode"],"created_at":"2024-11-16T14:16:29.181Z","updated_at":"2025-04-15T19:03:10.259Z","avatar_url":"https://github.com/dylan-profiler.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tangled up in Unicode\n\nThis module provides access to the character properties of all Unicode characters, as recorded in the Unicode Character Database (UCD).\nIt is an alternative to Python's standard library module [`unicodedata`](https://docs.python.org/3/library/unicodedata.html).\n`Tangled up in Unicode` provides four main benefits compared to the standard library:\n- The [latest version](http://www.unicode.org/versions/latest/) of the Unicode database is used.\n- Human-readable class names (property value aliases) are added.\n- More of the properties available in the database are exposed.\n- The UCD version is independent of the Python version (Python 3.6 ships UCD 9.0.0, 3.7 ships 11.0.0, 3.8 ships 12.1.0 and 3.9 ships 13.0.0).\n\nNote that Python 3's built-in Unicode support is different from the UCD.\nUnicode support handles storing and manipulating Unicode characters, while this package provides the properties of specific 
characters.\n\n\u003c!-- Please read the [docs](#) for details.--\u003e\n\n## Example\n\nThe default lookup in `unicodedata` for `$`:\n\n| Property \t\t\t\t\t| Value \t\t \t|\n|---------------------------|-------------------|\n| Name\t   \t\t\t\t\t| DOLLAR SIGN \t\t|\n| Category (Short)\t\t\t| Sc \t\t \t\t|\n| Bidirectional (Short) \t| ET \t\t\t\t|\n| Combining\t\t\t\t\t| 0\t\t\t\t\t|\n| Mirrored\t\t\t\t\t| 0\t\t\t\t\t|\n| East Asian Width (Short)\t| Na\t\t\t\t|\n| Decomposition\t\t\t\t| \t\t\t\t\t|\n\nExtra information provided by this package:\n\n| Property \t\t\t\t\t\t| Value \t\t \t\t|\n|-------------------------------|-----------------------|\n| Category Alias (Long)\t\t\t| Currency_Symbol\t\t|\n| Bidirectional Alias (Long)\t| European_Terminator\t|\n| East Asian Width Alias (Long)\t| Narrow\t\t\t\t|\n| Script (Long)\t\t\t\t\t| Common\t\t\t\t|\n| Script (Short)\t\t\t\t| Zyyy\t\t\t\t\t|\n| Block (Long)\t\t\t\t\t| Basic_Latin\t\t\t|\n| Block (Short)\t\t\t\t\t| ASCII\t\t\t\t\t|\n| PropList\t\t\t\t\t\t| Pattern_Syntax\t\t|\n| Uppercase Character\t\t\t|\t\t\t\t\t\t|\n| Lowercase Character\t\t\t|\t\t\t\t\t\t|\n| Titlecase Character\t\t\t|\t\t\t\t\t\t|\n\n\n## Properties comparison\n\n| Property\t\t\t\t\t| `tangled-up-in-unicode`\t\t\t| `unicodedata` \t\t|\n|---------------------------|-------------------------------|-----------------------|\n| Name\t\t\t\t\t\t| \u0026#9745;\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Decimal\t\t\t\t\t| \u0026#9745;\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Digit\t\t\t\t\t\t| \u0026#9745;\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Numeric\t\t\t\t\t| \u0026#9745;\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Combining           \t\t| \u0026#9745; + alias\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Mirrored           \t\t| \u0026#9745;\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Decomposition        \t\t| \u0026#9745;\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Category\t\t\t\t\t| \u0026#9745; + alias\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Bidirectional\t\t\t\t| 
\u0026#9745; + alias\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| East Asian Width\t\t\t| \u0026#9745; + alias\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| Script\t\t\t\t\t| \u0026#9745; + alias\t\t\t\t| -  \t\t\t\t\t|\n| Block\t\t\t\t\t\t| \u0026#9745; + alias\t\t\t\t| -  \t\t\t\t\t|\n| Age\t\t\t\t\t\t| \u0026#9745; + alias\t\t\t\t| -  \t\t\t\t\t|\n| Binary Property Values \t| \u0026#9745;\t\t\t\t\t\t| -  \t\t\t\t\t|\n| Version\t\t\t\t\t| 14.0.0 ([latest](http://www.unicode.org/versions/latest/))\t\t\t\t| 12.1.0 (Python 3.8)\t\t\t\t|\n\n_Table 1: the presence of a property is denoted by \u0026#9745; (Unicode character 'BALLOT BOX WITH CHECK', U+2611)._\n\n## Usage\n\nThe package can be installed via pip:\n\n```\npip install tangled-up-in-unicode\n```\n\nIt can then be imported in place of the standard library module:\n\n```python\nimport tangled_up_in_unicode as unicodedata\n```\n\n## Performance\n\nThe module is written in Python.\nIt can be compiled with Cython to achieve [competitive performance](# \"Meaning the null hypothesis of the two libraries having the same average runtime could not be rejected.\") with the native library.\n\n## Unsupported features\n\nThe following features of `unicodedata` are not supported:\n\n| Feature\t\t\t\t| `tangled-up-in-unicode`\t\t| `unicodedata` \t\t|\n|-----------------------|-------------------------------|-----------------------|\n| lookup\t           \t| -\t\t\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| normalize           \t| -\t\t\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n| ucd_3_2_0      \t\t| -\t\t\t\t\t\t\t\t| \u0026#9745;  \t\t\t\t|\n\n## Acknowledgements\nWhere possible, code and documentation from the original `unicodedata` module are reused.\nThis repository is part of the Dylan Profiling project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylan-profiler%2Ftangled-up-in-unicode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdylan-profiler%2Ftangled-up-in-unicode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylan-profiler%2Ftangled-up-in-unicode/lists"}