https://github.com/mlocati/unipoints
A Unicode Codepoint library for PHP
- Host: GitHub
- URL: https://github.com/mlocati/unipoints
- Owner: mlocati
- License: mit
- Created: 2023-10-27T15:52:55.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-20T09:00:51.000Z (8 months ago)
- Last Synced: 2025-03-20T10:21:29.051Z (8 months ago)
- Language: PHP
- Homepage: https://mlocati.github.io/unipoints/
- Size: 4.56 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE.txt
# A Unicode Codepoint library for PHP
## Simplified Unicode Terminology
### Codepoints
Codepoints are the individual units that make up text: characters, spaces, symbols, punctuation marks, separators, and so on.
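To make the idea concrete, here is a minimal sketch that uses PHP's built-in mbstring functions `mb_ord()` and `mb_chr()` (not this library) to convert between a character and its numeric codepoint. It assumes the mbstring extension is enabled.

```php
<?php
// Built-in mbstring functions, shown only to illustrate what a codepoint is.
// They are not part of this library.
$number = mb_ord('a', 'UTF-8');     // numeric value of the codepoint
$hex = sprintf('U+%04X', $number);  // conventional U+XXXX notation
$char = mb_chr(0x03B1, 'UTF-8');    // GREEK SMALL LETTER ALPHA

echo $number, ' ', $hex, ' ', $char, "\n"; // prints "97 U+0061 α"
```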
### Blocks
Codepoints are grouped into blocks: ranges of contiguous codepoints that belong to a common set.
Examples:
- a is contained in the `Basic Latin` block
- α is contained in the `Greek and Coptic` block
- 𝅘𝅥𝅮 is contained in the `Musical Symbols` block
- ↩ is contained in the `Arrows` block
- ☂ is contained in the `Miscellaneous Symbols` block
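Because a block is just a contiguous range, finding the block of a character is a range lookup. The sketch below is a hypothetical helper, not the library's API: `DEMO_BLOCKS` and `demoBlockOf()` are names invented for illustration, and the table only covers the five blocks listed above.

```php
<?php
// Hypothetical lookup table: first and last codepoint of the five blocks
// used in the examples above (ranges as published in the Unicode block list).
const DEMO_BLOCKS = [
    'Basic Latin' => [0x0000, 0x007F],
    'Greek and Coptic' => [0x0370, 0x03FF],
    'Arrows' => [0x2190, 0x21FF],
    'Miscellaneous Symbols' => [0x2600, 0x26FF],
    'Musical Symbols' => [0x1D100, 0x1D1FF],
];

// Returns the block name of the first codepoint in $char,
// or null when it is outside the small demo table.
function demoBlockOf(string $char): ?string
{
    $number = mb_ord($char, 'UTF-8');
    foreach (DEMO_BLOCKS as $name => [$first, $last]) {
        if ($number >= $first && $number <= $last) {
            return $name;
        }
    }

    return null;
}

echo demoBlockOf('a'), "\n";          // Basic Latin
echo demoBlockOf("\u{21A9}"), "\n";   // Arrows
echo demoBlockOf("\u{1D11E}"), "\n";  // Musical Symbols
```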
### Planes
Planes are ranges of 65,536 contiguous codepoints. A plane may contain zero, one or many blocks.
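Since every plane spans exactly 65,536 (0x10000) codepoints, the plane index can be derived from the codepoint number with an integer division. This is a small sketch with a hypothetical helper name (`demoPlaneIndex()`), independent of the library.

```php
<?php
// Hypothetical helper: index of the plane that contains a codepoint.
// Each plane holds 0x10000 codepoints, so plane 0 covers U+0000..U+FFFF.
function demoPlaneIndex(int $codepoint): int
{
    return intdiv($codepoint, 0x10000);
}

echo demoPlaneIndex(0x0061), "\n";   // 0 (Basic Multilingual Plane)
echo demoPlaneIndex(0x1D11E), "\n";  // 1
echo demoPlaneIndex(0x10FFFF), "\n"; // 16 (the last plane)
```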
### General Category
This library also provides the general category of every codepoint, so you can tell whether a codepoint is a lowercase letter, a symbol, a punctuation mark, and so on.
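For comparison, general categories can also be probed without this library through PCRE's Unicode property escapes (`\p{Ll}`, `\p{Lu}`, ...). The sketch below uses a hypothetical helper, `demoCategoryOf()`, and checks only a handful of two-letter category codes, so it is far less complete than the data the library exposes.

```php
<?php
// Hypothetical helper: returns the two-letter Unicode general category code
// of a single character, limited to the few codes listed here.
function demoCategoryOf(string $char): ?string
{
    // Lu: uppercase letter, Ll: lowercase letter, Nd: decimal digit,
    // Po: other punctuation, Sm: math symbol, So: other symbol,
    // Zs: space separator
    $codes = ['Lu', 'Ll', 'Nd', 'Po', 'Sm', 'So', 'Zs'];
    foreach ($codes as $code) {
        // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
        if (preg_match('/\A\p{' . $code . '}\z/u', $char) === 1) {
            return $code;
        }
    }

    return null;
}

echo demoCategoryOf('a'), "\n"; // Ll
echo demoCategoryOf('A'), "\n"; // Lu
echo demoCategoryOf('!'), "\n"; // Po
```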
### Surrogate Codepoints
To represent more codepoints than fit in 16 bits, Unicode introduced "surrogates".
A single character (or punctuation mark, ...) outside the 16-bit range can be represented by combining two consecutive surrogates (called "high surrogate" and "low surrogate").
That means that surrogate codepoints have a meaning only in pairs.
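The pairing follows the standard UTF-16 arithmetic, sketched below with two hypothetical helpers (`demoToSurrogates()` and `demoFromSurrogates()`) that are not part of this library. The codepoint is reduced by 0x10000, and the resulting 20-bit value is split into two 10-bit halves.

```php
<?php
// Hypothetical helper: split a codepoint above U+FFFF into its
// high and low surrogates, following the UTF-16 encoding rules.
function demoToSurrogates(int $codepoint): array
{
    if ($codepoint < 0x10000 || $codepoint > 0x10FFFF) {
        throw new InvalidArgumentException('Only U+10000..U+10FFFF use surrogate pairs');
    }
    $offset = $codepoint - 0x10000;     // 20-bit value
    $high = 0xD800 + ($offset >> 10);   // top 10 bits
    $low = 0xDC00 + ($offset & 0x3FF);  // bottom 10 bits

    return [$high, $low];
}

// Hypothetical helper: rebuild the codepoint from a surrogate pair.
function demoFromSurrogates(int $high, int $low): int
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

[$high, $low] = demoToSurrogates(0x1D11E); // MUSICAL SYMBOL G CLEF
printf("U+%04X U+%04X\n", $high, $low);    // prints "U+D834 U+DD1E"
```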
## Sample Usage
Codepoints are listed in the string-backed [`MLUnipoints\Codepoint`](https://github.com/mlocati/unipoints/blob/main/src/Codepoint.php) enum.
The value of each enum case is a string containing the Unicode symbol itself. For example, to get the case for `a`, you can simply write:
```php
use MLUnipoints\Codepoint;
$codepoint = Codepoint::from('a');
```
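Because the enum is string-backed, the standard PHP 8.1 backed-enum methods apply: `from()` throws a `ValueError` for an unknown value, while `tryFrom()` returns `null`. The sketch below demonstrates this on `DemoCodepoint`, a tiny hypothetical stand-in enum defined only for illustration, so it runs without the library installed.

```php
<?php
// Hypothetical stand-in for illustration only: NOT the library's enum.
enum DemoCodepoint: string
{
    case LATIN_SMALL_LETTER_A = 'a';
    case SUN_BEHIND_CLOUD = "\u{26C5}";
}

// from() resolves a backing value to its case.
$case = DemoCodepoint::from('a');
echo $case->name, "\n"; // LATIN_SMALL_LETTER_A

// tryFrom() returns null instead of throwing for unknown values.
$missing = DemoCodepoint::tryFrom('b');
echo $missing === null ? 'not found' : $missing->name, "\n"; // not found
```

When the input comes from user data, `tryFrom()` is usually the more convenient choice, since it avoids a try/catch around `from()`.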
Since the `MLUnipoints\Codepoint` enum is rather big (it can use tens of MB of memory when it is autoloaded), you can also use the block-specific enums defined under the `MLUnipoints\Codepoint` namespace (this requires knowing the block in advance).
For example:
```php
use MLUnipoints\Codepoint;
$codepoint = Codepoint\Basic_Latin::from('a');
```
Every case of the `MLUnipoints\Codepoint` enum has a [`MLUnipoints\Info\CodepointInfo`](https://github.com/mlocati/unipoints/blob/main/src/Info/CodepointInfo.php) attribute.
You can easily retrieve this attribute by writing:
```php
use MLUnipoints\Codepoint;
use MLUnipoints\Info\CodepointInfo;
$codepoint = Codepoint::from('a');
$codepointInfo = CodepointInfo::from($codepoint);
```
This attribute provides the numeric value of the codepoint, the Unicode name, the general category, and (if you don't use the block-specific enums) the block.
You can similarly retrieve the details of the block, the plane and the general category.
For example, this code:
```php
use MLUnipoints\Codepoint;
use MLUnipoints\Info\BlockInfo;
use MLUnipoints\Info\CategoryInfo;
use MLUnipoints\Info\CodepointInfo;
use MLUnipoints\Info\PlaneInfo;
$codepoint = Codepoint::from('a');
$codepointInfo = CodepointInfo::from($codepoint);
$categoryInfo = CategoryInfo::from($codepointInfo->category);
$blockInfo = BlockInfo::from($codepointInfo->block);
$planeInfo = PlaneInfo::from($blockInfo->plane);
echo 'Codepoint: ', $codepointInfo->id, "\n";
echo 'Codepoint name: ', $codepointInfo->name, "\n";
echo 'Codepoint general category: ', $categoryInfo->description, "\n";
foreach ($categoryInfo->parentCategories as $parentCategory) {
    echo 'Codepoint parent general category: ', CategoryInfo::from($parentCategory)->description, "\n";
}
echo 'Block name: ', $blockInfo->name, "\n";
echo 'Plane name: ', $planeInfo->name, "\n";
echo 'Plane short name: ', $planeInfo->shortName, "\n";
```
will output:
```
Codepoint: 97
Codepoint name: LATIN SMALL LETTER A
Codepoint general category: a lowercase letter
Codepoint parent general category: a cased letter
Codepoint parent general category: a letter
Block name: Basic Latin
Plane name: Basic Multilingual Plane
Plane short name: BMP
```
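The general mechanism of attaching metadata to enum cases as PHP attributes and reading it back can be sketched with plain reflection. The code below is only an illustration under assumptions: `DemoInfo` and `DemoLetters` are hypothetical names, and nothing here is taken from the library's actual implementation of `CodepointInfo::from()`.

```php
<?php
// Hypothetical attribute class: NOT the library's CodepointInfo.
// Enum cases are class constants as far as attribute targets are concerned.
#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class DemoInfo
{
    public function __construct(
        public readonly int $id,
        public readonly string $name,
    ) {
    }

    // Read the DemoInfo attribute attached to an enum case via reflection.
    public static function from(UnitEnum $case): self
    {
        $reflection = new ReflectionEnumUnitCase($case::class, $case->name);
        $attributes = $reflection->getAttributes(self::class);
        if ($attributes === []) {
            throw new LogicException('The case has no DemoInfo attribute');
        }

        return $attributes[0]->newInstance();
    }
}

// Hypothetical enum carrying the attribute on its single case.
enum DemoLetters: string
{
    #[DemoInfo(97, 'LATIN SMALL LETTER A')]
    case A = 'a';
}

$info = DemoInfo::from(DemoLetters::A);
echo $info->id, ' ', $info->name, "\n"; // prints "97 LATIN SMALL LETTER A"
```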
You can also use the Unicode enums to print out characters and symbols.
For example:
```php
use MLUnipoints\Codepoint;
echo Codepoint::SUN_BEHIND_CLOUD->value;
```
will print:
> ⛅
## Do you really want to say thank you?
You can offer me a [monthly coffee](https://github.com/sponsors/mlocati) or a [one-time coffee](https://paypal.me/mlocati) :wink: