Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/node-unicode/node-unicode-data

JavaScript-compatible Unicode data generator. Arrays of code points, arrays of symbols, and regular expressions for every Unicode version’s categories, scripts, blocks, and properties — neatly packaged into a separate npm package per Unicode version.
https://github.com/node-unicode/node-unicode-data

Last synced: 3 months ago
JSON representation

JavaScript-compatible Unicode data generator. Arrays of code points, arrays of symbols, and regular expressions for every Unicode version’s categories, scripts, blocks, and properties — neatly packaged into a separate npm package per Unicode version.

Awesome Lists containing this project

README

        

# node-unicode-data

JavaScript-compatible Unicode data generator. Arrays of code points, arrays of symbols, and regular expressions for every Unicode version’s categories, scripts, script extensions, blocks, bidi data, and other properties — neatly packaged into a separate npm package per Unicode version.

## Using the data in your scripts

To use the generated data, simply install one of [the npm modules generated by this script](https://www.npmjs.com/org/unicode). Separate packages are available for each Unicode version. This allows you to do stuff like:

```js
// Get an array of all code points with the `White_Space` property:
const codePoints = require('@unicode/unicode-6.3.0/Binary_Property/White_Space/code-points');
// Get an array of strings (containing one symbol each) in the `Lu` category:
const symbols = require('@unicode/unicode-6.3.0/General_Category/Uppercase_Letter/symbols');
// Get a regular expression that matches any symbol in the `Aegean Numbers` block:
const regex = require('@unicode/unicode-6.3.0/Block/Aegean_Numbers/regex');
// Get an array of all code points in the `Egyptian_Hieroglyphs` script:
const hieroglyphs = require('@unicode/unicode-6.3.0/Script/Egyptian_Hieroglyphs/code-points');
// Get the canonical category a given code point belongs to:
// (Note: U+0041 is LATIN CAPITAL LETTER A)
const category = require('@unicode/unicode-6.3.0/General_Category').get(0x41);
// Get an array of all code points with a given bidi class:
const lre = require('@unicode/unicode-6.3.0/Bidi_Class/Left_To_Right_Embedding/code-points');
// Get the directionality of a given code point:
const directionality = require('@unicode/unicode-6.3.0/Bidi_Class').get(0x41);
// What glyph is the mirror image of `«` (U+00AB)?
const mirrored = require('@unicode/unicode-6.3.0/Bidi_Mirroring_Glyph').get(0xAB);
// Get a regular expression that matches all opening brackets:
const openingBrackets = require('@unicode/unicode-6.3.0/Bidi_Paired_Bracket_Type/Open/regex');
// …you get the idea.
```

For more information, see the README for the package you’re interested in. [Here’s the full list of npm packages generated by this script](https://www.npmjs.com/org/unicode):

* [_@unicode/1.1.5_](https://npmjs.org/package/@unicode/unicode-1.1.5#readme) ([repository](https://github.com/node-unicode/unicode-1.1.5#readme))
* [_@unicode/2.0.14_](https://npmjs.org/package/@unicode/unicode-2.0.14#readme) ([repository](https://github.com/node-unicode/unicode-2.0.14#readme))
* [_@unicode/2.1.2_](https://npmjs.org/package/@unicode/unicode-2.1.2#readme) ([repository](https://github.com/node-unicode/unicode-2.1.2#readme))
* [_@unicode/2.1.5_](https://npmjs.org/package/@unicode/unicode-2.1.5#readme) ([repository](https://github.com/node-unicode/unicode-2.1.5#readme))
* [_@unicode/2.1.8_](https://npmjs.org/package/@unicode/unicode-2.1.8#readme) ([repository](https://github.com/node-unicode/unicode-2.1.8#readme))
* [_@unicode/2.1.9_](https://npmjs.org/package/@unicode/unicode-2.1.9#readme) ([repository](https://github.com/node-unicode/unicode-2.1.9#readme))
* [_@unicode/3.0.0_](https://npmjs.org/package/@unicode/unicode-3.0.0#readme) ([repository](https://github.com/node-unicode/unicode-3.0.0#readme))
* [_@unicode/3.0.1_](https://npmjs.org/package/@unicode/unicode-3.0.1#readme) ([repository](https://github.com/node-unicode/unicode-3.0.1#readme))
* [_@unicode/3.1.0_](https://npmjs.org/package/@unicode/unicode-3.1.0#readme) ([repository](https://github.com/node-unicode/unicode-3.1.0#readme))
* [_@unicode/3.2.0_](https://npmjs.org/package/@unicode/unicode-3.2.0#readme) ([repository](https://github.com/node-unicode/unicode-3.2.0#readme))
* [_@unicode/4.0.0_](https://npmjs.org/package/@unicode/unicode-4.0.0#readme) ([repository](https://github.com/node-unicode/unicode-4.0.0#readme))
* [_@unicode/4.0.1_](https://npmjs.org/package/@unicode/unicode-4.0.1#readme) ([repository](https://github.com/node-unicode/unicode-4.0.1#readme))
* [_@unicode/4.1.0_](https://npmjs.org/package/@unicode/unicode-4.1.0#readme) ([repository](https://github.com/node-unicode/unicode-4.1.0#readme))
* [_@unicode/5.0.0_](https://npmjs.org/package/@unicode/unicode-5.0.0#readme) ([repository](https://github.com/node-unicode/unicode-5.0.0#readme))
* [_@unicode/5.1.0_](https://npmjs.org/package/@unicode/unicode-5.1.0#readme) ([repository](https://github.com/node-unicode/unicode-5.1.0#readme))
* [_@unicode/5.2.0_](https://npmjs.org/package/@unicode/unicode-5.2.0#readme) ([repository](https://github.com/node-unicode/unicode-5.2.0#readme))
* [_@unicode/6.0.0_](https://npmjs.org/package/@unicode/unicode-6.0.0#readme) ([repository](https://github.com/node-unicode/unicode-6.0.0#readme))
* [_@unicode/6.1.0_](https://npmjs.org/package/@unicode/unicode-6.1.0#readme) ([repository](https://github.com/node-unicode/unicode-6.1.0#readme))
* [_@unicode/6.2.0_](https://npmjs.org/package/@unicode/unicode-6.2.0#readme) ([repository](https://github.com/node-unicode/unicode-6.2.0#readme))
* [_@unicode/6.3.0_](https://npmjs.org/package/@unicode/unicode-6.3.0#readme) ([repository](https://github.com/node-unicode/unicode-6.3.0#readme))
* [_@unicode/7.0.0_](https://npmjs.org/package/@unicode/unicode-7.0.0#readme) ([repository](https://github.com/node-unicode/unicode-7.0.0#readme))
* [_@unicode/8.0.0_](https://npmjs.org/package/@unicode/unicode-8.0.0#readme) ([repository](https://github.com/node-unicode/unicode-8.0.0#readme))
* [_@unicode/9.0.0_](https://npmjs.org/package/@unicode/unicode-9.0.0#readme) ([repository](https://github.com/node-unicode/unicode-9.0.0#readme))
* [_@unicode/10.0.0_](https://npmjs.org/package/@unicode/unicode-10.0.0#readme) ([repository](https://github.com/node-unicode/unicode-10.0.0#readme))
* [_@unicode/11.0.0_](https://npmjs.org/package/@unicode/unicode-11.0.0#readme) ([repository](https://github.com/node-unicode/unicode-11.0.0#readme))
* [_@unicode/12.0.0_](https://npmjs.org/package/@unicode/unicode-12.0.0#readme) ([repository](https://github.com/node-unicode/unicode-12.0.0#readme))
* [_@unicode/12.1.0_](https://npmjs.org/package/@unicode/unicode-12.1.0#readme) ([repository](https://github.com/node-unicode/unicode-12.1.0#readme))
* [_@unicode/13.0.0_](https://npmjs.org/package/@unicode/unicode-13.0.0#readme) ([repository](https://github.com/node-unicode/unicode-13.0.0#readme))
* [_@unicode/14.0.0_](https://npmjs.org/package/@unicode/unicode-14.0.0#readme) ([repository](https://github.com/node-unicode/unicode-14.0.0#readme))
* [_@unicode/15.0.0_](https://npmjs.org/package/@unicode/unicode-15.0.0#readme) ([repository](https://github.com/node-unicode/unicode-15.0.0#readme))
* [_@unicode/15.1.0_](https://npmjs.org/package/@unicode/unicode-15.1.0#readme) ([repository](https://github.com/node-unicode/unicode-15.1.0#readme))

Note that these READMEs are auto-generated by this script, too – they describe all the data that is available for that particular Unicode version. To programmatically get this list of available categories, scripts, script extensions, blocks, and properties for a given Unicode version, just `require` the main module for that version:

```js
> require('unicode-6.3.0');
{
'Binary_Property': [
'Alphabetic', 'Any', 'ASCII', 'ASCII_Hex_Digit', 'Assigned', …
],
'General_Category': [
'Cased_Letter','Close_Punctuation','Connector_Punctuation', …
],
'Script': [
'Arabic', 'Armenian', 'Avestan', …
],
'Script_Extensions': [
'Arabic', 'Armenian', 'Avestan', …
],
'Block': [
'Aegean Numbers', 'Alchemical Symbols', …
],
'Case_Folding': [
'C', 'F', 'S', 'T'
],
'Bidi_Class': [
'Arabic_Letter', 'Arabic_Number', 'Boundary_Neutral', …
],
'Bidi_Mirroring_Glyph': [],
'Bidi_Paired_Bracket_Type': [
'Close', 'None', 'Open'
]
}
```

## For project maintainers

After cloning this repository, before doing anything else, run:

```sh
./clone-repos.sh
```

This clones all the generated repositories to your local `output` folder. You can then make changes to node-unicode-data, and use `./bootstrap.sh` to commit and push changes to each of these repositories.

## Generating the data

`npm run download` (re-)downloads the Unicode source files for all the Unicode versions defined in `data/resources.js`, saving them in the `data` folder.

`npm run build` generates data for all the Unicode versions defined in `data/resources.js`. This may take a few minutes… The regular expressions are generated using [Regenerate](https://mths.be/regenerate).

## Testing

`npm test` generates the data for the oldest and latest available Unicode version. This is a good way to test changes to the generator scripts before running `npm run-script generate`.

`npm run-script cover` generates [the code coverage report](http://rawgithub.com/node-unicode/node-unicode-data/master/coverage/index.html).

## Author

| [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
| [Mathias Bynens](https://mathiasbynens.be/) |

## License

This module is available under the [MIT](https://mths.be/mit) license.