https://github.com/chbrown/unidata

# unidata

[![npm package version](https://badge.fury.io/js/unidata.svg)](https://www.npmjs.com/package/unidata)

JavaScript interface to the [Unicode Character Database](http://www.unicode.org/reports/tr44/).
The package version is intended to match the Unicode version.

    npm install unidata --save

## API

Using ES6 syntax:

    import {getBlocks, getCharacters} from 'unidata'

There are a few other exports (and a couple of useful interfaces, if you're using TypeScript),
but these two are the main functions.

They both simply call `require()` to load the preprocessed Unicode data from a JSON file.

    const blocks = getBlocks()
    const characters = getCharacters()

These are both just arrays.
* `blocks` is an array of [`Block`](index.d.ts)
* `characters` is an array of [`Character`](index.d.ts)

Blocks are simple; _every_ item in `blocks` has these three fields:

```js
> blocks.length
308
> blocks.slice(0, 5)
[ { startCode: 0, endCode: 127, blockName: 'Basic Latin' },
{ startCode: 128, endCode: 255, blockName: 'Latin-1 Supplement' },
{ startCode: 256, endCode: 383, blockName: 'Latin Extended-A' },
{ startCode: 384, endCode: 591, blockName: 'Latin Extended-B' },
{ startCode: 592, endCode: 687, blockName: 'IPA Extensions' } ]
```
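
Since each block is just an inclusive `startCode`–`endCode` range, finding the block that contains a given code point can be done with a plain array search. The following is a minimal sketch, not part of the package: `findBlock` is a hypothetical helper, and `sampleBlocks` is a hand-copied subset of the output above standing in for the result of `getBlocks()`.

```js
// Sample data copied from the output above; in real use this would be getBlocks().
const sampleBlocks = [
  { startCode: 0, endCode: 127, blockName: 'Basic Latin' },
  { startCode: 128, endCode: 255, blockName: 'Latin-1 Supplement' },
  { startCode: 256, endCode: 383, blockName: 'Latin Extended-A' },
  { startCode: 384, endCode: 591, blockName: 'Latin Extended-B' },
  { startCode: 592, endCode: 687, blockName: 'IPA Extensions' },
]

// findBlock is a hypothetical helper, not a unidata export.
// It returns the block whose inclusive [startCode, endCode] range contains `code`,
// or undefined if no block in the array covers that code point.
function findBlock(blocks, code) {
  return blocks.find(block => block.startCode <= code && code <= block.endCode)
}

const eAcute = '\u00e9'.codePointAt(0) // U+00E9 LATIN SMALL LETTER E WITH ACUTE
console.log(findBlock(sampleBlocks, eAcute).blockName) // Latin-1 Supplement
console.log(findBlock(sampleBlocks, 0x10FFFF)) // undefined (not covered by the sample)
```

A linear `find` is probably fine for a few hundred blocks; if the array is sorted by `startCode`, as the sample suggests, a binary search would also work.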

Characters are richer, but the representation is parsimonious:
if a value is not available or not applicable for a given character, that key will be omitted.

```js
> characters.length
33797
> characters.slice(32, 40)
[ { code: 32, name: 'SPACE', cat: 'Zs', bidi: 'WS' },
{ code: 33, name: 'EXCLAMATION MARK', cat: 'Po', bidi: 'ON' },
{ code: 34, name: 'QUOTATION MARK', cat: 'Po', bidi: 'ON' },
{ code: 35, name: 'NUMBER SIGN', cat: 'Po', bidi: 'ET' },
{ code: 36, name: 'DOLLAR SIGN', cat: 'Sc', bidi: 'ET' },
{ code: 37, name: 'PERCENT SIGN', cat: 'Po', bidi: 'ET' },
{ code: 38, name: 'AMPERSAND', cat: 'Po', bidi: 'ON' },
{ code: 39, name: 'APOSTROPHE', cat: 'Po', bidi: 'ON', oldName: 'APOSTROPHE-QUOTE' } ]
```

The first three fields, `code`, `name`, and `cat`, are always present.
The other ten are optional.
(For details on the optional fields, and what values to assume when they are omitted,
see the comments on the `Character` interface in [`index.d.ts`](index.d.ts).)
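
Because optional keys are omitted rather than set to `null`, code that reads them needs to handle `undefined`. The sketch below illustrates this and shows one way to look characters up by code point. `indexByCode` is a hypothetical helper, not a unidata export, and `sampleCharacters` is a hand-copied subset of the output above standing in for the result of `getCharacters()`.

```js
// Sample data copied from the output above; in real use this would be getCharacters().
const sampleCharacters = [
  { code: 32, name: 'SPACE', cat: 'Zs', bidi: 'WS' },
  { code: 36, name: 'DOLLAR SIGN', cat: 'Sc', bidi: 'ET' },
  { code: 38, name: 'AMPERSAND', cat: 'Po', bidi: 'ON' },
  { code: 39, name: 'APOSTROPHE', cat: 'Po', bidi: 'ON', oldName: 'APOSTROPHE-QUOTE' },
]

// indexByCode is a hypothetical helper, not a unidata export.
// It builds a Map from code point to character record, so lookups do not
// depend on a character's position in the array.
function indexByCode(characters) {
  return new Map(characters.map(character => [character.code, character]))
}

const byCode = indexByCode(sampleCharacters)
const apostrophe = byCode.get("'".codePointAt(0))

console.log(apostrophe.name) // APOSTROPHE
console.log('oldName' in apostrophe) // true
console.log('oldName' in byCode.get(32)) // false: the key is omitted entirely
console.log(byCode.get(36).oldName) // undefined
```

Keying by `code` is the safer choice here: the array is far shorter than the full code point range, so an index is unlikely to equal a code point throughout.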

## License

Copyright 2015–2020 Christopher Brown.
[MIT Licensed](https://chbrown.github.io/licenses/MIT/#2015-2020).