https://github.com/eklem/unicode-emojis-unique-id-json

Host: GitHub
URL: https://github.com/eklem/unicode-emojis-unique-id-json
Owner: eklem
License: mit
Created: 2023-07-14T05:48:23.000Z (about 3 years ago)
Default Branch: trunk
Last Pushed: 2025-03-11T07:30:48.000Z (over 1 year ago)
Last Synced: 2025-04-01T16:06:20.352Z (over 1 year ago)
Language: JavaScript
Size: 464 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# unicode-emojis-unique-id-json

A JSON version of https://unicode.org/Public/emoji/15.1/emoji-test.txt where every emoji has a unique numeric ID. The IDs are meant to stay unique over time: when new versions of Unicode emojis are released, existing emojis keep their IDs. Counting the ones with modifiers, there are a little under 4,000 emojis now. I'm guessing the count will pass 10,000 in the foreseeable future, so 5-digit IDs are needed. 6-digit IDs are used anyway, because a dependent library needs them.

IDs start at 000001 and count upwards. There are two reasons for creating IDs this way instead of using a hashing algorithm:

* The IDs need to be numbers
* The IDs need to be as short as possible

The IDs are used by [otp-encrypt-js](https://github.com/eklem/otp-encrypt-js), more specifically to encrypt and decrypt emojis (in addition to characters and numbers). Emoji encryption and decryption are done through a codebook, where an emoji plaincode starts with a 0 to identify it as an emoji, followed by 5 digits.
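
A minimal sketch of how such a codebook could be built, assuming the IDs are the zero-padded strings used in this repository's JSON and that a codebook is simply a two-way lookup between emoji and ID. `formatId` and `buildCodebook` are hypothetical names for illustration, not part of otp-encrypt-js.

```javascript
// Hypothetical helpers for illustration; they are not the otp-encrypt-js API.

// Format a sequential number as a 6-digit ID. Because IDs stay below 100000,
// the first digit is always 0, which is what marks an emoji plaincode.
function formatId (n) {
  if (!Number.isInteger(n) || n < 1 || n > 99999) {
    throw new RangeError(`ID must be an integer from 1 to 99999, got ${n}`)
  }
  return String(n).padStart(6, '0')
}

// Build two lookup tables from the "emojis" array of the JSON file:
// emoji -> ID for encryption, ID -> emoji for decryption.
function buildCodebook (emojis) {
  const toCode = new Map()
  const toEmoji = new Map()
  for (const { id, emoji } of emojis) {
    toCode.set(emoji, id)
    toEmoji.set(id, emoji)
  }
  return { toCode, toEmoji }
}

// Usage with a one-entry array shaped like the entries in the JSON file.
const codebook = buildCodebook([{ id: formatId(1), emoji: '😃' }])
console.log(codebook.toCode.get('😃')) // prints 000001
console.log(codebook.toEmoji.get('000001')) // prints 😃
```

Keeping the leading 0 inside the ID string, rather than adding it when encrypting, means the ID can be used as the plaincode directly.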

## Content of JSON

A line from https://unicode.org/Public/emoji/15.1/emoji-test.txt such as

```text
1F603 ; fully-qualified # 😃 E0.6 grinning face with big eyes
```

is converted to:

```json
{
  "unicodeEmojisVersion": "13.0",
  "emojis": [
    {
      "id": "000001",
      "emoji": "😃",
      "description": "grinning face with big eyes",
      "unicode": ["U+1F603"],
      "versionIntroduced": "0.6"
    }
  ]
}
```
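
The conversion can be sketched as one regex per data line. This is a minimal sketch assuming the field layout of the text line above (code points, status, emoji, E-version, description); the `EMOJI_LINE` regex and the `parseLine` function are hypothetical and not the repository's actual code. The caller supplies the ID.

```javascript
// Hypothetical sketch: this regex is an illustration, not the one used in this repository.
// Fields: code points ; status # emoji E<version> description
const EMOJI_LINE = /^([0-9A-F]+(?: [0-9A-F]+)*)\s*;\s*[a-z-]+\s*#\s*(\S+)\s+E(\d+\.\d+)\s+(.+)$/u

// Convert one data line of emoji-test.txt into an entry shaped like the JSON above.
// Returns null for comments, group headers and blank lines.
function parseLine (line, id) {
  const match = EMOJI_LINE.exec(line.trim())
  if (match === null) return null
  const [, codepoints, emoji, version, description] = match
  return {
    id,
    emoji,
    description,
    // "1F468 200D 1F469" -> ["U+1F468", "U+200D", "U+1F469"]
    unicode: codepoints.split(' ').map((cp) => `U+${cp}`),
    versionIntroduced: version
  }
}

const entry = parseLine('1F603 ; fully-qualified # 😃 E0.6 grinning face with big eyes', '000001')
console.log(JSON.stringify(entry))
```

Multi-code-point sequences need no special case here, since joiners and variation selectors are not whitespace and stay inside the `(\S+)` emoji group.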

## To get back to the same IDs if something goes wrong

Start from Unicode Emojis v13.0, then run the script on every later version, in order. So far, that means:

* 13.1
* 14.0 <- We have gotten this far
* 15.0
* 15.1
* 16.0
* [future versions]

Version 15 won't happen before I get the regex to work for the [two-character emojis introduced in v15.0 and v15.1](https://github.com/eklem/unicode-emojis-unique-id-json/issues/9).
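
Replaying the versions only reproduces the same IDs if ID assignment is deterministic. A minimal sketch of that step, assuming emojis are matched by their character string and the JSON has the shape shown earlier; `addVersion` is a hypothetical name, not the repository's script.

```javascript
// Hypothetical sketch of the ID-assignment step: emojis already in the JSON keep
// their IDs, and emojis not seen before get the next free sequential ID.
// Matching emojis by their character string is an assumption.
function addVersion (data, version, parsedEmojis) {
  const known = new Set(data.emojis.map((e) => e.emoji))
  let lastId = data.emojis.reduce((max, e) => Math.max(max, Number(e.id)), 0)
  for (const entry of parsedEmojis) {
    if (known.has(entry.emoji)) continue
    lastId += 1
    data.emojis.push({ ...entry, id: String(lastId).padStart(6, '0') })
    known.add(entry.emoji)
  }
  data.unicodeEmojisVersion = version
  return data
}

// Replaying 13.0, then 13.1, then later versions, each in file order,
// always yields the same IDs for the same input files.
let data = { unicodeEmojisVersion: null, emojis: [] }
data = addVersion(data, '13.0', [{ emoji: '😀' }, { emoji: '😃' }])
// 13.1 repeats the old emojis and adds "face exhaling" (a ZWJ sequence)
data = addVersion(data, '13.1', [{ emoji: '😀' }, { emoji: '😃' }, { emoji: '\u{1F62E}\u200D\u{1F4A8}' }])
console.log(data.emojis.map((e) => `${e.id} ${e.emoji}`))
```

Since existing entries are never renumbered, running the same version twice is harmless: nothing new is added.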

## Work to be done

* [x] regex for extracting content from the text file
* [x] read the old JSON, fetch new emojis, convert them to JSON and add the ones not already in the old JSON with unique IDs
* [x] write to JSON
* [x] show which Unicode emoji versions are covered by this library
* [x] tests checking that IDs from previous versions of this library still correspond to the same emojis in the new version, using a directory with a previous version

## Tests

There should be tests ensuring that IDs assigned for previous Unicode emoji versions persist.
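
Such a persistence check could compare an older JSON file against the newer one. A minimal sketch, assuming both files have the shape shown earlier; `findChangedIds` and the sample data are hypothetical, not this repository's test suite.

```javascript
// Hypothetical check, not this repository's test suite: every ID in an older
// JSON file must still point at the same emoji in the newer file.
function findChangedIds (previous, current) {
  const currentById = new Map(current.emojis.map((e) => [e.id, e.emoji]))
  const problems = []
  for (const { id, emoji } of previous.emojis) {
    const now = currentById.get(id)
    // A missing ID is reported with now: null, a reassigned ID with the new emoji
    if (now !== emoji) problems.push({ id, was: emoji, now: now ?? null })
  }
  return problems
}

const v13 = { emojis: [{ id: '000001', emoji: '😀' }, { id: '000002', emoji: '😃' }] }
const v14 = { emojis: [{ id: '000001', emoji: '😀' }, { id: '000002', emoji: '😃' }, { id: '000003', emoji: '🫠' }] }
console.log(findChangedIds(v13, v14)) // prints []
```

New IDs in the newer file are allowed; only changed or missing old IDs count as failures.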
