https://github.com/googlefonts/exemplar
JSON endpoints for CLDR exemplar data by locale tag
casing cldr fonts i18n icu4c icu4j internationalization l10n language locale localization numbers punctuation unicode
- Host: GitHub
- URL: https://github.com/googlefonts/exemplar
- Owner: googlefonts
- License: apache-2.0
- Created: 2025-01-28T03:25:50.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-03T16:09:18.000Z (3 months ago)
- Last Synced: 2025-02-16T18:41:49.229Z (2 months ago)
- Topics: casing, cldr, fonts, i18n, icu4c, icu4j, internationalization, l10n, language, locale, localization, numbers, punctuation, unicode
- Language: Python
- Homepage: https://cdn.jsdelivr.net/gh/googlefonts/exemplar@1/api/data.json
- Size: 2.75 MB
- Stars: 0
- Watchers: 11
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Exemplar


## About
The Exemplar API project provides JSON endpoints that serve Unicode CLDR (Common Locale Data Repository) exemplar encoding data, organized by [Unicode locale](https://unicode-org.github.io/icu/userguide/locale/#the-locale-concept). The goal is to give clients simple GET access to localized exemplar encoding data through a public endpoint in the widely supported JSON format.
This project leverages the [ICU (International Components for Unicode) library](https://unicode-org.github.io/icu/) to distribute software internationalization data that are consistent with the latest Unicode data standards.
## Exemplar Data Categories
The following data are categorized by [Unicode locale](https://unicode-org.github.io/icu/userguide/locale/#the-locale-concept) tag:
- **Main Exemplars** [main]
  - Minimum base set of characters used in the language. Includes only lowercase forms for languages that distinguish case. [CLDR documentation](https://cldr.unicode.org/translation/core-data/exemplars#exemplar-characters)
  - Example [English (United States)]: abcdefghijklmnopqrstuvwxyz
  - Example [Vietnamese (Vietnam)]: aáàăắằẵẳâấầẫẩãảạặậbcdđeéèêếềễểẽẻẹệghiíìĩỉịklmnoóòôốồỗổõỏơớờỡởợọộpqrstuúùũủưứừữửựụvxyýỳỹỷỵ
  - Example [Hindi (India)]: ़ँंःॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसहऽािीुूृॅेैॉोौ्
- **Auxiliary Exemplars** [auxiliary]
  - Foreign borrowings and specialized usage exemplars in the language. [CLDR documentation](https://cldr.unicode.org/translation/core-data/exemplars#exemplar-characters)
  - Example [English (United States)]: áàăâåäãāæçéèĕêëēíìĭîïīñóòŏôöøōœúùŭûüūÿ
  - Example [Vietnamese (Vietnam)]: fjwz
  - Example [Hindi (India)]: ॄ
- **Case-insensitive Exemplars** [case_insensitive]
  - Defines exemplar equivalence, irrespective of case, in the language.
  - Example [English (United States)]: aAbBcCdDeEfFgGhHiIjJkKKlLmMnNoOpPqQrRsSſtTuUvVwWxXyYzZ
  - Example [Vietnamese (Vietnam)]: aAáÁàÀăĂắẮằẰẵẴẳẲâÂấẤầẦẫẪẩẨãÃảẢạẠặẶậẬbBcCdDđĐeEéÉèÈêÊếẾềỀễỄểỂẽẼẻẺẹẸệỆgGhHiIíÍìÌĩĨỉỈịỊkKKlLmMnNoOóÓòÒôÔốỐồỒỗỖổỔõÕỏỎơƠớỚờỜỡỠởỞợỢọỌộỘpPqQrRsSſtTuUúÚùÙũŨủỦưƯứỨừỪữỮửỬựỰụỤvVxXyYýÝỳỲỹỸỷỶỵỴ
  - Example [Hindi (India)]: ़ँंःॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसहऽािीुूृॅेैॉोौ्
- **Case-mapped Exemplars** [case_mapping]
  - Defines the relationship between uppercase, lowercase, and title case exemplars in the language for ICU case mapping. [ICU documentation](https://unicode-org.github.io/icu/userguide/transforms/casemappings.html)
  - Example [English (United States)]: aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
  - Example [Vietnamese (Vietnam)]: aAáÁàÀăĂắẮằẰẵẴẳẲâÂấẤầẦẫẪẩẨãÃảẢạẠặẶậẬbBcCdDđĐeEéÉèÈêÊếẾềỀễỄểỂẽẼẻẺẹẸệỆgGhHiIíÍìÌĩĨỉỈịỊkKlLmMnNoOóÓòÒôÔốỐồỒỗỖổỔõÕỏỎơƠớỚờỜỡỠởỞợỢọỌộỘpPqQrRsStTuUúÚùÙũŨủỦưƯứỨừỪữỮửỬựỰụỤvVxXyYýÝỳỲỹỸỷỶỵỴ
  - Example [Hindi (India)]: ़ँंःॐअआइईउऊऋऌऍएऐऑओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसहऽािीुूृॅेैॉोौ्
- **Punctuation Exemplars** [punctuation]
  - Punctuation customarily used with the language. [CLDR documentation](https://cldr.unicode.org/translation/core-data/exemplars#exemplar-characters)
  - Example [English (United States)]: -‐‑–—,;:!?.…'‘’"“”()[]§@*/†‡′″
  - Example [Vietnamese (Vietnam)]: -‐‑–—,;:!?.…'‘’"“”()[]§@*/†‡′″
  - Example [Hindi (India)]: -‐‑–—,;:!?.…।॥'‘’"“”()[]§@*/#†‡′″॰
- **Numbers Exemplars** [numbers]
  - The number digits used in the locale.
  - Example [English (United States)]: 0123456789
  - Example [Vietnamese (Vietnam)]: 0123456789
  - Example [Hindi (India)]: 0123456789
  - Example [Arabic (Egypt)]: ٠١٢٣٤٥٦٧٨٩
- **Currency Exemplars** [currency]
  - The currency symbol used for formatting currency numbers in the locale.
  - Example [English (United States)]: $
  - Example [Vietnamese (Vietnam)]: ₫
  - Example [Hindi (India)]: ₹

Additional data, including locale tag display names and the ICU library definition version, are also available.
### Endpoints

#### Latest Release
```
https://cdn.jsdelivr.net/gh/googlefonts/exemplar@1/api/data.json
```
#### Endpoint Versioning
The Exemplar API follows [semantic versioning](https://semver.org/) principles. Backwards compatibility is assured within major version releases. For example, all `v1` endpoints will remain backwards compatible with any changes or additions made across the `v1.x.x` releases.
The root endpoint is:
```
https://cdn.jsdelivr.net
```
Versioned endpoint construction uses the following syntax:
```
/gh/googlefonts/exemplar@[VERSION]/api/data.json
```
where `[VERSION]` represents a repository semantic version release number, git tag, or git commit hash.
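
The construction rules above can be sketched in Python using only the standard library. The helper names `endpoint_url` and `fetch_data`, and the injectable `opener` parameter, are illustrative assumptions and not part of this project; the sketch joins the documented root endpoint and path, then decodes the JSON response.

```python
import json
from urllib.request import urlopen

# Root endpoint as documented above.
ROOT = "https://cdn.jsdelivr.net"


def endpoint_url(version: str = "1") -> str:
    """Build a versioned endpoint URL from the documented path syntax.

    `version` may be a major version, a full release number, a git tag,
    or a git commit hash.
    """
    return f"{ROOT}/gh/googlefonts/exemplar@{version}/api/data.json"


def fetch_data(version: str = "1", opener=urlopen) -> dict:
    """GET the endpoint and decode the JSON body.

    Requires network access when the default `opener` is used.
    `opener` is a hypothetical hook that makes the function testable offline.
    """
    with opener(endpoint_url(version)) as response:
        return json.load(response)
```

Calling `fetch_data("1")` would track the latest `v1.x.x` release, and the returned dictionary should expose the `icu_version` key described in the data structure section.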
#### Major Version Tracking Release
Automatically update to new point releases across a major release cycle by using the major release number only:
```
/gh/googlefonts/exemplar@1/api/data.json
```
#### Pinned Version Release
Define a pinned point release with a full release version number in `MAJOR.MINOR.PATCH` syntax:
```
/gh/googlefonts/exemplar@[MAJOR.MINOR.PATCH]/api/data.json
```
### Data Structure
The JSON data follow a [repository-defined schema](schema.json). Below is an overview of the structure:
#### Exemplar JSON
```json
{
  "icu_version": "version_string",
  "locales": {
    "locale_id": {
      "main": {
        "single_chars": ["char1", "char2"],
        "sequences": ["seq1", "seq2"]
      },
      "auxiliary": {
        "single_chars": ["char1", "char2"],
        "sequences": ["seq1", "seq2"]
      },
      "punctuation": ["punct1", "punct2"],
      "case_insensitive": {
        "single_chars": ["char1", "char2"],
        "sequences": ["seq1", "seq2"]
      },
      "case_mapping": {
        "single_chars": ["char1", "char2"],
        "sequences": ["seq1", "seq2"]
      },
      "numbers": {
        "decimal": "decimal_char",
        "group": "group_char",
        "percent": "percent_char",
        "zero_digit": "zero_digit_char",
        "digit": "digit_char",
        "pattern_digit": "pattern_digit_char",
        "plus_sign": "plus_sign_char",
        "minus_sign": "minus_sign_char",
        "exponential": "exponential_char",
        "per_mille": "per_mille_char",
        "infinity": "infinity_char",
        "nan": "nan_char",
        "digits": ["digit1", "digit2"]
      },
      "currency": "currency_symbol"
    }
  },
  "display_names": {
    "locale_id": "Locale Display Name"
  }
}
```
**Note:** The following fields can have null values when data do not exist or are not available:
- `punctuation`
- `case_insensitive.single_chars`
- `case_insensitive.sequences`
- `case_mapping.single_chars`
- `case_mapping.sequences`
- `currency`
## Example Usage
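
As a rough sketch of how the structure and nullable fields described above might be consumed, the following function flattens a few exemplar fields for one locale into plain strings. The function name `exemplar_summary` is hypothetical, and `data` is assumed to be the already-decoded JSON document; null values arrive in Python as `None`, so the sketch substitutes empty values for them.

```python
def exemplar_summary(data: dict, locale_id: str) -> dict:
    """Flatten a few exemplar fields for one locale into plain strings."""
    entry = data["locales"][locale_id]
    return {
        # Fall back to the locale ID itself if no display name is present.
        "name": data["display_names"].get(locale_id, locale_id),
        "main": "".join(entry["main"]["single_chars"]),
        # The fields below are documented as nullable (None in Python),
        # so replace None with an empty value before using them.
        "punctuation": "".join(entry["punctuation"] or []),
        "case_mapping": "".join(entry["case_mapping"]["single_chars"] or []),
        "currency": entry["currency"] or "",
    }
```

Locale IDs should be taken from the keys of the `locales` object in the data file rather than assumed, since the exact tag format is defined by the data.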
There are demo scripts in the [`examples` directory](examples/) that demonstrate how to use the Exemplar project JSON data. These examples include:
- [**currency.py**](examples/currency.py): Demonstrates how to extract and print localized currency symbols and their Unicode codepoints from the JSON data.
- [**locsets.py**](examples/locsets.py): Demonstrates how to extract and print locale-specific exemplar character sets from the JSON data. This script takes a locale ID as a command-line argument and reports the main, auxiliary, case-insensitive, case-mapping, numbers, punctuation, and currency exemplars for the specified locale.
## Development
The JSON data are generated with the [`exemplars.py`](exemplars.py) script in the root of the repository. The [schema.json](schema.json) file defines the JSON structure for validation testing at runtime. The Python dependencies are defined in the [requirements.txt](requirements.txt) file.
The JSON data can be generated with the following command:
```
$ python exemplars.py
```
The JSON files are written to the `api` sub-directory.
## Changelog
Please see the [CHANGELOG.md](CHANGELOG.md) file in the root of the repository.
## Licenses
The source code in this repository is licensed under [the Apache License, Version 2.0](LICENSE.md).
The CLDR data are redistributed under [the Unicode License v3](https://www.unicode.org/license.txt).