https://github.com/loctools/locjson
A specification for JSON-based translation interchange file format
- Host: GitHub
- URL: https://github.com/loctools/locjson
- Owner: loctools
- Created: 2019-10-10T22:00:24.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-08-06T01:44:11.000Z (almost 5 years ago)
- Last Synced: 2025-01-21T09:29:07.471Z (4 months ago)
- Topics: file-format, i18n, internationalization, l10n, localization, specification
- Size: 9.77 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
# LocJSON
This is a specification for a JSON-based translation interchange file format. It can be used in bilingual mode in place of PO or XLIFF files to send strings for translation, and also as a monolingual resource file format.
The following requirements were taken into consideration. The file format should be:
1. Easy to parse/serialize (JSON has the widest support among programming languages);
2. Easy to traverse / amend (straightforward structure, no conditional blocks or alternative sub-structures);
3. Easy to read by a human; in canonical key sort order, the overall sequence of keys should make sense;
4. Easy to diff (when used in version control systems);
5. Easy to store large strings (arbitrary line splitting, as in GetText PO files);
6. Preserving unit order (i.e. not a key-value dictionary);
7. Capable of storing multi-line comments;
8. Suitable for bilingual and monolingual use (share the same basic structure);
9. Easy to extend to store extra properties, if needed for external tools.
# Status
- **DRAFT**. The document is expected to be updated/clarified.
# Terminology
- **originating application** — an application that creates LocJSON files for translation and that consumes localized copies of such a file.
- **translation tool** — a TMS (Translation Management System), CAT (Computer-Aided Translation) tool, or some other tool that takes LocJSON files and updates them with translations.

# Specification
1. LocJSON files are always pretty-printed.
2. Indentation is always 4 spaces.
3. All keys in dictionaries are always sorted alphabetically.
4. All line breaks inside strings are represented with a single `\n` symbol (Unix-style).
5. LocJSON files use the `.locjson` file extension.

## Top-level structure
The top-level structure is a dictionary with up to two keys, `properties` and `units`:
```json
{
    "properties": {
        ...
    },
    "units": [
        ...
    ]
}
```

1. `properties` [optional, dictionary] — information related to the file as a whole.
2. `units` [required, array] — an ordered list of translatable units (segments).

### Top-level `properties` block contents
This dictionary stores information related to the file as a whole:
```json
"properties": {
    "comments": ["This file was generated by AwesomeTool"],
    "version": 1
}
```

1. `comments` [optional, array of strings] — a list of file-wide comments. To render a multiline comment string, the array is concatenated with a single newline (`\n`) character.
2. `version` [optional, number] — the LocJSON file format version. If omitted, `1` is implied.

#### Notes on versioning
The file format version is always an integer, and it will increase only if backward-incompatible changes are introduced. The intent, however, is to stay at version `1` and never introduce backward-incompatible changes.
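
As a minimal sketch of how a reader might apply the versioning rules above, the following Python helpers return the declared version (defaulting to `1` when it is omitted) and reject files that declare a newer version. The names `locjson_version`, `ensure_supported`, and `SUPPORTED_VERSION` are hypothetical and not part of this specification.

```python
SUPPORTED_VERSION = 1  # assumption: this reader only understands version 1


def locjson_version(doc: dict) -> int:
    """Return the declared format version, or 1 when it is omitted."""
    version = doc.get("properties", {}).get("version", 1)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError(f"version must be an integer, got {version!r}")
    return version


def ensure_supported(doc: dict) -> None:
    """Raise ValueError if the file declares a version newer than supported."""
    version = locjson_version(doc)
    if version > SUPPORTED_VERSION:
        raise ValueError(f"unsupported LocJSON version: {version}")
```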
### Top-level `units` block
Each unit in this array is a dictionary:
```json
...
{
    "key": "someUniqueKey",
    "properties": {
        ...
    },
    "source": [
        "Line 1\n",
        "Line 2\n",
        "\n",
        "A very very very long line split into several ",
        "based on a 50-character limit."
    ]
},
...
```

1. `key` [required, string] — a unique (within that file) identifier of the string.
2. `properties` [optional, dictionary] — properties of a particular unit.
3. `source` [required<sup>1</sup>, array] — an array of strings that, when concatenated, form the source text to translate. Lines must be split after `\n`. It is also recommended to split long strings into several shorter ones, each 50 characters or less, counting the line break (`\n`) as two characters.
4. `target` [optional<sup>1</sup>, array] — an array of strings that, when concatenated, form the target (translated) text. The same line splitting rules apply as with `source`.

<sup>1</sup> In bilingual use, both the `source` and `target` fields are expected to be present, and the translation goes into the `target` field; in monolingual use, a translation tool must put the translation back into the `source` field when generating a localized copy of a file and omit the `target` field.
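
The splitting rules for `source` and `target` can be sketched in Python as follows. This is one possible interpretation, not a reference implementation: the function names are hypothetical, and the sketch assumes that long lines are wrapped greedily at whitespace and that a single word longer than the limit is left unbroken.

```python
import re


def escaped_len(s: str) -> int:
    """Length of s with each line break counted as two characters,
    as in its JSON-escaped form."""
    return len(s) + s.count("\n")


def split_text(text: str, limit: int = 50) -> list[str]:
    """Split text into an array suitable for `source` or `target`."""
    chunks: list[str] = []
    # Mandatory rule: always break right after a line break.
    for line in re.findall(r"[^\n]*\n|[^\n]+", text):
        current = ""
        # Recommended rule: keep chunks within the limit. Tokens are
        # words together with their trailing whitespace, so a chunk
        # ends with the space that preceded the break.
        for token in re.findall(r"\S+\s*|\s+", line):
            if current and escaped_len(current) + escaped_len(token) > limit:
                chunks.append(current)
                current = token
            else:
                current += token
        chunks.append(current)
    return chunks


def join_text(chunks: list[str]) -> str:
    """Concatenate the array back into the full text."""
    return "".join(chunks)
```

Applied to the text from the unit example above, `split_text` yields the same five-element `source` array, and `join_text` restores the original text.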
### Per-unit `properties` block contents
This dictionary stores information related to the unit:
```json
"properties": {
    "comments": [
        "Unit comment line 1",
        "Unit comment line 2"
    ]
}
```

1. `comments` [optional, array of strings] — a list of unit-specific comments. To render a multiline comment string, the array is concatenated with a single newline (`\n`) character.
## Application-specific extensions
An application that generates a LocJSON file for translation may want to pass some extra proprietary information in the file. This is allowed only in `properties` dictionaries (both at top level and for each unit). All application-specific keys must start with an `x-` prefix, followed by a tool-specific sub-prefix, forming a namespace for that particular tool.
Consider the following example, where an imaginary tool, _AwesomeTool_, adds its own internal identifiers that start with the `x-awesometool-` prefix:
```json
{
    "properties": {
        "comments": ["This file was generated by AwesomeTool 2.34"],
        "x-awesometool-file-id": "xf-12345",
        "x-awesometool-generator-version": "2.34",
        "x-awesometool-original-filename": "resources.js"
    },
    "units": [
        {
            "key": "testKey",
            "properties": {
                "comments": ["This is a comment."],
                "x-awesometool-unit-id": "xu-65432"
            },
            "source": ["Some text."]
        }
    ]
}
```

# Translation tool behavior
A translation tool must keep the entire structure of a LocJSON file intact. It is only allowed to add, remove, or modify the contents of a `target` array in each unit definition (or only modify the `source` array in case of monolingual use).
A translation tool may read and use other properties, including the ones that start with `x-` (for example, show them in the translation UI as an additional context).
A translation tool must not add any custom properties, reorder units, or reformat the contents of `source` or `target` array in any unit unless it modifies a particular translation for that unit.
A translation tool may only modify its own known properties (i.e. the ones that _pre-existed_ in the LocJSON file). This lets the originating application that generates a LocJSON file control the set of properties it supports, and ensures it can parse the returned file. For example, if LocJSON is generated for an imaginary translation tool `Foo`, and it is known that this tool supports a property `x-foo-fuzzy` (which also has an equivalent in the originating application), then the originating application can include `x-foo-fuzzy` in the LocJSON file, and this property becomes a part of the contract between the originating application and the translation tool.
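
An originating application may want to verify that a returned file respects these constraints. The following Python function is a rough sketch of such a check, comparing the file that was sent with the file that came back; `find_violations` and its messages are hypothetical, and the check covers only unit order, added properties, and the `source`/`target` rules.

```python
def added_keys(before: dict, after: dict) -> list[str]:
    """Keys present in `after` but not in `before`."""
    return sorted(set(after) - set(before))


def find_violations(original: dict, returned: dict,
                    bilingual: bool = True) -> list[str]:
    """Return human-readable problems found in the returned file."""
    problems: list[str] = []
    extra = added_keys(original.get("properties", {}),
                       returned.get("properties", {}))
    if extra:
        problems.append(f"file-level properties added: {extra}")

    old_units, new_units = original["units"], returned["units"]
    if [u["key"] for u in old_units] != [u["key"] for u in new_units]:
        problems.append("units were added, removed, or reordered")
        return problems  # per-unit comparison is meaningless now

    for old, new in zip(old_units, new_units):
        key = old["key"]
        extra = added_keys(old.get("properties", {}),
                           new.get("properties", {}))
        if extra:
            problems.append(f"{key}: unit properties added: {extra}")
        if bilingual and old["source"] != new["source"]:
            problems.append(f"{key}: source changed in bilingual mode")
        if not bilingual and "target" in new:
            problems.append(f"{key}: target present in monolingual mode")
    return problems
```

An empty list means no violations were detected by this particular check; it does not prove that the file is otherwise valid.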
A translation tool may be explicitly instructed to remove all `properties` keys (both file-level and unit-level) upon generating a localized version of a file. This reduces the file size of all resources and keeps only the minimal data needed (an array of units with keys and translations; see the _Minimal example_ section below). A translation tool should never remove `properties` keys by default.
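
The behavior described in this section can be sketched from the translation tool's side. The Python code below is an illustration under stated assumptions rather than a prescribed implementation: `localize` and `dumps_locjson` are hypothetical names, `translations` is assumed to map unit keys to already-split arrays of strings, and `json.dumps` is used as an approximation of the formatting rules (it sorts keys by code point and places each array element on its own line, which may differ from the compact single-element arrays shown in the examples here).

```python
import copy
import json


def localize(doc: dict, translations: dict, bilingual: bool = True,
             strip_properties: bool = False) -> dict:
    """Return a localized copy of doc, leaving the input untouched."""
    result = copy.deepcopy(doc)
    if strip_properties:  # only when explicitly instructed, never by default
        result.pop("properties", None)
    for unit in result["units"]:
        if strip_properties:
            unit.pop("properties", None)
        translated = translations.get(unit["key"])
        if translated is None:
            continue  # no translation available: keep the unit as is
        if bilingual:
            unit["target"] = translated
        else:
            # Monolingual use: the translation replaces `source`
            # and the `target` field is omitted.
            unit["source"] = translated
            unit.pop("target", None)
    return result


def dumps_locjson(doc: dict) -> str:
    """Serialize pretty-printed, 4-space indented, with sorted keys."""
    return json.dumps(doc, indent=4, sort_keys=True, ensure_ascii=False)
```

Because units are stored in an array, their order is preserved by both functions; only dictionary keys are sorted on output.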
# Minimal example
Given the optional nature of `properties` blocks, a minimal generated LocJSON file would look like this:
```json
{
    "units": [
        {
            "key": "key1",
            "source": ["String 1"]
        },
        {
            "key": "key2",
            "source": ["String 2"]
        },
        ...
    ]
}
```

## Bilingual use
In bilingual use, a localized copy returned by a translation tool would look like this:
```json
{
    "units": [
        {
            "key": "key1",
            "source": ["String 1"],
            "target": ["Translated string 1"]
        },
        {
            "key": "key2",
            "source": ["String 2"],
            "target": ["Translated string 2"]
        },
        ...
    ]
}
```

## Monolingual use
In monolingual use, a localized copy returned by a translation tool (with translations written directly into `source`) would look like this:
```json
{
    "units": [
        {
            "key": "key1",
            "source": ["Translated string 1"]
        },
        {
            "key": "key2",
            "source": ["Translated string 2"]
        },
        ...
    ]
}
```

There are two reasons translations are written directly into `source` in this mode:
1. The structure between source and localized files stays the same to simplify resource file handling.
2. A localized file can immediately serve as a source file for localization into other languages (for example, a Chinese source file is translated into English first, and the English one is then translated into all other languages).

# Full example
Here's a fuller example of a LocJSON file that has comments:
```json
{
    "properties": {
        "comments": ["This file was generated by AwesomeTool"],
        "version": 1
    },
    "units": [
        {
            "key": "welcomeMessage",
            "properties": {
                "comments": ["{USER} here is replaced with the first name of a signed in user"]
            },
            "source": ["Hello, {USER}!"]
        },
        {
            "key": "signInButtonCaption",
            "properties": {
                "comments": ["https://example.com/preview/sign-in-dialog-screenshot.png"]
            },
            "source": ["Sign In"]
        },
        {
            "key": "signInFooterText",
            "properties": {
                "comments": [
                    "This text is displayed below the sign in form.",
                    "https://example.com/preview/sign-in-dialog-screenshot.png"
                ]
            },
            "source": [
                "Please read our Privacy ",
                "Policy and Terms of ",
                "Service"
            ]
        }
    ]
}
```

# Mapping LocJSON files to languages
The LocJSON format intentionally has no built-in notion of _source language_ or _target language_ properties. These are expected to be inferred from the folder structure that holds the LocJSON files, or from the file names themselves, and to be a part of the contract between an originating application and a translation tool.
Consider the following example folder structure:
```
en/
    login.locjson
    main.locjson
    settings.locjson
ru/
    login.locjson
    main.locjson
    settings.locjson
zh-hans/
    login.locjson
    main.locjson
    settings.locjson
```

Here the files share the same names, but each set belongs to its own top-level language folder. If the agreement is to take English files from the `en` folder and put their localized copies into the corresponding target language folders, the question of source and target language is answered automatically.
Another example:
```
login/
    en.locjson
    ru.locjson
    zh-hans.locjson
main/
    en.locjson
    ru.locjson
    zh-hans.locjson
settings/
    en.locjson
    ru.locjson
    zh-hans.locjson
```

Here localization files are split by component, and the language is implied by the file name itself.
Both layouts are equally acceptable for bilingual and monolingual use. In all scenarios, be it a manual or an automated localization process, enforcing a file and folder naming convention to determine source and target languages (as opposed to specifying them as properties within a LocJSON file and relying solely on these properties) is expected to lead to better discipline and a cleaner localization process overall.
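
For illustration, the two layouts above could be handled with a few small Python helpers. This is a sketch under assumptions: the function names are hypothetical, paths are relative to the localization root and use forward slashes, and the actual convention remains whatever the originating application and the translation tool agree on.

```python
from pathlib import PurePosixPath


def language_by_folder(path: str) -> str:
    """Layout 1: en/login.locjson -> 'en' (top-level folder)."""
    return PurePosixPath(path).parts[0]


def language_by_filename(path: str) -> str:
    """Layout 2: login/en.locjson -> 'en' (file name without extension)."""
    return PurePosixPath(path).stem


def target_path_by_folder(source_path: str, target_lang: str) -> str:
    """Layout 1: replace the top-level language folder."""
    parts = PurePosixPath(source_path).parts
    return str(PurePosixPath(target_lang, *parts[1:]))


def target_path_by_filename(source_path: str, target_lang: str) -> str:
    """Layout 2: replace the file name, keeping the extension."""
    path = PurePosixPath(source_path)
    return str(path.with_name(target_lang + path.suffix))
```

With these helpers, `target_path_by_folder("en/login.locjson", "ru")` gives `ru/login.locjson`, and `target_path_by_filename("login/en.locjson", "ru")` gives `login/ru.locjson`.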