https://github.com/matecat/subfiltering
Subfiltering is a component used by Matecat and MyMemory for converting strings between the database, external services, and the UI layers. It provides a pipeline of filters to safely transform content across these layers while preserving XLIFF tags, HTML placeholders, and special entities.
- Host: GitHub
- URL: https://github.com/matecat/subfiltering
- Owner: matecat
- License: lgpl-3.0
- Created: 2021-05-24T08:53:00.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2026-03-18T16:33:24.000Z (3 months ago)
- Last Synced: 2026-03-19T06:24:40.403Z (3 months ago)
- Language: PHP
- Homepage:
- Size: 411 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Matecat Subfiltering
[Build Status](https://app.travis-ci.com/matecat/subfiltering)
[SonarCloud analysis](https://sonarcloud.io/summary/new_code?id=matecat_subfiltering)
Subfiltering is a component used by Matecat and MyMemory for converting strings between the database, external services, and the UI layers.
It provides a pipeline of filters to safely transform content across these layers while preserving XLIFF tags, HTML placeholders, and special entities.
## Overview
Embedding XML in a REST JSON payload is notoriously hard to render safely and legibly in a web browser.
Browsers, frameworks, and JSON serializers all have opinions about angle brackets, entities, and special characters.
The result is typically a mix of double-encoding, broken markup, or inline codes that translators can accidentally damage.
This library solves that by introducing reversible “layers” and a transformation pipeline that makes XML- and XLIFF-rich content safe for transport and UI display, while guaranteeing you can restore the exact original.
### What makes XML in JSON hard for the browser
- Angle brackets and entities: Raw < and > conflict with HTML, and HTML/JS frameworks may escape or re-escape entities differently than you expect.
- Inline codes in text: XLIFF inline tags (ph, pc, etc.), HTML/XML snippets, ICU, or sprintf tokens can be misinterpreted or edited improperly when shown as-is.
- Safety vs. readability: You need to prevent XSS and layout breakage, but you also need a UI where users can read and edit the text around inline codes.
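The double-encoding problem is easy to reproduce in plain PHP. The snippet below does not use Subfiltering at all; it only calls the standard `htmlspecialchars()` function twice, as two independent layers might, and then shows that passing `double_encode = false` makes the second call harmless.

```php
<?php
// Escaping text that has already been escaped produces visible entity garbage.
$raw = 'Click <b>here</b> & save';

$once  = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8'); // a second layer escapes again

echo $once, PHP_EOL;  // Click &lt;b&gt;here&lt;/b&gt; &amp; save
echo $twice, PHP_EOL; // Click &amp;lt;b&amp;gt;here&amp;lt;/b&amp;gt; &amp;amp; save

// double_encode = false leaves existing entities untouched, so repeated escaping is idempotent.
$safe = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);
var_dump($safe === $once); // bool(true)
```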
- Use it when:
  - Your source text includes variables, placeholders, XML, or HTML tags.
  - You must accept user edits while preventing structural damage to tags.
- What it gives you:
  - Converts inline tags to robust placeholders with a base64 “memory” of the original, then restores the tags exactly after the round-trip.
  - Prevents double-encoding and protects structural elements.
In short, this library is a bridge between “XML-correct” and “browser-safe,” letting you serve and accept JSON payloads that are straightforward to display and edit in the web UI,
while guaranteeing that your original XML/XLIFF structure is preserved perfectly end to end.
## How the library addresses it
- Normalizes and preserves XLIFF tags across transformations.
- Encodes/decodes special characters and placeholders for safe round-trips.
- Converts between three processing layers:
  - Layer 0 (Database): A database-safe XML form, suitable for persistence, export, and exact reconstruction.
  - Layer 1 (External services): A transport-safe form tailored for MT/TM systems that aren’t XML-aware.
  - Layer 2 (UI): A browser/UI-friendly form that replaces raw tags with safe placeholders and base64-backed metadata.
- UI-friendly placeholders
  - XML/XLIFF/HTML tags are converted to stable placeholders with an embedded, base64-encoded “memory” of the original tag.
  - The UI can display and move placeholders without exposing raw markup, reducing the risk of accidental tag damage.
- Reversible round-trips
  - When the browser sends edited text back, the library restores Layer 2 content to Layer 0, reconstructing the exact original tags from the placeholders.
  - The same applies to Layer 0 ↔ Layer 1 when calling external services.
- Supports XLIFF 2.x dataRef replacement, aligning inline codes from `<originalData>` with inline tags in segments.
  - If your XLIFF uses `originalData` with `dataRef`/`dataRefStart`/`dataRefEnd`, the library creates meaningful placeholders for the UI and then restores the real XLIFF tags afterward.
  - This keeps both the JSON payload and browser rendering safe without losing fidelity.
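The placeholder idea can be sketched in a few lines of plain PHP. This is a toy model, not Subfiltering's implementation: the `[[ph:BASE64]]` token format and the helper names `tagsToPlaceholders()` and `placeholdersToTags()` are hypothetical. It only shows why a base64 "memory" makes the round-trip exact: the UI never sees raw markup, yet every tag can be rebuilt byte for byte.

```php
<?php
// Hypothetical placeholder format [[ph:<base64>]]; NOT the format Subfiltering actually emits.

/** Replace every XML/HTML tag with a placeholder carrying its base64-encoded original text. */
function tagsToPlaceholders(string $text): string
{
    return preg_replace_callback(
        '/<[^>]+>/',
        fn(array $m): string => '[[ph:' . base64_encode($m[0]) . ']]',
        $text
    );
}

/** Restore the exact original tags by decoding each placeholder's base64 payload. */
function placeholdersToTags(string $text): string
{
    return preg_replace_callback(
        '/\[\[ph:([A-Za-z0-9+\/=]+)\]\]/',
        fn(array $m): string => base64_decode($m[1]),
        $text
    );
}

$layer0 = 'Press <ph id="1" equiv-text="Ctrl"/> then <b>Save</b>.';
$ui     = tagsToPlaceholders($layer0);
echo $ui, PHP_EOL; // no angle brackets left; only the base64 payloads remember the tags

// The user edits the text around the placeholders (here: translates the first word).
$edited = preg_replace('/^Press/', 'Premi', $ui);
echo placeholdersToTags($edited), PHP_EOL; // Premi <ph id="1" equiv-text="Ctrl"/> then <b>Save</b>.
```

A regex such as `<[^>]+>` is only adequate for well-formed inline tags; it is used here to keep the sketch short.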
## Installation
Install via Composer:
```shell
composer require matecat/subfiltering
```
Requirements:
- PHP 7.4+
- PHPUnit 9.x for running tests (dev)
## Filters
Two concrete filters are provided (both extend `AbstractFilter`):
- `Matecat\SubFiltering\MateCatFilter`
- `Matecat\SubFiltering\MyMemoryFilter`
Create instances using the static `getInstance` factory:
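```php
<?php
use Matecat\SubFiltering\MateCatFilter;
use Matecat\SubFiltering\MyMemoryFilter;

// $featureSet is your FeatureSetInterface implementation (see the FeatureSet section)
$matecatFilter  = MateCatFilter::getInstance($featureSet, 'en-US', 'it-IT');
$myMemoryFilter = MyMemoryFilter::getInstance($featureSet, 'en-US', 'it-IT');
```
## XLIFF 2.x dataRef replacement
XLIFF 2.x inline codes can point to original data stored in `<originalData>` via:
- `<ph>`, `<sc>`, `<ec>` using `dataRef`
- `<pc>` using `dataRefStart` and `dataRefEnd`
This library can automatically introduce an `equiv-text` attribute (base64-encoded original value) based on a provided dataRef map, and convert `<pc>` pairs to Matecat-compatible `<ph>` placeholders for UI consumption. On the way back, it restores the original XLIFF structure.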
- Full documentation and examples: docs/dataRef.md
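To make the `equiv-text` step concrete, here is a minimal sketch, assuming a regex over self-closing `<ph>`, `<sc>` and `<ec>` tags and a `base64:` prefix on the attribute value. The function name `addEquivText()` is hypothetical and the library's real output may differ; the null/empty fallback mirrors the rule stated in the notes of this section.

```php
<?php
// Sketch: add an equiv-text attribute to XLIFF 2 <ph>/<sc>/<ec> tags from a dataRef map.
// ASSUMPTIONS: the regex approach, the "base64:" prefix and the function name are illustrative.

function addEquivText(string $segment, array $dataRefMap): string
{
    return preg_replace_callback(
        '/<(ph|sc|ec)\b([^>]*?)\bdataRef="([^"]*)"([^>]*?)(\/?)>/',
        function (array $m) use ($dataRefMap): string {
            [$all, $name, $before, $ref, $after, $slash] = $m;
            // Leave tags that are already annotated or whose id is not in the map untouched.
            if (strpos($all, 'equiv-text=') !== false || !array_key_exists($ref, $dataRefMap)) {
                return $all;
            }
            $value = $dataRefMap[$ref];
            if ($value === null || $value === '') {
                $value = 'NULL'; // null/empty values become the literal string NULL
            }
            $equiv = 'base64:' . base64_encode($value);
            return "<{$name}{$before}dataRef=\"{$ref}\"{$after} equiv-text=\"{$equiv}\"{$slash}>";
        },
        $segment
    );
}

$map = ['source1' => '${AMOUNT}', 'source2' => ''];
$in  = 'Pay <ph id="1" dataRef="source1"/> to <ph id="2" dataRef="source2"/>.';
$out = addEquivText($in, $map);
echo $out, PHP_EOL;
// Pay <ph id="1" dataRef="source1" equiv-text="base64:JHtBTU9VTlR9"/> to <ph id="2" dataRef="source2" equiv-text="base64:TlVMTA=="/>.
```

Running the function a second time leaves the output unchanged, because already-annotated tags are skipped.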
How to provide the map:
- Build an associative array whose keys are the `id` attributes of the `<data>` elements in `<originalData>` and whose values are the original data values.
- Pass that array as the fourth parameter when instantiating the filter.
Example:
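```php
<?php
use Matecat\SubFiltering\MateCatFilter;

$dataRefMap = [
    'source1' => '${AMOUNT}',
    'source2' => '${RIDER}',
];
// FeatureSet: your FeatureSetInterface implementation (see the FeatureSet section)
$filter = MateCatFilter::getInstance(new FeatureSet(), 'en-US', 'it-IT', $dataRefMap);
// When converting to Layer 2 (UI), the filter will:
// - add equiv-text to <ph>/<sc>/<ec> using the map
// - convert <pc> ranges to UI placeholders with originalData captured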
// When converting back to Layer 1/0, it restores the original XLIFF tags.
```
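The map above is written by hand. When the source document already carries an `<originalData>` block, a hypothetical helper like `buildDataRefMap()` could derive it with PHP's SimpleXML extension. The sketch assumes a namespace-free snippet; a full XLIFF 2 document declares the XLIFF namespace, in which case the lookup may need `children()` with that namespace URI.

```php
<?php
// Sketch: build a dataRef map (data id => original value) from an <originalData> block.
// buildDataRefMap() is a hypothetical helper, not part of Subfiltering.

function buildDataRefMap(string $originalDataXml): array
{
    $xml = new SimpleXMLElement($originalDataXml);
    $map = [];
    foreach ($xml->data as $data) {
        // key: the id attribute of <data>; value: its text content
        $map[(string) $data['id']] = (string) $data;
    }
    return $map;
}

// Nowdoc (<<<'XML') keeps ${AMOUNT} literal instead of interpolating it.
$originalData = <<<'XML'
<originalData>
  <data id="source1">${AMOUNT}</data>
  <data id="source2">${RIDER}</data>
</originalData>
XML;

$dataRefMap = buildDataRefMap($originalData);
var_export($dataRefMap);
```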
Note:
- If a dataRef key exists but its value is null or empty, it is treated as the literal string `NULL`.
- If the dataRef map is empty, the component still preserves inline codes by encoding original tags as Matecat placeholders to keep them safe in the UI.
See [docs/dataRef.md](https://github.com/matecat/subfiltering/blob/master/docs/dataRef.md) for concrete before/after string examples and behavior details.
## Basic usage
Once you have a filter instance, use the methods below to convert between layers.
`MateCatFilter` methods:
- `fromLayer0ToLayer2`
- `fromLayer1ToLayer2`
- `fromLayer2ToLayer1`
- `fromLayer2ToLayer0`
- `fromLayer0ToLayer1`
- `fromLayer1ToLayer0`
- `fromRawXliffToLayer0`
- `fromLayer0ToRawXliff`
`MyMemoryFilter` methods:
- `fromLayer0ToLayer1`
- `fromLayer1ToLayer0`
Where:
- Layer 0 = Database
- Layer 1 = External services (MT/TM)
- Layer 2 = Matecat UI
### Example: DB to UI and back (with dataRef map)
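```php
<?php
use Matecat\SubFiltering\MateCatFilter;

$featureSet = new FeatureSet(); // your FeatureSetInterface implementation
$dataRefMap = [
    'd1' => '_',
    'd2' => '**',
];
$filter = MateCatFilter::getInstance($featureSet, 'en-US', 'it-IT', $dataRefMap);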
// Example Layer 0 content holding XLIFF inline codes
$layer0 = "Hi %s .";
// 1) Layer 0 -> Layer 2 (UI)
$ui = $filter->fromLayer0ToLayer2($layer0);
// 'Hi .'
// 2) User edits happen in UI ...
// 3) Layer 2 -> Layer 0 (restore original XLIFF structure)
$backToDb = $filter->fromLayer2ToLayer0($ui);
```
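The `<pc>` handling in the example above can be modelled as follows: each non-nested `<pc>…</pc>` range becomes two independent self-closing placeholders, so the UI can move the boundaries while the enclosed text stays editable. This is a sketch under stated assumptions: the `x-original` attribute, the `base64:` equiv-text prefix and the helper names (`placeholder()`, `pcToPh()`, `phToPc()`) are hypothetical, and nested `<pc>` elements are not handled.

```php
<?php
// Sketch: split each non-nested <pc>...</pc> range into two self-closing placeholders and back.
// The x-original attribute, the "base64:" prefix and all function names are hypothetical.

/** One self-closing placeholder remembering $originalTag and displaying $display. */
function placeholder(string $originalTag, string $display): string
{
    return '<ph x-original="' . base64_encode($originalTag)
        . '" equiv-text="base64:' . base64_encode($display) . '"/>';
}

function pcToPh(string $segment, array $dataRefMap): string
{
    return preg_replace_callback(
        '/(<pc\b[^>]*>)(.*?)(<\/pc>)/s',
        function (array $m) use ($dataRefMap): string {
            [, $open, $inner, $close] = $m;
            // Both data references live on the opening tag.
            $start = preg_match('/\bdataRefStart="([^"]*)"/', $open, $s) ? $s[1] : '';
            $end   = preg_match('/\bdataRefEnd="([^"]*)"/', $open, $e) ? $e[1] : '';
            // Unknown ids fall back to the string 'NULL' in this sketch.
            return placeholder($open, $dataRefMap[$start] ?? 'NULL')
                . $inner
                . placeholder($close, $dataRefMap[$end] ?? 'NULL');
        },
        $segment
    );
}

/** Decode every placeholder's x-original payload back into the exact original tag. */
function phToPc(string $segment): string
{
    return preg_replace_callback(
        '/<ph x-original="([A-Za-z0-9+\/=]*)"[^>]*\/>/',
        fn(array $m): string => base64_decode($m[1]),
        $segment
    );
}

$map    = ['d1' => '_', 'd2' => '**'];
$layer0 = 'Hi <pc id="1" dataRefStart="d1" dataRefEnd="d2">there</pc>!';
$ui     = pcToPh($layer0, $map);
echo $ui, PHP_EOL;
echo phToPc($ui), PHP_EOL; // Hi <pc id="1" dataRefStart="d1" dataRefEnd="d2">there</pc>!
```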
### Example: External service roundtrip
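```php
<?php
use Matecat\SubFiltering\MyMemoryFilter;

$filter = MyMemoryFilter::getInstance(new FeatureSet(), 'en-US', 'it-IT');
$layer0 = 'Text with … and placeholders.';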
// Prepare for MT/TM
$layer1 = $filter->fromLayer0ToLayer1($layer0);
// 'Text with and placeholders.'
// ... send $layer1 to MT/TM and get $translatedLayer1 back ...
// Restore for DB
$layer0Restored = $filter->fromLayer1ToLayer0($layer1);
```
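Layer 1 exists because MT/TM engines are not XML-aware. A common way to protect markup for such engines is to swap each tag for a short token and keep a side table for the return trip; the sketch below shows that general technique in plain PHP. It is an illustration, not Subfiltering's Layer 1 format: the `{n}` tokens and the `maskTags()` / `unmaskTags()` names are assumptions.

```php
<?php
// Sketch: mask inline tags for a non-XML-aware MT engine, then restore them.
// The {n} token format and the function names are assumptions for illustration.

/**
 * Replace each tag with a numbered token.
 * @return array{0: string, 1: array<string, string>} [masked text, token => original tag]
 */
function maskTags(string $text): array
{
    $table  = [];
    $masked = preg_replace_callback(
        '/<[^>]+>/',
        function (array $m) use (&$table): string {
            $token = '{' . (count($table) + 1) . '}';
            $table[$token] = $m[0];
            return $token;
        },
        $text
    );
    return [$masked, $table];
}

/** Put the original tags back; strtr() replaces every token in a single pass. */
function unmaskTags(string $text, array $table): string
{
    return strtr($text, $table);
}

[$masked, $table] = maskTags('Click <b>Save</b> to continue.');
echo $masked, PHP_EOL; // Click {1}Save{2} to continue.

// Pretend the MT engine returned this Italian translation with the tokens intact.
$translated = 'Fai clic su {1}Salva{2} per continuare.';
echo unmaskTags($translated, $table), PHP_EOL; // Fai clic su <b>Salva</b> per continuare.
```

`strtr()` with an array tries the longest keys first, so `{10}` is never mistaken for `{1}`. A real implementation would also need to avoid tokens that already occur in the source text.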
### Injecting custom handlers into the pipeline
This example shows how to inject only a subset of the supported injectable handlers into the transformation pipeline, so that they run alongside the built-in handlers.
Key points:
- Handlers are classes that extend the base handler and implement a `transform` method.
- You do not construct handlers manually; the pipeline instantiates them and injects the `Pipeline` instance via `setPipeline`.
- You inject handlers by passing an array of class names to the filter factory method. Unknown classes are ignored, and the sorter normalizes the final execution order.
Example:
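```php
<?php
use Matecat\SubFiltering\MateCatFilter;

// $featureSet: your FeatureSetInterface implementation
// $handlers: array of injectable handler class names to enable (unknown classes are ignored)
$filter = MateCatFilter::getInstance($featureSet, 'en-US', 'it-IT', [], $handlers);
// $input: a Layer 0 string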
$l1 = $filter->fromLayer0ToLayer1($input);
$l2 = $filter->fromLayer0ToLayer2($input);
```
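The key points above (handlers extending a base class, a `transform` method, instantiation and `setPipeline` injection by the pipeline, silently ignored unknown classes, and a sorter fixing the order) can be modelled with a toy pipeline. Every class name here (`AbstractHandler`, `Pipeline`, `WordToSymbol`, `EscapeAmpersands`) and the `PRIORITY` constant are hypothetical; the sketch shows the shape of the design, not Subfiltering's real classes.

```php
<?php
// Toy model of an injectable-handler pipeline. All class names and the PRIORITY
// constant are hypothetical; they are NOT Subfiltering's real API.

abstract class AbstractHandler
{
    public const PRIORITY = 0;

    protected ?Pipeline $pipeline = null;

    // Called by the pipeline, never by user code.
    public function setPipeline(Pipeline $pipeline): void
    {
        $this->pipeline = $pipeline;
    }

    abstract public function transform(string $segment): string;
}

final class Pipeline
{
    /** @var AbstractHandler[] */
    private array $handlers = [];

    /** @param string[] $classNames handler classes; the pipeline instantiates them itself */
    public function __construct(array $classNames)
    {
        foreach ($classNames as $class) {
            // Unknown classes and non-handlers are silently ignored.
            if (!class_exists($class) || !is_subclass_of($class, AbstractHandler::class)) {
                continue;
            }
            $handler = new $class();
            $handler->setPipeline($this);
            $this->handlers[] = $handler;
        }
        // The "sorter": execution order depends on PRIORITY, not on the caller's list order.
        usort(
            $this->handlers,
            fn(AbstractHandler $a, AbstractHandler $b): int => $a::PRIORITY <=> $b::PRIORITY
        );
    }

    public function run(string $segment): string
    {
        foreach ($this->handlers as $handler) {
            $segment = $handler->transform($segment);
        }
        return $segment;
    }
}

final class WordToSymbol extends AbstractHandler
{
    public const PRIORITY = 10; // must run before escaping

    public function transform(string $segment): string
    {
        return str_replace(' and ', ' & ', $segment);
    }
}

final class EscapeAmpersands extends AbstractHandler
{
    public const PRIORITY = 20;

    public function transform(string $segment): string
    {
        return str_replace('&', '&amp;', $segment);
    }
}

// The caller's order is reversed and includes an unknown class; neither matters.
$pipeline = new Pipeline([EscapeAmpersands::class, 'NoSuchHandler', WordToSymbol::class]);
echo $pipeline->run('Tom and Jerry'), PHP_EOL; // Tom &amp; Jerry
```

Because the pipeline sorts by priority, the result is the same whichever order the caller lists the classes in, which is what lets callers inject a subset of handlers without worrying about sequencing.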
### Disable all injectable handlers by passing null
Example:
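```php
<?php
use Matecat\SubFiltering\MateCatFilter;

// Passing null instead of a list of handler classes disables all injectable handlers
$filterNoInjectables = MateCatFilter::getInstance($featureSet, 'en-US', 'it-IT', [], null);
$l1_no = $filterNoInjectables->fromLayer0ToLayer1($input);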
// 'This is <b>bold</b> text.'
$l2_no = $filterNoInjectables->fromLayer0ToLayer2($input);
```
## FeatureSet
You must provide a `FeatureSetInterface` implementation to adjust the pipeline per transformation. A simple, working example lives under the tests/ folder. In your application, implement only the features you need and register them via your FeatureSet.
## Running tests
```shell
composer install
./vendor/bin/phpunit
```
## Support
Please open issues and feature requests on GitHub:
https://github.com/matecat/subfiltering/issues
## Authors
- **Domenico Lupinetti** - https://github.com/ostico
- **Mauro Cassani** - https://github.com/mauretto78
## License
This project is licensed under the GNU Lesser General Public License v3.0 (LGPL-3.0); see the [LICENSE](LICENSE.md) file for details.