https://github.com/writecrow/tag_converter
Convert files tagged with corpus metadata to JSON, PHP, or XML.
- Host: GitHub
- URL: https://github.com/writecrow/tag_converter
- Owner: writecrow
- License: mit
- Created: 2017-05-13T15:44:34.000Z (about 9 years ago)
- Default Branch: main
- Last Pushed: 2021-08-01T20:33:25.000Z (almost 5 years ago)
- Last Synced: 2025-08-23T19:25:45.605Z (9 months ago)
- Topics: corpus, php-library
- Language: PHP
- Homepage: https://tag-converter.markfullmer.com
- Size: 494 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Corpus-tagged Text Converter
[CircleCI build status](https://circleci.com/gh/writecrow/tag_converter)
A PHP library for converting files tagged with corpus metadata to JSON, PHP,
or XML.

## History
Corpus linguistics researchers use a markup-like syntax to provide metadata
about texts. For consumption by applications, this syntax needs to be converted
into a more universal, machine-readable format. The format chosen was JSON.
## Basic Usage
The included `/demo/index.php` file provides a demonstration conversion form.
Make the `TagConverter` class available to your code by your preferred method
(e.g., `use` or `require`), then pass a string of tagged text to one of its
static methods:
```php
$tagged = '<MyTag: 123>My tagged text here';

$json = TagConverter::json($tagged);
echo $json;
// Returns {"MyTag":"123","text":"My tagged text here"}

$php = TagConverter::php($tagged);
echo $php;
// Returns array('MyTag' => '123', 'text' => 'My tagged text here')

$xml = TagConverter::xml($tagged);
echo $xml;
// Returns an XML string with elements holding "123" (MyTag) and "My tagged text here" (text)
```
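Since the JSON output is a plain string, an application would typically decode it before use. The sketch below uses PHP's built-in `json_decode()` and hard-codes the JSON output shown above so that it runs without the library installed; the variable names are illustrative only.

```php
<?php

// JSON string as shown in the json() example above, hard-coded so this
// sketch runs on its own.
$json = '{"MyTag":"123","text":"My tagged text here"}';

// Decode into an associative array. JSON_THROW_ON_ERROR (PHP 7.3+) makes
// malformed input raise a JsonException instead of returning null.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

echo $data['MyTag'], PHP_EOL; // 123
echo $data['text'], PHP_EOL;  // My tagged text here
```

Note that tag values come back as strings (`"123"`, not `123`), so numeric metadata needs an explicit cast if a consumer relies on it.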
## Expected input format
The corpus-style tagging syntax expected by the library is defined as follows:
1. Tags must be wrapped in `<` and `>`.
2. Tag names and tag values may contain only alphanumeric characters, spaces,
underscores, and hyphens; tag values may additionally use `|` or `;` to
separate multiple values.
3. Tag names must be separated from tag values by a `:`.
4. Spaces at the beginning or end of tag names and tag values are ignored;
spaces within tag values are preserved.
5. Everything not wrapped in `<` and `>` is treated as "text".
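The rules above can be sketched as a small regex-based parser. This is a minimal illustration under stated assumptions, not the library's implementation: `parseTaggedText()` is a hypothetical name, matching is ASCII-only, values may contain the `|` and `;` separators that the examples table in this section permits, untagged content is collected under a `text` key (as in the usage example), and a repeated tag name keeps its last value.

```php
<?php

/**
 * Hypothetical sketch of the tagging rules; not the library's own code.
 * Maps each tag name to its value and collects untagged content under "text".
 */
function parseTaggedText(string $input): array
{
    // Rules 1-3: a tag is <name:value>. Names use letters, digits, spaces,
    // underscores, and hyphens (ASCII only in this sketch); values may also
    // contain "|" or ";" as multi-value separators.
    $tagPattern = '/<([A-Za-z0-9 _-]+):([A-Za-z0-9 _|;-]+)>/';

    $result = [];
    preg_match_all($tagPattern, $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // Rule 4: trim padding around names and values; inner spaces stay.
        // Assumption: a repeated tag name overwrites the earlier value.
        $result[trim($match[1])] = trim($match[2]);
    }

    // Rule 5: whatever remains after removing valid tags is the text.
    $result['text'] = trim((string) preg_replace($tagPattern, '', $input));

    return $result;
}

// Returns ['MyTag' => '123', 'text' => 'My tagged text here']
print_r(parseTaggedText('<MyTag: 123>My tagged text here'));
```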

| Status | Tag Example | Explanation |
| --- | --- | --- |
| Good | ```<MyTag:SomeText>``` | Alphanumeric tag names & values OK |
| Good | ```<My Tag:Some Text>``` | Spaces in tag names & values OK |
| Good | ```< My Tag : Some Text >``` | Spaces padding tag names & values OK|
| Good | ```< My-Tag : Some_Text >``` | Underscores & hyphens OK|
| Good | ```< My-Tag : Value 1 \| Value 2 >``` | Pipe separators for multiple values|
| Good | ```< My-Tag : Value 1 ; Value 2 >``` | Semicolon separators for multiple values|
| Bad | ```< My/Tag : Some:Text >``` | Other characters not OK|
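The Good and Bad rows in the table can be checked, and multi-value tags split, with two small helpers. Both are hypothetical (`isValidTag()` and `splitTagValues()` are not part of the library), and since this README does not say how the library represents multiple values in its output, the splitting behavior here (trim each part, drop empty parts) is an assumption.

```php
<?php

/**
 * Hypothetical helpers mirroring the examples table; not the library's API.
 */
function isValidTag(string $tag): bool
{
    // Name: letters, digits, spaces, underscores, hyphens.
    // Value: the same characters plus "|" or ";" as multi-value separators.
    return preg_match('/^<[A-Za-z0-9 _-]+:[A-Za-z0-9 _|;-]+>$/', $tag) === 1;
}

function splitTagValues(string $value): array
{
    // Split on pipes or semicolons, trim each part, and drop empty parts.
    $parts = array_map('trim', preg_split('/[|;]/', $value));
    return array_values(array_filter($parts, fn (string $part): bool => $part !== ''));
}

var_dump(isValidTag('< My-Tag : Value 1 | Value 2 >')); // bool(true)
var_dump(isValidTag('< My/Tag : Some:Text >'));         // bool(false)
print_r(splitTagValues(' Value 1 ; Value 2 '));          // array of "Value 1", "Value 2"
```

Keeping validation separate from splitting lets a caller reject a malformed tag before deciding how to store its values.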
## Testing
Unit tests can be run (after `composer install`) by executing `vendor/bin/phpunit`.