https://github.com/clarin-eric/vlo-mapping-creator
Tool to create a VLO mapping file based on a CSV and optionally a CLAVAS vocabulary
- Host: GitHub
- URL: https://github.com/clarin-eric/vlo-mapping-creator
- Owner: clarin-eric
- License: gpl-3.0
- Created: 2018-01-30T05:28:35.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-04-28T07:18:18.000Z (about 4 years ago)
- Last Synced: 2025-09-10T03:14:31.296Z (9 months ago)
- Language: XSLT
- Size: 39.1 KB
- Stars: 0
- Watchers: 6
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# VLO-mapping-creator
Tool to create a VLO mapping file based on a CSV and optionally a CLAVAS vocabulary.
## CSV
See [CSV file](src/test/resources/resourceclass-full.csv)
| | A | B | C | D |
| - | ---------------------------------------- | ------------------ | ----- |------------------|
| 1 | resourceclass | resourceclass | genre | TF-Notes |
| 2 | AnnotatedTextCorpus | annotatedText;text | some | |
| 3 | SongsAnthologiesLinguistic corporaCorpus | audioRecording | other | |
| 4 | ~Speech.* | audioRecording | foo | |
| 5 | Spoken Corpus | audioRecording | bar | |
| 6 | OralCorpus | corpus | | |
| 7 | OralCorpus | audioRecording | | |
| 8 | AnthologiesDevotional, "literature" | ! | | skip |
| 9 | foo | | | to be discussed |
- Row 1: column headers referring to facets
- Column A: source
- Column B and higher: targets, each facet should appear only once, unless a column header starts with `TF-` (case insensitive)
- Column B and higher: a column whose header starts with `TF-` (case insensitive) is reserved for the task force (e.g., for notes) and is not interpreted by the VLO Mapping Creator (see column D)
- Source values (row 2 and higher, column A): if starting with a tilde (`~`) the value is assumed to be a regular expression (see line 4)
- Target values (row 2 and higher, column B and higher): multiple values for one target facet are to be separated by semicolon (`;`) (see line 2)
- Target values (row 2 and higher, column B and higher): if the value is an exclamation mark (`!`) the source value is deleted and not replaced (see line 8)
- Make sure all rows have an equal number of columns!
- Rows with the same source value are grouped in the mapping XML (see lines 6 and 7)
- If no target values/actions are known the row will be skipped (see line 9)
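The rules above can be sketched in Python. This is only an illustration, not the tool's implementation (which is XSLT based): the function name `interpret` and the shape of the returned dictionary are hypothetical, and it assumes that `!` marks the whole source value for deletion. The rows are the ones from the table above.

```python
# Rows of the example table, already split into cells (row 1 is the header).
ROWS = [
    ["resourceclass", "resourceclass", "genre", "TF-Notes"],
    ["AnnotatedTextCorpus", "annotatedText;text", "some", ""],
    ["SongsAnthologiesLinguistic corporaCorpus", "audioRecording", "other", ""],
    ["~Speech.*", "audioRecording", "foo", ""],
    ["Spoken Corpus", "audioRecording", "bar", ""],
    ["OralCorpus", "corpus", "", ""],
    ["OralCorpus", "audioRecording", "", ""],
    ['AnthologiesDevotional, "literature"', "!", "", "skip"],
    ["foo", "", "", "to be discussed"],
]


def interpret(rows):
    """Hypothetical helper: apply the CSV rules and group rows by source value."""
    header, *body = rows
    # Column A is the source; TF- columns (case insensitive) are ignored.
    targets = [
        (i, name)
        for i, name in enumerate(header)
        if i > 0 and not name.lower().startswith("tf-")
    ]
    mappings = {}
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"row {line_no} has {len(row)} columns, expected {len(header)}"
            )
        source = row[0]
        cells = [(facet, row[i]) for i, facet in targets if row[i]]
        if not cells:
            continue  # no target values/actions known: skip the row
        entry = mappings.setdefault(
            source,
            {"regex": source.startswith("~"), "delete": False, "targets": {}},
        )
        for facet, cell in cells:
            if cell == "!":
                entry["delete"] = True  # assumption: delete, do not replace
            else:
                # multiple values for one facet are separated by ';'
                entry["targets"].setdefault(facet, []).extend(cell.split(";"))
    return mappings


if __name__ == "__main__":
    for source, entry in interpret(ROWS).items():
        print(source, entry)
```

Rows 6 and 7 end up in a single entry for `OralCorpus`, and row 9 is dropped because its only non-empty cell is in a `TF-` column.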
```
"resourceclass","resourceclass","genre","TF-notes"
"AnnotatedTextCorpus","annotatedText;text","some",
"SongsAnthologiesLinguistic corporaCorpus","audioRecording","other",
"~Speech.*","audioRecording","foo",
"Spoken Corpus","audioRecording","bar",
"OralCorpus","corpus",,
OralCorpus,"audioRecording",,
"AnthologiesDevotional, ""literature""",!,,skip
foo,,,to be discussed
```
- A double quote (`"`) in a value can be escaped by doubling it (`foo""bar`) (see line 8)
- Double quotes (`"`) are only mandatory if the value contains a comma (`,`) (see lines 7 and 8)
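To check a CSV file for quoting problems before handing it to the tool, the quoting rules can be reproduced with Python's standard `csv` module, whose default dialect uses the same conventions (comma delimiter, `"` as quote character, doubled quotes as escape). This is a minimal sketch; the helper name `parse` is hypothetical and the tool's own CSV parser may differ in edge cases.

```python
import csv
import io

# A few lines from the example CSV above, including the quoted value
# that contains both a comma and doubled double quotes.
CSV_TEXT = '''"resourceclass","resourceclass","genre","TF-notes"
"OralCorpus","corpus",,
OralCorpus,"audioRecording",,
"AnthologiesDevotional, ""literature""",!,,skip
foo,,,to be discussed
'''


def parse(text):
    """Hypothetical helper: parse CSV text and check the column count."""
    rows = list(csv.reader(io.StringIO(text)))
    width = len(rows[0])
    for line_no, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(
                f"line {line_no} has {len(row)} columns, expected {width}"
            )
    return rows


if __name__ == "__main__":
    for row in parse(CSV_TEXT):
        print(row)
```

The quoted and unquoted `OralCorpus` cells parse to the same value, and the doubled quotes around `literature` come back as single double quotes.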
## SKOS
The VLO Mapping Creator can merge a SKOS file with a CSV file. The process adds mappings from `altLabel`s and `hiddenLabel`s to the `prefLabel`.
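The idea of mapping alternative and hidden labels to the preferred label can be sketched as follows. The SKOS snippet, the concept URI and the function name `label_mappings` are made up for illustration, and the sketch assumes an RDF/XML serialisation with one `prefLabel` per concept; the tool itself does this step with XSLT.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Hypothetical SKOS vocabulary with a single concept.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/vocab/tool">
    <skos:prefLabel xml:lang="en">tool</skos:prefLabel>
    <skos:altLabel xml:lang="en">software tool</skos:altLabel>
    <skos:hiddenLabel xml:lang="en">tol</skos:hiddenLabel>
  </skos:Concept>
</rdf:RDF>"""


def label_mappings(skos_xml):
    """Return {altLabel or hiddenLabel: prefLabel} for every concept."""
    root = ET.fromstring(skos_xml)
    mappings = {}
    for concept in root.iterfind(".//skos:Concept", NS):
        pref = concept.find("skos:prefLabel", NS)
        if pref is None or not pref.text:
            continue  # nothing to map to
        for tag in ("skos:altLabel", "skos:hiddenLabel"):
            for label in concept.iterfind(tag, NS):
                if label.text:
                    mappings[label.text] = pref.text
    return mappings


if __name__ == "__main__":
    print(label_mappings(SKOS_XML))
```

Each resulting pair corresponds to one extra source value (the alternative or hidden label) mapped to the preferred label as target value.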
### Caveats
- This only works for simple cases at the moment, i.e., the curation of a single facet where the CSV file has as source (column A) the same facet as the target (column B).
- Regular expressions are not yet supported.
## Template
In a mapping file the behaviour of integrating the target value into the target facet can be tweaked, e.g., to overwrite all existing values. The default behaviour for a facet can be taken from a template file.
```XML
```
## Command Line
```sh
$ java -jar vlo-mapping-creator.jar -?
INF: java -jar vlo-mapping-creator.jar * , where is one of those:
INF: -s= SKOS file to merge with the CSV
INF: -t= Template file to merge with the Mapping XML
INF: -d Enable debug info
```
### Examples
```sh
$ java -jar vlo-mapping-creator.jar -s src/test/resources/resourceclass.skos -t src/test/resources/default.xml src/test/resources/resourceclass.csv
```
```XML
annotatedText
text
AnnotatedTextCorpus
audioRecording
SongsAnthologiesLinguistic corporaCorpus
audioRecording
SpeechCorpus
audioRecording
Spoken Corpus
corpus
audioRecording
OralCorpus
plainText
AnthologiesDevotional literature
tool
tol
```
```sh
$ java -jar target/vlo-mapping-creator.jar -t src/test/resources/default.xml src/test/resources/resourceclass-full.csv
```
```XML
annotatedText
text
some
AnnotatedTextCorpus
audioRecording
other
SongsAnthologiesLinguistic corporaCorpus
audioRecording
foo
Speech*
audioRecording
bar
Spoken Corpus
corpus
audioRecording
OralCorpus
AnthologiesDevotional, "literature"
```
## TODO
- [X] XSL log messages are not handled correctly yet
- [ ] add tests
- [ ] it needs to be possible to provide a vocabulary-specific XSLT to process the SKOS mapping in more advanced cases