https://github.com/clarin-eric/vlo-mapping-creator

# VLO-mapping-creator
Tool to create a VLO mapping file based on a CSV and optionally a CLAVAS vocabulary.

## CSV

See [CSV file](src/test/resources/resourceclass-full.csv)

| | A | B | C | D |
| - | ---------------------------------------- | ------------------ | ----- |------------------|
| 1 | resourceclass | resourceclass | genre | TF-Notes |
| 2 | AnnotatedTextCorpus | annotatedText;text | some | |
| 3 | SongsAnthologiesLinguistic corporaCorpus | audioRecording | other | |
| 4 | ~Speech.* | audioRecording | foo | |
| 5 | Spoken Corpus | audioRecording | bar | |
| 6 | OralCorpus | corpus | | |
| 7 | OralCorpus | audioRecording | | |
| 8 | AnthologiesDevotional, "literature" | ! | | skip |
| 9 | foo | | | to be discussed |

- Row 1: column headers referring to facets
- Column A: source
- Column B and higher: targets. Each facet should appear only once, unless the column header starts with `TF-` (case insensitive)
- Column B and higher: a column whose header starts with `TF-` (case insensitive) is reserved for the task force (for notes or anything else) and is not interpreted by the VLO Mapping Creator (see column D)
- Source values (row 2 and higher, column A): a value starting with a tilde (`~`) is treated as a regular expression (see row 4)
- Target values (row 2 and higher, column B and higher): multiple values for one target facet are separated by a semicolon (`;`) (see row 2)
- Target values (row 2 and higher, column B and higher): if the value is an exclamation mark (`!`), the source value is deleted and not replaced (see row 8)
- Make sure all rows have an equal number of columns!
- Rows with the same source value are grouped into the mapping XML (see rows 6 and 7)
- If no target values or actions are known, the row is skipped (see row 9)
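The rules above can be sketched in a few lines of Python. This is only an illustration of the CSV conventions, not the tool's actual implementation: the function names `read_mapping` and `matches`, the returned data structure, and the choice to strip the leading `~` and use full-match semantics for regular expressions are all assumptions made for this example.

```python
import csv
import io
import re

# Shortened copy of the example table (TF- column included).
SAMPLE = '''"resourceclass","resourceclass","genre","TF-notes"
"AnnotatedTextCorpus","annotatedText;text","some",
"~Speech.*","audioRecording","foo",
"OralCorpus","corpus",,
OralCorpus,"audioRecording",,
"AnthologiesDevotional, ""literature""",!,,skip
foo,,,to be discussed
'''

def read_mapping(csv_text):
    """Group targets per source value (hypothetical helper, not part of the tool)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Columns whose header starts with "TF-" (case insensitive) are notes only.
    target_cols = [i for i in range(1, len(header))
                   if not header[i].lower().startswith("tf-")]
    mapping = {}
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(row)}: {row}")
        source = row[0]
        is_regex = source.startswith("~")
        # Assumption: the tilde is a marker only and is not part of the pattern.
        key = (source[1:] if is_regex else source, is_regex)
        targets, delete = [], False
        for i in target_cols:
            cell = row[i].strip()
            if cell == "!":
                delete = True            # delete the source value, no replacement
            elif cell:
                targets.extend((header[i], value) for value in cell.split(";"))
        if not targets and not delete:
            continue                     # no targets or actions known: skip the row
        entry = mapping.setdefault(key, {"delete": False, "targets": []})
        entry["delete"] = entry["delete"] or delete
        entry["targets"].extend(targets)  # rows with the same source are grouped
    return mapping

def matches(key, value):
    """Check a facet value against a (pattern, is_regex) key from read_mapping."""
    pattern, is_regex = key
    return bool(re.fullmatch(pattern, value)) if is_regex else pattern == value

if __name__ == "__main__":
    for key, entry in read_mapping(SAMPLE).items():
        print(key, entry)
```

In this sketch the two `OralCorpus` rows end up in one entry with both targets, the `foo` row is dropped, and the `!` row is recorded as a deletion without targets.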

```
"resourceclass","resourceclass","genre","TF-notes"
"AnnotatedTextCorpus","annotatedText;text","some",
"SongsAnthologiesLinguistic corporaCorpus","audioRecording","other",
"~Speech.*","audioRecording","foo",
"Spoken Corpus","audioRecording","bar",
"OralCorpus","corpus",,
OralCorpus,"audioRecording",,
"AnthologiesDevotional, ""literature""",!,,skip
foo,,,to be discussed
```

- A double quote (`"`) inside a value can be escaped by doubling it (`foo""bar`) (see line 8)
- Double quotes (`"`) around a value are only mandatory if the value contains a comma (`,`) (see lines 7 and 8)
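The quoting rules match what common CSV libraries do by default. As a quick check, the following sketch uses Python's standard `csv` module (an assumption for illustration; the tool itself may use a different parser) to read line 8 of the example and write it back.

```python
import csv
import io

# Line 8 of the example CSV: a value with a comma and escaped double quotes.
line = '"AnthologiesDevotional, ""literature""",!,,skip\n'

row = next(csv.reader(io.StringIO(line)))
print(row)
# ['AnthologiesDevotional, "literature"', '!', '', 'skip']

# Writing the row back: by default only values that need quoting are quoted.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(row)
print(out.getvalue(), end="")
# "AnthologiesDevotional, ""literature""",!,,skip
```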

## SKOS

The VLO Mapping Creator can merge a SKOS file with a CSV file. The process adds mappings from each `altLabel` and `hiddenLabel` to the corresponding `prefLabel`.
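To illustrate the idea, here is a minimal Python sketch that collects such label mappings from a SKOS document in RDF/XML. The sample vocabulary, the concept URI, and the function name `label_mappings` are invented for this example, and the tool itself may process SKOS quite differently.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Invented sample vocabulary, for illustration only.
SAMPLE_SKOS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/tool">
    <skos:prefLabel>tool</skos:prefLabel>
    <skos:altLabel>tol</skos:altLabel>
    <skos:hiddenLabel>Tool</skos:hiddenLabel>
  </skos:Concept>
</rdf:RDF>"""

def label_mappings(skos_xml):
    """Return {altLabel or hiddenLabel: prefLabel} for every concept (hypothetical helper)."""
    root = ET.fromstring(skos_xml)
    mappings = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        pref = concept.findtext(f"{{{SKOS}}}prefLabel")
        if pref is None:
            continue  # nothing to map to
        for tag in ("altLabel", "hiddenLabel"):
            for label in concept.findall(f"{{{SKOS}}}{tag}"):
                if label.text and label.text != pref:
                    mappings[label.text] = pref
    return mappings

if __name__ == "__main__":
    print(label_mappings(SAMPLE_SKOS))
    # {'tol': 'tool', 'Tool': 'tool'}
```

Each resulting pair corresponds to one CSV-style row with the alternative label as source and the preferred label as target.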

### Caveats

- This only works for simple cases at the moment, i.e., the curation of a single facet where the CSV file has as source (column A) the same facet as the target (column B).
- Regular expressions are not yet supported either.

## Template

In a mapping file, the behaviour of integrating the target value into the target facet can be tweaked, e.g., to overwrite all existing values. The default behaviour for a facet can be taken from a template file.

```XML

```

## Command Line

```sh
$ java -jar vlo-mapping-creator.jar -?
INF: java -jar vlo-mapping-creator.jar * , where is one of those:
INF: -s= SKOS file to merge with the CSV
INF: -t= Template file to merge with the Mapping XML
INF: -d Enable debug info
```

### Examples

```sh
$ java -jar vlo-mapping-creator.jar -s src/test/resources/resourceclass.skos -t src/test/resources/default.xml src/test/resources/resourceclass.csv
```
```XML





annotatedText
text
AnnotatedTextCorpus


audioRecording
SongsAnthologiesLinguistic corporaCorpus


audioRecording
SpeechCorpus


audioRecording
Spoken Corpus


corpus
audioRecording
OralCorpus


plainText
AnthologiesDevotional literature


tool
tol


```

```sh
$ java -jar target/vlo-mapping-creator.jar -t src/test/resources/default.xml src/test/resources/resourceclass-full.csv
```
```XML





annotatedText
text
some
AnnotatedTextCorpus


audioRecording
other
SongsAnthologiesLinguistic corporaCorpus


audioRecording
foo
Speech*


audioRecording
bar
Spoken Corpus


corpus
audioRecording
OralCorpus



AnthologiesDevotional, "literature"


```

## TODO

- [X] XSL log messages are not handled correctly yet
- [ ] add tests
- [ ] it needs to be possible to provide a vocabulary-specific XSLT to process the SKOS mapping in more advanced cases