https://github.com/divefish/sorts
A Subject-Object Resolution Test Suite of German minimal sentence pairs for morpho-syntactic and semantic model introspection
- Host: GitHub
- URL: https://github.com/divefish/sorts
- Owner: DiveFish
- Created: 2020-06-26T09:20:12.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2025-10-04T12:46:38.000Z (9 months ago)
- Last Synced: 2025-10-04T14:36:44.325Z (9 months ago)
- Language: Python
- Size: 25.9 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# SORTS
A Subject-Object Resolution Test Suite of German and Dutch minimal sentence pairs for morpho-syntactic and semantic model introspection, described in detail in the [SORTS paper](https://www.aclweb.org/anthology/2020.coling-main.269.pdf) (Fischer et al., _When Beards Start Shaving Men: A Subject-object Resolution Test Suite for Morpho-syntactic and Semantic Model Introspection_, COLING 2020).
The SORTS test suite consists of monotransitive clauses (18,502 German; 14,670 Dutch), annotated with the following property classes:
|Property | Annotation |
|:------------- |:-------------|
|Base case|`acc`|
|**_1. Word order_**|
|Verb-first, subject-object|`LK[V]MF[SO]`|
|Verb-second, subject-object|`VF[S]LK[V]MF[O]`|
|Verb-second with initial adverb, subject-object|`VF[ADV]LK[V]MF[SO]`|
|Verb-last, subject-object|`MF[SO]VC[V]`|
|Verb-first, object-subject|`LK[V]MF[OS]`|
|Verb-second, object-subject|`VF[O]LK[V]MF[S]`|
|Verb-second with initial adverb, object-subject|`VF[ADV]LK[V]MF[OS]`|
|Verb-last, object-subject|`MF[OS]VC[V]`|
|**_2. Morphology/syntax_**|
|Dative object|`dat`|
|Subject-object case syncretism|`amb`|
|Pronoun subject|`spron`|
|Pronoun object|`opron`|
|Negated object|`oneg`|
|Auxiliary verb|`aux`|
|Prepositional phrase|`pp`|
|**_3. Semantics_**|
|Inanimate subject|`sinan`|
|Animate object|`oan`|
|Inverted animacy|`invan`|
|Regular polysemy|`regpol`|
|Proper name subject|`sname`|
|Semantic asymmetry|`semas`|
|Non-referential object|`noref`|
|Psych verb with experiencer object|`psy`|
|Light verb construction|`vlight`|
|Synonymous verb|`syn`|
|Proverb|`prov`|
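
The word order annotations in the table can be taken apart mechanically. The following is a minimal Python sketch and not part of the SORTS repository: the function names `parse_order`, `argument_order` and `verb_position` are hypothetical, and the field glosses in the comments follow the usual topological field model rather than anything stated in this README.

```python
import re

# Assumed readings of the field labels (not defined in this README):
# VF = prefield, LK = left bracket, MF = middle field, VC = verb complex.
FIELD_PATTERN = re.compile(r"([A-Z]+)\[([A-Z]+)\]")


def parse_order(order):
    """Split an order annotation into a list of (field, content) pairs."""
    fields = FIELD_PATTERN.findall(order)
    # Rebuild the string to make sure nothing was silently skipped.
    rebuilt = "".join(f"{name}[{content}]" for name, content in fields)
    if not fields or rebuilt != order:
        raise ValueError(f"unexpected order annotation: {order!r}")
    return fields


def argument_order(order):
    """Return 'SO' if the subject precedes the object, otherwise 'OS'."""
    roles = ""
    for _, content in parse_order(order):
        # Ignore fields that hold the verb or an adverb.
        if content in ("S", "O", "SO", "OS"):
            roles += content
    if roles not in ("SO", "OS"):
        raise ValueError(f"cannot determine argument order: {order!r}")
    return roles


def verb_position(order):
    """Classify the clause as verb-first, verb-second or verb-last."""
    for index, (name, content) in enumerate(parse_order(order)):
        if content != "V":
            continue
        if name == "VC":
            return "verb-last"
        if name == "LK":
            return "verb-first" if index == 0 else "verb-second"
    raise ValueError(f"no verb field found: {order!r}")
```

With these helpers, `verb_position("VF[O]LK[V]MF[S]")` and `argument_order("VF[O]LK[V]MF[S]")` recover the "Verb-second, object-subject" row of the table from the annotation string alone.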
## Sentence variations
To extract sentences with particular properties from the test suite, match on the full annotation, which combines the word order and the property labels, as in this example:
|Variations | Property | Full annotation |
|:------------- |:-------------|:-------------|
|0| Base sentence, e.g. SVO order |`order:VF[S]LK[V]MF[O]\|props:base-acc`|
|1| Base sentence, e.g. SVO order with auxiliary verb|`order:VF[S]LK[V]MF[O]\|props:base-aux`|
|2| Base sentence, e.g. SVO order with auxiliary verb and synonym of main verb |`order:VF[S]LK[V]MF[O]\|props:aux-syn`|
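
One way to filter on these annotations is sketched below in Python. It is an illustration under assumptions, not the repository's own tooling: the helper names `parse_annotation` and `select` are hypothetical, the sentences are placeholders, and the annotation is assumed to use a plain `|` separator (the backslash in the table only escapes the pipe for Markdown). How annotations are stored alongside the sentences in the data files is not described here, so the sketch works on in-memory `(sentence, annotation)` pairs.

```python
def parse_annotation(annotation):
    """Split 'order:...|props:a-b' into the order string and a set of properties."""
    parts = dict(part.split(":", 1) for part in annotation.split("|"))
    return parts["order"], set(parts["props"].split("-"))


def select(rows, order=None, props=()):
    """Return the sentences matching the given order and all given properties.

    `rows` is an iterable of (sentence, annotation) pairs and `props` is an
    iterable of property labels such as ["aux", "syn"].
    """
    wanted = set(props)
    selected = []
    for sentence, annotation in rows:
        row_order, row_props = parse_annotation(annotation)
        if order is not None and row_order != order:
            continue
        # Keep the row only if every requested property is annotated.
        if wanted <= row_props:
            selected.append(sentence)
    return selected


# Placeholder rows mirroring the three variations in the table above.
rows = [
    ("<variation 0>", "order:VF[S]LK[V]MF[O]|props:base-acc"),
    ("<variation 1>", "order:VF[S]LK[V]MF[O]|props:base-aux"),
    ("<variation 2>", "order:VF[S]LK[V]MF[O]|props:aux-syn"),
]

with_aux = select(rows, props=["aux"])
svo_with_aux_and_syn = select(rows, order="VF[S]LK[V]MF[O]", props=["aux", "syn"])
```

Treating the properties as a set makes the query independent of the order in which labels appear after `props:`.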
### Additional annotations
[Gold standard](https://github.com/DiveFish/SORTS/tree/master/gold)
- Subject head index and label
- Object head index and label
[Annotated](https://github.com/DiveFish/SORTS/tree/master/annotated) (automatically annotated using [sticker2](https://github.com/stickeritis/sticker2))
- Lemmas, part of speech and topological fields (manually corrected)
- Morphological information
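
The gold subject and object annotations (head index and label) lend themselves to a simple resolution score. The Python sketch below is only an illustration: the record types `Arc` and `Resolution`, the function `score` and the label strings `SUBJ` and `OBJ` are hypothetical and do not reflect the file format or label set used in the repository.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    head: int    # index of the head token
    label: str   # dependency label of the argument


class Resolution(NamedTuple):
    subject: Arc
    obj: Arc


def score(gold, predicted):
    """Return head-only and head-plus-label accuracy over all subject and object arcs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = 2 * len(gold)  # one subject and one object per sentence
    if total == 0:
        return {"head": 0.0, "labeled": 0.0}
    head_hits = labeled_hits = 0
    for g, p in zip(gold, predicted):
        for g_arc, p_arc in ((g.subject, p.subject), (g.obj, p.obj)):
            if g_arc.head == p_arc.head:
                head_hits += 1
                if g_arc.label == p_arc.label:
                    labeled_hits += 1
    return {"head": head_hits / total, "labeled": labeled_hits / total}


# A parser that attaches both arguments correctly but swaps their labels.
gold = [Resolution(Arc(2, "SUBJ"), Arc(2, "OBJ"))]
swapped = [Resolution(Arc(2, "OBJ"), Arc(2, "SUBJ"))]
result = score(gold, swapped)
```

Reporting the two numbers separately distinguishes attachment errors from cases where subject and object are attached correctly but confused with each other.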
### Test suite subsets
- `part-ambiguous_gold`: only the `amb` variant displays case syncretism between subject and object
- `ambiguous_gold`: all sentences display case syncretism between subject and object; this subset contains no `dat` or `amb` variants
### Utilities
Data processing utilities, e.g. to convert between different formats, are available in the repository of the PP test suite [PPATS](https://github.com/DiveFish/PPATS).
### Looking for a PP test suite?
Then check out the German [PPATS PP attachment test suite](https://github.com/DiveFish/PPATS) - more brain teasers for NLP systems! :tada: