https://github.com/cltk/middle_high_german_texts



# Referenzkorpus Mittelhochdeutsch

## Source and license
[Main page of the project](https://www.linguistics.rub.de/rem/access/index.html)

License:

> The Referenzkorpus Mittelhochdeutsch is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).

The corpus itself is not modified. The code in this repository only parses it.

## Corpus retrieval

1. Go to https://www.linguistics.rub.de/rem/access/index.html.
2. Click on "CORA-XML ALS .TAR.XZ" or "CORA-XML ALS .ZIP".
3. Click on "Herunterladen" ("Download").
4. Uncompress the downloaded file.
5. The result is a folder named **rem-corraled-20161222** (as of 2019-09-18) containing XML files, each of which is an annotated text.
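
Step 4 can also be done from Python instead of a desktop archive tool. The following is a minimal sketch using only the standard library; the function name `extract_corpus` and the paths in the usage note are hypothetical and not part of this repository.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_corpus(archive_path, dest_dir):
    """Extract a .zip or .tar.xz archive into dest_dir and return the XML files found."""
    archive_path = Path(archive_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest_dir)
    elif tarfile.is_tarfile(archive_path):
        # Mode "r:*" lets tarfile detect the compression (xz here) by itself.
        with tarfile.open(archive_path, mode="r:*") as archive:
            archive.extractall(dest_dir)
    else:
        raise ValueError(f"Unsupported archive format: {archive_path}")

    # Search recursively, because the archive unpacks into its own subfolder.
    return sorted(dest_dir.rglob("*.xml"))
```

Assuming the download was saved as `rem.zip` (a placeholder name), `extract_corpus("rem.zip", "corpus")` would return the paths of the extracted XML files.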

## Code
The code in this repository parses individual XML files.
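
To get a first impression of what one of these files contains, a schema-agnostic sketch like the one below can help. It is not the repository's parser: it makes no assumption about the CORA-XML element names and simply counts whatever elements it finds. The function names `count_elements` and `summarize_corpus` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_elements(xml_path):
    """Parse one XML file and count how often each element name occurs."""
    tree = ET.parse(xml_path)
    return Counter(element.tag for element in tree.iter())


def summarize_corpus(corpus_dir):
    """Map each XML file name in corpus_dir to its element counts."""
    return {
        path.name: count_elements(path)
        for path in sorted(Path(corpus_dir).glob("*.xml"))
    }
```

Calling `summarize_corpus("rem-corraled-20161222")` on the extracted folder would list, per file, which elements occur and how often, which is a reasonable starting point before writing a more specific parser.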