https://github.com/ololobus/slavic_text_scht
St. Petersburg corpus of hagiographic texts
corpora hagiographic-texts linguistics slavic-languages
- Host: GitHub
- URL: https://github.com/ololobus/slavic_text_scht
- Owner: ololobus
- Created: 2016-03-17T19:51:48.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-04-24T22:25:18.000Z (over 9 years ago)
- Last Synced: 2025-04-07T23:42:42.368Z (6 months ago)
- Topics: corpora, hagiographic-texts, linguistics, slavic-languages
- Language: Python
- Size: 2.07 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### St. Petersburg Corpus of Hagiographic Texts
Old Church Slavic corpus
http://project.phil.spbu.ru/scat/page.php?page=project
### Parser
Run the parser to get the entire text of an XML file:
```
./tei_parser.py xml/Aleksandr_svirskij.xml
```

TODO:
* return text sentence by sentence
* return text clause by clause
* keep info about named entities (`` tag)
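
As a rough illustration of the first TODO item, here is a minimal sketch of extracting plain text from a TEI-style XML document and splitting it into sentences. It is not the code in `tei_parser.py`. The function names `extract_text` and `split_sentences`, the inline sample document, and the punctuation-based splitting rule are all assumptions made for this example; real corpus files may use different markup and punctuation conventions.

```python
import re
import xml.etree.ElementTree as ET


def extract_text(xml_source: str) -> str:
    """Return all character data of an XML document as one normalized string.

    Hypothetical helper, not part of tei_parser.py.
    """
    root = ET.fromstring(xml_source)
    # itertext() yields every text node in document order, ignoring tags.
    # Joining with a space keeps paragraphs apart, but would also split a
    # word if a tag occurs in the middle of it.
    raw = " ".join(root.itertext())
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(raw.split())


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace.

    This rule is an assumption; the corpus may mark sentence
    boundaries differently.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


# Made-up sample document, used only to exercise the helpers.
sample = (
    "<TEI><text><body>"
    "<p>First <name>Aleksandr</name> sentence.</p>"
    "<p>Second one.</p>"
    "</body></text></TEI>"
)

text = extract_text(sample)
sentences = split_sentences(text)
print(text)
print(sentences)
```

Clause-level splitting and preserving named-entity markup would need to work on the element tree itself rather than on the flattened string, since flattening discards the tags.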