https://github.com/dracor-org/rusdracor
Russian Drama Corpus (in TEI-P5)
https://github.com/dracor-org/rusdracor
corpus digital-humanities drama dramatic-texts tei xml
Last synced: 5 months ago
JSON representation
Russian Drama Corpus (in TEI-P5)
- Host: GitHub
- URL: https://github.com/dracor-org/rusdracor
- Owner: dracor-org
- Created: 2017-09-19T09:28:12.000Z (over 8 years ago)
- Default Branch: main
- Last Pushed: 2025-12-21T02:45:39.000Z (6 months ago)
- Last Synced: 2025-12-22T23:29:38.232Z (6 months ago)
- Topics: corpus, digital-humanities, drama, dramatic-texts, tei, xml
- Language: CSS
- Homepage:
- Size: 28.6 MB
- Stars: 7
- Watchers: 7
- Forks: 4
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# RusDraCor
## Corpus Description
We are building a Russian Drama Corpus with files encoded in
[TEI-P5](http://www.tei-c.org/Guidelines/P5/). Our corpus comprises
**212 plays** to date, originating from [ilibrary](https://ilibrary.ru/),
[Wikisource](https://ru.wikisource.org/), [РВБ](https://rvb.ru/),
[lib.ru](http://lib.ru/), [ФЕБ](http://feb-web.ru/),
[СовЛит](http://www.ruthenia.ru/sovlit/) and
[Wikilivres](https://wikilivres.org/), converted to TEI and corrected
and enhanced by us. There will be more.
If you want to cite the corpus, please use this publication:
- **Fischer, Frank, et al. (2019)**. Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama. In *Proceedings of DH2019: "Complexities"*, Utrecht University, [doi:10.5281/zenodo.4284002](https://doi.org/10.5281/zenodo.4284002).
RusDraCor was first presented on June 29, 2017, at the [Corpora 2017
conference](https://events.spbu.ru/events/anons/corpora-2017/?lang=Eng) in St.
Petersburg ([our slides here](https://dlina.github.io/presentations/2017-spb/)),
on July 11, 2017, at the ["Digitizing the stage"
conference](https://digitizingthestage.wordpress.com/) in Oxford and
on November 14, 2017, at the
[TEI 2017 conference](https://hcmc.uvic.ca/tei2017/abstracts/t_115_fischeretal_lifeonstage.html)
in Victoria. The social network data we extract from plays may also be explored
on our website [dracor.org/rus](https://dracor.org/rus) or via
[our Shinyapp](https://shiny.dracor.org/).
If you just want to download the corpus in its current state in XML-TEI,
do this:
`svn export https://github.com/dracor-org/rusdracor/trunk/tei`
## API
An easy way to download the network data (instead of the actual TEI files) is
to use our API ([documentation here](https://dracor.org/doc/api)).
If you have [jq](https://blog.appoptics.com/jq-json/) installed, it would work
like this:
```
for play in `curl 'https://dracor.org/api/corpora/rus' | jq -r ".dramas[] .name"`; do
wget -O "$play".csv https://dracor.org/api/corpora/rus/play/"$play"/networkdata/csv
done
```
The API info page is at `https://dracor.org/api/info`.
## Simple Visualisation with R
To have a first look at the distribution of the number of speakers per play over
time, you could feed the metadata table into R:
```
library(data.table)
library(ggplot2)
rusdracor <- fread("https://dracor.org/api/corpora/rus/metadata.csv")
ggplot(rusdracor[], aes(x = yearNormalized, y = numOfSpeakers)) + geom_point()
```
Result:

Here is a barplot showing the number of plays per decade:

(README last updated on July 26, 2021.)