An open API service indexing awesome lists of open source software.

https://github.com/madrugado/gia-corpus

Corpus of exam tests for 9-graders in Russia for NLP/ML purposes
https://github.com/madrugado/gia-corpus

corpus natural-language-processing nlp russian-corpus

Last synced: 3 months ago
JSON representation

Corpus of exam tests for 9-graders in Russia for NLP/ML purposes

Awesome Lists containing this project

README

        

# gia-corpus
Corpus of exam tests for 9-graders in Russia for NLP/ML purposes

## About contents
The two folders with files: "raw" and "processed". If "processed", so this text is alredy cleaned from the all other text, than questions and answers. In "raw" folder there are text contents of PDFs without post-processing.

We're only using (mostly) first twenty multiple choice questions, since they have only one right answer.

*UPD:* Added tsv folder with files in format compatible to Kaggle Allen AI Challenge (https://www.kaggle.com/c/the-allen-ai-science-challenge/data)

## Format
Subject tag is:
* GE - geography
* OB - social studies
* IS - history

and year is in YYYY format.

For example, IS_2009.processed.txt

## License
Since the contents of this repo is scraped from PDFs from Russian Ministry of Education website and according to it the content of PDF is in public domain, so the license is *CC BY-NC*.