https://github.com/global-asp/pb-source
Pratham Books stories in Markdown format
- Host: GitHub
- URL: https://github.com/global-asp/pb-source
- Owner: global-asp
- Created: 2016-04-24T21:41:52.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-04-25T06:09:30.000Z (over 8 years ago)
- Last Synced: 2024-09-30T22:42:05.400Z (about 2 months ago)
- Topics: corpus, creative-commons, india, multilingual, storybooks
- Size: 1.98 MB
- Stars: 7
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Source stories from the Pratham Books collection in Markdown format
This repository makes available the source texts of open-licensed stories from [Pratham Books](http://prathambooks.org/) in Markdown format.
Each folder in the repository represents a language, identified by its [ISO 639-1](http://en.wikipedia.org/wiki/ISO_639-1) or [ISO 639-3](http://en.wikipedia.org/wiki/ISO_639-3) code. The source texts for each language are stored in the corresponding folder.
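Because the folder name is the language code, a local clone can be indexed by language just by walking its top-level directories. The sketch below assumes a clone at a path you supply, treats every `*.md` file in a language folder as a story, and skips the per-language `README.md` index. The function name `stories_by_language` is hypothetical and not part of this repository.

```python
from collections import defaultdict
from pathlib import Path


def stories_by_language(repo_root: str) -> dict[str, list[Path]]:
    """Group story files by the language folder (ISO code) they live in.

    Assumes a local clone laid out as <repo_root>/<iso-code>/<story>.md,
    where each language folder also holds a README.md that is not a story.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for lang_dir in sorted(Path(repo_root).iterdir()):
        # Skip plain files and hidden directories such as .git
        if not lang_dir.is_dir() or lang_dir.name.startswith("."):
            continue
        for md_file in sorted(lang_dir.glob("*.md")):
            if md_file.name == "README.md":
                continue  # per-language index with license/links, not a story
            groups[lang_dir.name].append(md_file)
    return dict(groups)
```

For example, `for code, files in stories_by_language("pb-source").items(): print(code, len(files))` would print a story count per language code, assuming the clone lives in a `pb-source` directory.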
All of these source texts have been extracted from the EPUB files available on the [Storyweaver](https://storyweaver.org.in) website. The Markdown files in this repo supply data for several other projects, such as the translations in the [Global Pratham Books Project](https://github.com/global-asp/global-pb) and the [PB Image Bank Explorer](https://github.com/dohliam/pb-imagebank-explorer), and they also make it easy to create bilingual storybooks in any language combination.
Corresponding images for the stories in this repository can be found in the [Pratham Books Image Bank](https://github.com/global-asp/pb-imagebank).
## Format
The extracted source text of all stories is provided here in Markdown format. Details of the format are documented in the [source format section](https://github.com/global-asp/global-asp#source-format) of the global-asp repository.
A sequence of two hashes `##` on a separate line indicates a page break.
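Since a page break is a line containing only `##`, a story can be split into pages with a line-anchored regular expression, which avoids splitting on Markdown headings such as `## Title`. This is a minimal sketch: `split_pages` is a hypothetical helper, it tolerates surrounding spaces and a trailing carriage return on the break line, and it does not try to separate a title from the first page or the metadata section (described under License) from the last page.

```python
import re

# A page break is a line consisting of "##" alone (spaces/tabs and a trailing \r tolerated).
PAGE_BREAK = re.compile(r"^[ \t]*##[ \t\r]*$", re.MULTILINE)


def split_pages(markdown_text: str) -> list[str]:
    """Split one story's Markdown source into pages at "##" lines.

    Whitespace is trimmed from each page and empty pages are dropped.
    """
    pages = PAGE_BREAK.split(markdown_text)
    return [page.strip() for page in pages if page.strip()]
```

Usage might look like `split_pages(Path("en/some-story.md").read_text(encoding="utf-8"))`, where the file name is illustrative. Pages from two language folders could then be paired index by index to lay out a bilingual book, provided both versions have the same number of pages.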
Editing of the story content has been kept to a minimum, and for the most part the stories are presented as they appear in the originals. Corrections other than fixes for obvious spelling errors or conversion artifacts should be sent to Pratham Books directly through the [Storyweaver](https://storyweaver.org.in) website.
## Languages
Pratham Books currently provides stories in 35 different languages. This repository attempts to provide the source text for all of these stories in machine- and human-readable Markdown format.
Below is a key to the languages covered by this repository and their ISO 639-1/3 codes.
ISO code | Language Name
-------- | -------------
[as](https://github.com/global-asp/pb-source/tree/master/as) | Assamese
[bn](https://github.com/global-asp/pb-source/tree/master/bn) | Bengali
[en](https://github.com/global-asp/pb-source/tree/master/en) | English
[gu](https://github.com/global-asp/pb-source/tree/master/gu) | Gujarati
[hi](https://github.com/global-asp/pb-source/tree/master/hi) | Hindi
[kn](https://github.com/global-asp/pb-source/tree/master/kn) | Kannada
[kok](https://github.com/global-asp/pb-source/tree/master/kok) | Konkani
[kru](https://github.com/global-asp/pb-source/tree/master/kru) | Kurukh
[ml](https://github.com/global-asp/pb-source/tree/master/ml) | Malayalam
[mqu](https://github.com/global-asp/pb-source/tree/master/mqu) | Mundari
[mr](https://github.com/global-asp/pb-source/tree/master/mr) | Marathi
[or](https://github.com/global-asp/pb-source/tree/master/or) | Oriya
[pa](https://github.com/global-asp/pb-source/tree/master/pa) | Punjabi
[sa](https://github.com/global-asp/pb-source/tree/master/sa) | Sanskrit
[sck](https://github.com/global-asp/pb-source/tree/master/sck) | Sadri
[ta](https://github.com/global-asp/pb-source/tree/master/ta) | Tamil
[te](https://github.com/global-asp/pb-source/tree/master/te) | Telugu

## License
All stories in this repository are [Creative Commons](https://creativecommons.org/) licensed (CC BY 4.0), with the exception of several stories that are in the Public Domain. The specific license for each story is indicated both in the metadata section at the bottom of each file and in the corresponding `README.md` file for that language. Direct links to the original stories on the Pratham Books website can also be found in the `README.md` files.