https://github.com/hughp/dnj-corpus
A small corpus of a local newspaper
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/hughp/dnj-corpus
- Owner: HughP
- License: other
- Created: 2018-05-26T19:11:48.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-11-13T00:24:12.000Z (over 6 years ago)
- Last Synced: 2025-02-15T06:26:31.799Z (3 months ago)
- Language: HTML
- Size: 62.9 MB
- Stars: 3
- Watchers: 4
- Forks: 2
- Open Issues: 9
Metadata Files:
- Readme: Readme.md
- Support: Support-Files/Cool code snips.md
README
# Eastern Dan [dnj] corpus and text input analysis
## Sections
### Corpora in Languages
* A medium-sized corpus in Eastern Dan [dnj]
* The Eastern Dan corpus consists of issues of a local newspaper (_˗Pamɛbhamɛ_), and medical counsels (chapters) from _While waiting for a medical doctor_ translated into Eastern Dan. A detailed description of the corpus is available [here](/Writing-System-Descriptions/Eastern-Dan/ReadMe.md).
* A parallel corpus (the New Testament book of James) in the following languages: Spanish [spa], French [fra], English [eng], Eastern Dan [dnj], Me'phaa [tcf], Mandarin [cmn], plus two Nigerian languages, [bkv] and [eza].
### Writing System Descriptions
[`/Writing-System-Descriptions`](/Writing-System-Descriptions) The focus here is on the Eastern Dan writing system in general but some description also exists for the other languages in the parallel corpus. In the process of attempting to describe Eastern Dan's text input options, this corpus description also attempts a modest application of the principles set out by Martin Hosken for _Writing System Descriptions_. A specific analysis of the text input options for Eastern Dan also exists under the _Text Input Systems and their Descriptions_ section.
### Text Input Systems and their Descriptions
[`/Text-Input-Systems-and-Descriptions`](/Text-Input-Systems-and-Descriptions) Information about several keyboard layouts exists as part of this corpus.
* [`/AFU`](/Text-Input-Systems-and-Descriptions/AFU)
* [`/Trans-Mande`](/Text-Input-Systems-and-Descriptions/Trans-Mande)
* [`/AZERTY`](/Text-Input-Systems-and-Descriptions/AZERTY)
* [`/QWERTY`](/Text-Input-Systems-and-Descriptions/QWERTY)
* [`/BÉPO`](/Text-Input-Systems-and-Descriptions/BÉPO)
* [`/Spanish ISO`](/Text-Input-Systems-and-Descriptions/Spanish)
* [`/Me'phaa`](/Text-Input-Systems-and-Descriptions/Me'phaa)
### Tools
A variety of tools were used in this project. I try to keep a list and a copy of them here.
#### Tools for Text
[`/Tools/Tools-for-Text`](/Tools/Tools-for-Text) include:
* Measuring diacritic density and tone melodies
* Text character counting
* Converting or Transforming Text
#### Tools for Keyboard Layout analysis
* Typing
* KLA
#### Tools for Descriptions
* KLE
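
As a rough illustration of the character counting and diacritic density measurements listed under _Tools for Text_, here is a minimal Python sketch. It is not one of the tools stored in this repository. The function names are hypothetical, and the definition of density used here (combining marks per base letter after NFD normalization) is an assumption made for the example.

```python
import unicodedata
from collections import Counter


def char_counts(text):
    """Count every code point after NFD normalization.

    NFD splits precomposed characters, so a base letter and its
    combining mark are counted as two separate entries.
    """
    return Counter(unicodedata.normalize("NFD", text))


def diacritic_density(text):
    """Return combining marks per base letter (assumed definition).

    Marks are code points in Unicode general categories M*,
    letters are code points in categories L*. Returns 0.0 when
    the text contains no letters.
    """
    decomposed = unicodedata.normalize("NFD", text)
    marks = sum(1 for ch in decomposed if unicodedata.category(ch).startswith("M"))
    letters = sum(1 for ch in decomposed if unicodedata.category(ch).startswith("L"))
    return marks / letters if letters else 0.0


if __name__ == "__main__":
    sample = "café"  # placeholder text, not taken from the corpus
    print(char_counts(sample).most_common())
    print(diacritic_density(sample))
```

Note that spacing modifier characters, such as those used as prefixed marks in some orthographies, are not combining marks and would need to be counted separately under this definition.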