{"id":15494808,"url":"https://github.com/dwhieb/nuuchahnulth","last_synced_at":"2025-04-06T15:29:41.129Z","repository":{"id":47318132,"uuid":"181127642","full_name":"dwhieb/Nuuchahnulth","owner":"dwhieb","description":"Linguistic data on the Nuuchahnulth (Wakashan) language","archived":false,"fork":false,"pushed_at":"2021-09-04T01:29:31.000Z","size":7406,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-19T12:15:08.137Z","etag":null,"topics":["corpora","corpus","corpus-linguistics","documentary-linguistics","language-documentation","linguistics","nuuchahnulth","wakashan"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-sa-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dwhieb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null}},"created_at":"2019-04-13T05:55:33.000Z","updated_at":"2021-09-06T11:50:29.000Z","dependencies_parsed_at":"2022-08-24T05:10:27.858Z","dependency_job_id":null,"html_url":"https://github.com/dwhieb/Nuuchahnulth","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwhieb%2FNuuchahnulth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwhieb%2FNuuchahnulth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwhieb%2FNuuchahnulth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwhieb%2FNuuchahnulth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dwhieb","download_url":"https:/
/codeload.github.com/dwhieb/Nuuchahnulth/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247502685,"owners_count":20949308,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corpora","corpus","corpus-linguistics","documentary-linguistics","language-documentation","linguistics","nuuchahnulth","wakashan"],"created_at":"2024-10-02T08:15:19.225Z","updated_at":"2025-04-06T15:29:41.110Z","avatar_url":"https://github.com/dwhieb.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Nuuchahnulth\n\nThis repository contains linguistic texts in Nuuchahnulth, a language of the Wakashan language family, spoken in the Pacific Northwest. These texts are digitally-searchable versions of those prepared by Toshihide Nakayama (Tokyo University of Foreign Studies), and published as volumes A2-027 and A2-028 of the series _Endangered Languages of the Pacific Rim_. 
The texts were dictated by George Louie and Caroline Little to Toshihide Nakayama, who then transcribed, analyzed, and prepared the edited versions.\n\n## Contents\n\n\u003c!-- TOC --\u003e\n\n- [Attribution](#attribution)\n- [Reporting Typos \u0026 Issues](#reporting-typos--issues)\n- [Corpus Statistics](#corpus-statistics)\n- [Text Formats](#text-formats)\n- [Sounds of Nuuchahnulth](#sounds-of-nuuchahnulth)\n- [Abbreviations](#abbreviations)\n- [Converting the Corpus](#converting-the-corpus)\n\n\u003c!-- /TOC --\u003e\n\n## Attribution\n\nIf you would like to use the data in the repository for research, please cite the following sources, depending on the text:\n\n* Nakayama, Toshihide (ed.). 2003. _Caroline Little's Nuu-chah-nulth (Ahousaht) texts with grammatical analysis_ (Endangered Languages of the Pacific Rim A2-027). Kyoto: Nakanishi Printing Co.\n\n* Nakayama, Toshihide (ed.). 2003. _George Louie's Nuu-chah-nulth (Ahousaht) texts with grammatical analysis_ (Endangered Languages of the Pacific Rim A2-028). Kyoto: Nakanishi Printing Co.\n\nYou may also use the stable DOI made available through Zenodo to cite this online version of the corpus:\n\n**DOI:**[10.5281/zenodo.3931864](http://doi.org/10.5281/zenodo.3931864)\n\n[![DOI](https://zenodo.org/badge/181127642.svg)](https://zenodo.org/badge/latestdoi/181127642)\n\nFor other uses of this data, please contact [Toshihide Nakayama](mailto:nakayama@aa.tufs.ac.jp).\n\n## Reporting Typos \u0026 Issues\n\nTo report a typo or other problem, [open an issue on GitHub][new-issue].\n\n## Corpus Statistics\n\nStatistic  | Value\n-----------|------\nSpeakers   | 2\nTexts      | 24\nUtterances | 2,081\nTokens     | 8,366\nWordforms  | 4,216\nStems      | 2,547\nRoots      | 1,313\n\n## Text Formats\n\nThe texts are available in three formats:\n\n* The \"raw\" versions of the texts, in a practical writing system used for the purpose of quickly typing in the data. 
They are used to produce the other versions of the texts, and are located in the folder `texts/raw`.\n\n* An [\u003cdfn\u003einterlinear gloss format\u003c/dfn\u003e][IGL] (\u003cabbr title='interlinear gloss'\u003eIGL\u003c/abbr\u003e), a format used by linguists to represent data in a way that can be read and understood by anyone. Each document itself follows a format called [scription][scription], which enforces consistency in the structure of the text, making it computationally parseable. These versions are located in the folder `texts/interlinear`.\n\n  At the top of each text is a header (between the two sets of dashes `---`), which provides the title in English (and sometimes Nuuchahnulth), the abbreviation, and the unique ID for each text.\n\n  Beneath the header are utterances (sentences) in the text. Each utterance is separated from the next by a blank line.\n\n  Each utterance has 6 lines, which contain the following kinds of information:\n\n  1. **Utterance Number:** The number of the utterance within the text.\n  1. **Transcript:** A transcription of each utterance using the Nuuchahnulth writing system, along with punctuation.\n  1. **Morphemes:** A list of each \u003cdfn\u003emorpheme\u003c/dfn\u003e (meaningful part) of each word, where morphemes are separated by hyphens.\n  1. **Glosses:** A short \u003cdfn\u003egloss\u003c/dfn\u003e (abbreviation) indicating the meaning of each morpheme in the word, separated by hyphens. See the [Abbreviations](#abbreviations) section below.\n  1. **Literal Translations:** Literal translations of each word.\n  1. **Free Translations:** A free (loose) translation for the utterance.\n\n  For more information about the scription format, visit [https://scription.digitallinguistics.io][scription].\n\n* A [JSON][JSON] version, formatted according to the [Data Format for Digital Linguistics][DaFoDiL] (\u003cabbr\u003eDaFoDiL\u003c/abbr\u003e). 
This version of the corpus is most useful for programmatically interacting with the texts. See the [DaFoDiL page][DaFoDiL] for more information about how this data is formatted.\n\n## Sounds of Nuuchahnulth\n\nThe following table shows the consonant sounds of Nuuchahnulth, arranged by place and manner of articulation in accordance with the [\u003cdfn\u003eInternational Phonetic Alphabet\u003c/dfn\u003e][IPA] (\u003cabbr title='International Phonetic Alphabet'\u003eIPA\u003c/abbr\u003e).\n\nManner            | Labial | Apical | Alveolar | Lateral | Palatal | Velar | Labio-Velar | Uvular | Labio-Uvular | Pharyngeal | Glottal\n------------------|:------:|:------:|:--------:|:-------:|:-------:|:-----:|:-----------:|:------:|:------------:|:----------:|:------:\nStops             |   p    |   t    |    c     |    ƛ    |    č    |   k   |     kʷ      |   q    |      qʷ      |     ʕ      |    ʔ\nEjectives         |   p̓    |   t̓    |    c̓     |    ƛ̓    |    č̓    |   k̓   |     k̓ʷ      |        |     (q̓ʷ)     |            |\nFricatives        |        |        |    s     |    ɬ    |    š    |   x   |     xʷ      |   x̣    |              |     ḥ      |    h\nResonants         |   m    |   n    |          |         |    y    |       |      w      |        |              |            |\nGlottal Resonants |   m̓    |   n̓    |          |         |    y̓    |       |      w̓      |        |              |            |\n\nAhousaht Nuuchahnulth has three vowels: /i, a, u/, each of which may be long (/Vː/), short, or variable-length (/V·/).\n\nCertain suffixes in Nuuchahnulth change the sounds that precede them:\n\n* \u003cdfn\u003eHardening suffixes\u003c/dfn\u003e change stops, affricates, and resonants into their glottalized counterparts, and fricatives into /w̓/ or /y̓/ depending on whether the consonant is rounded. 
Hardening suffixes are indicated by ⟨ʼ⟩.\n* \u003cdfn\u003eSoftening suffixes\u003c/dfn\u003e change a preceding fricative into /w/ or /y/ depending on whether the consonant is rounded. Softening suffixes are indicated by ⟨ʽ⟩.\n\n## Abbreviations\n\nThe following abbreviations are used in the texts.\n\nAbbreviation | Meaning\n-------------|-------------------------------------\nCAUS         | causative\nCOND         | conditional mood\nCONT         | continuative aspect\nDEF          | definite\nDIM          | diminutive\nDISTR        | distributive\nDUB          | dubitative mood\nDUP          | CV reduplication\nDUP#         | syllable reduplication\nDUPCV        | CV reduplication\nDUR          | durative aspect\nEXP          | expression that cannot be translated\nFIN(ITE)     | finite event\nFUT          | future\nFUT.IMP      | future imperative\nGRAD         | graduative aspect\nIMP          | imperative\nINC          | inceptive aspect\nINC.CAUS     | inceptive causative\nIND          | indicative mood\nINDF         | indefinite mood\nINF          | inferential mood\nINTER        | interrogative\nINTJ         | interjection\nIT           | iterative aspect\nIT.INC       | iterative inceptive aspect\nIT.PL        | iterative plural\nLOC          | location\nMOM          | momentaneous\nMOMCAUS      | momentaneous causative\nPL           | plural\nPOSS         | possessive\nPURP         | purposive\nQUOT         | quotative\nREL          | relative mood\nREL.DUB      | relative dubitative mood\nREP          | repetitive aspect\nSG           | singular\nSHIFT        | perspective shifting\nSIM          | simultaneous (‘while doing…’)\nSPOR         | sporadic aspect\nSUB          | subordinate mood\n\n## Converting the Corpus\n\nTo run the scripts that convert the corpus for yourself, you will need to 1) install [Node.js][Node], 2) [clone this repository][clone] to your computer, 3) install the necessary scripts by running `npm install` from the command line in 
the folder for the repository, and 4) run the command `npm run build` from the same folder.\n\nYou can also run just the transliteration step (`npm run transliterate`) or the conversion step (`npm run convert`).\n\n## Find \u0026 Replace\n\nI've also written a find-and-replace script ([`scripts/findAndReplace.js`][findAndReplace]), which allows the user to run searches on the corpus or update the JSON files in the corpus. See the documentation on how to use this function in the [`findAndReplace.js`][findAndReplace] file. An example of how to use this function can be seen in [`scripts/getCorpusStats.js`][getCorpusStats], which calculates the statistics for the corpus.\n\n[clone]:          https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository\n[DaFoDiL]:        https://format.digitallinguistics.io\n[DLx]:            https://digitallinguistics.io\n[findAndReplace]: https://github.com/dwhieb/Nuuchahnulth/blob/master/scripts/findAndReplace.js\n[getCorpusStats]: https://github.com/dwhieb/Nuuchahnulth/blob/master/scripts/getCorpusStats.js\n[IGL]:            https://www.eva.mpg.de/lingua/resources/glossing-rules.php\n[IPA]:            https://www.internationalphoneticalphabet.org/\n[JSON]:           https://en.wikipedia.org/wiki/JSON#Example\n[new-issue]:      https://github.com/dwhieb/Nuuchahnulth/issues/new\n[Node]:           https://nodejs.org/en/\n[scription]:      https://scription.digitallinguistics.io\n[transliterate]:  https://developer.digitallinguistics.io/transliterate/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdwhieb%2Fnuuchahnulth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdwhieb%2Fnuuchahnulth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdwhieb%2Fnuuchahnulth/lists"}