{"id":20778036,"url":"https://github.com/fergusq/yajwiz","last_synced_at":"2025-10-10T08:39:36.225Z","repository":{"id":53402782,"uuid":"304141954","full_name":"fergusq/yajwiz","owner":"fergusq","description":"Klingon morphological analyzer and other NLP tools","archived":false,"fork":false,"pushed_at":"2024-04-21T18:26:31.000Z","size":8389,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-07-28T15:15:53.946Z","etag":null,"topics":["klingon","morphological-analysis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fergusq.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-14T21:40:25.000Z","updated_at":"2024-04-21T18:26:35.000Z","dependencies_parsed_at":"2024-04-21T20:41:03.730Z","dependency_job_id":null,"html_url":"https://github.com/fergusq/yajwiz","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fergusq/yajwiz","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fergusq%2Fyajwiz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fergusq%2Fyajwiz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fergusq%2Fyajwiz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fergusq%2Fyajwiz/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fergusq","download_url":"https://codeload.github.c
om/fergusq/yajwiz/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fergusq%2Fyajwiz/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279003276,"owners_count":26083555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["klingon","morphological-analysis"],"created_at":"2024-11-17T13:18:42.570Z","updated_at":"2025-10-10T08:39:36.199Z","avatar_url":"https://github.com/fergusq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"yajwI'\n======\n\n**yajwI'** is a Klingon NLP toolkit that includes basic tokenization, morphological analysis and POS tagging.\n\nIt heavily uses the `boQwI' dictionary \u003chttps://github.com/De7vID/klingon-assistant-data\u003e`_.\n\nInstallation\n------------\n\nyajwI' requires Python 3.8 or newer.\n\nIt can be installed from PyPI::\n\n    pip install yajwiz\n\nUpdating and using the boQwI' dictionary\n----------------------------------------\n\nWhen yajwI' is first imported, it will download a copy of the boQwI' dictionary.\nAfter this the ``update_dictionary()`` function must be called whenever the dictionary needs to be updated.\nThe function will check for updates and install them.\n\nThe downloaded dictionary can be accessed through the 
``load_dictionary()`` function.\n\n\u003e\u003e\u003e import yajwiz\n\u003e\u003e\u003e yajwiz.update_dictionary()\n\u003e\u003e\u003e dictionary = yajwiz.load_dictionary()\n\u003e\u003e\u003e dictionary.version\n'2021.03.18a'\n\nTokenization\n------------\n\nThe library includes very simple tokenization.\n\n\u003e\u003e\u003e import yajwiz\n\u003e\u003e\u003e yajwiz.tokenize(\"Hegh neH chav qoH. qanchoHpa' qoH, Hegh qoH.\")\n[('WORD', 'Hegh'), ('SPACE', ' '), ('WORD', 'neH'), ('SPACE', ' '), ('WORD', 'chav'), ('SPACE', ' '), ('WORD', 'qoH'), ('PUNCT', '.'), ('SPACE', ' '), ('WORD', \"qanchoHpa'\"), ('SPACE', ' '), ('WORD', 'qoH'), ('PUNCT', ','), ('SPACE', ' '), ('WORD', 'Hegh'), ('SPACE', ' '), ('WORD', 'qoH'), ('PUNCT', '.')]\n\n\nMorphological analysis\n----------------------\n\nThe ``yajwiz.analyze`` function parses a word and returns a list of possible parses, each with additional information.\n\n\u003e\u003e\u003e yajwiz.analyze(\"yInwI'\")\n[{'BOQWIZ_ID': 'yIn:n',\n  'BOQWIZ_POS': 'n:klcp1',\n  'LEMMA': 'yIn',\n  'PARTS': ['yIn:n', \"-wI':n\"],\n  'POS': 'N',\n  'SUFFIX': {'N4': \"-wI'\"},\n  'UNGRAMMATICAL': 'ILLEGAL PLURAL OR POSSESSIVE SUFFIX',\n  'WORD': \"yInwI'\",\n  'XPOS': 'N',\n  'XPOS_GSUFF': 'N'},\n {'BOQWIZ_ID': 'yIn:v',\n  'BOQWIZ_POS': 'v:t_c,klcp1',\n  'LEMMA': 'yIn',\n  'PARTS': ['yIn:v', \"-wI':v\"],\n  'POS': 'V',\n  'SUFFIX': {'V9': \"-wI'\"},\n  'WORD': \"yInwI'\",\n  'XPOS': 'VT',\n  'XPOS_GSUFF': \"VT.wI'\"}]\n\nCurrently the analyzer is very permissive and accepts incorrect plural and possessive suffixes (e.g. **yInwI'** instead of **yInwIj**). It tries to mark such errors with an ``'UNGRAMMATICAL'`` key, as in the first parse above. It detects the following errors:\n\n- Using **-pu'**, **-wI'**, **-lI'**, etc. 
when the noun is not a person noun\n- Using **-Du'** when the noun is not a body part\n- Using **-vIS** without using **-taH**\n- Using **-lu'** with an illegal verb prefix\n- Using intransitive verbs with prefixes that indicate an object\n- Using **-ghach** without any other verb suffix\n- Using an aspect suffix with **-jaj**\n\nThere is also a simpler function, ``yajwiz.split_to_morphemes``, which returns a set of tuples of strings (usually there will be only one tuple in the set):\n\n\u003e\u003e\u003e yajwiz.split_to_morphemes(\"yInwI'\")\n{('yIn', \"-wI'\")}\n\nList of Parts of Speech\n.......................\n\n===== ===========\nXPOS  Explanation\n===== ===========\nVS    Stative verb\nVT    Transitive verb\nVI    Intransitive verb\nVA    Transitive and intransitive verb\nV?    Verb with unknown transitivity\nNL    Person noun\nNB    Body part noun\nPRON  Pronoun (including **'Iv** and **nuq**; a pronoun is a noun that can function as a copula)\nNUM   Number\nN     Other noun\nADV   Adverb\nEXCL  Exclamation\nCONJ  Conjunction\nQUES  Question word (other than **'Iv** and **nuq**)\nUNK   Unknown\n===== ===========\n\nGrammar checker\n---------------\n\nyajwI' can be used to find common grammar errors. You can use either the method ``yajwiz.grammar_check`` or the following command-line interface:\n\n.. code::\n\n    python -m yajwiz.grammar_check file.txt\n\nCoNLL-U files and POS tagger\n----------------------------\n\nCoNLL-U is a popular data format for storing annotated linguistic data.\n\nyajwI' can generate CoNLL-U files filled with morphological information (it does not support dependency parsing).\n\nBelow is an example script that first parses a text without a trained POS tagger,\nthen trains a POS tagger with the result, and finally parses the text with the tagger and saves the result to a CoNLL-U file.\n\n.. 
code:: python\n\n    import yajwiz\n\n    with open(\"prose-corpus.txt\", \"r\") as f:\n        text = f.read()\n\n    conllu = yajwiz.text_to_conllu(text)\n\n    tagger = yajwiz.Tagger()\n    tagger.train(yajwiz.conllu_to_tagged_list(conllu))\n\n    conllu = yajwiz.text_to_conllu(text, tagger)\n\n    with open(\"prose-corpus.conllu\", \"w\") as f:\n        f.write(conllu)\n\nWithout a trained POS tagger, ambiguous words will be left without a tag:\n\n.. code::\n\n    # Hegh neH chav qoH.\n    1\tHegh\t_\t_\t_\t_\t_\t_\t_\t_\n    2\tneH\t_\t_\t_\t_\t_\t_\t_\t_\n    3\tchav\t_\t_\t_\t_\t_\t_\t_\t_\n    4\tqoH\tqoH\tNOUN\tN\t_\t_\t_\t_\t_\n    5\t.\t.\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n\n    # qanchoHpa' qoH, Hegh qoH.\n    1\tqanchoHpa'\tqan\tVERB\tV?.pa'\tPerson=3|ObjPerson=3,0\t_\t_\t_\tSuffixV3=-choH|SuffixV9=-pa'\n    2\tqoH\tqoH\tNOUN\tN\t_\t_\t_\t_\t_\n    3\t,\t,\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n    4\tHegh\t_\t_\t_\t_\t_\t_\t_\t_\n    5\tqoH\tqoH\tNOUN\tN\t_\t_\t_\t_\t_\n    6\t.\t.\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n\nAfter training the tagger, it will take the \"best guess\" when deciding the POS.\n\n.. code::\n\n    # Hegh neH chav qoH.\n    1\tHegh\tHegh\tVERB\tVT\tPerson=3|ObjPerson=3,0\t_\t_\t_\t_\n    2\tneH\tneH\tADV\tADV\t_\t_\t_\t_\t_\n    3\tchav\tchav\tVERB\tVT\tPerson=3|ObjPerson=3,0\t_\t_\t_\t_\n    4\tqoH\tqoH\tNOUN\tN\t_\t_\t_\t_\t_\n    5\t.\t.\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n\n    # qanchoHpa' qoH, Hegh qoH.\n    1\tqanchoHpa'\tqan\tVERB\tV?.pa'\tPerson=3|ObjPerson=3,0\t_\t_\t_\tSuffixV3=-choH|SuffixV9=-pa'\n    2\tqoH\tqoH\tNOUN\tN\t_\t_\t_\t_\t_\n    3\t,\t,\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n    4\tHegh\tHegh\tVERB\tVT\tPerson=3|ObjPerson=3,0\t_\t_\t_\t_\n    5\tqoH\tqoH\tNOUN\tN\t_\t_\t_\t_\t_\n    6\t.\t.\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n\nIn this example the tagger made a mistake: it classified the first **Hegh** as VT, although it should be N. I don't have a correctly tagged corpus, so evaluating the tagger is currently impossible. 
:(\n\nCopyright\n---------\n\nyajwiz (c) 2020 Iikka Hauhio\n\nThis program uses the `boQwI' dictionary \u003chttps://github.com/De7vID/klingon-assistant-data\u003e`_ (``data.json``), which is licensed under the Apache License 2.0.\n\nThe Python files are also licensed under the Apache License 2.0. See the LICENSE file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffergusq%2Fyajwiz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffergusq%2Fyajwiz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffergusq%2Fyajwiz/lists"}