{"id":18541829,"url":"https://github.com/cltk/non_models_cltk","last_synced_at":"2026-02-13T17:04:31.331Z","repository":{"id":76839223,"uuid":"110300469","full_name":"cltk/non_models_cltk","owner":"cltk","description":"Trained tagger for Old Norse","archived":false,"fork":false,"pushed_at":"2018-08-13T14:40:54.000Z","size":1101,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-08-30T12:05:19.860Z","etag":null,"topics":["cltk","norse","norse-models-cltk","taggers-trained","tnt"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cltk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-10T22:59:16.000Z","updated_at":"2020-04-06T23:34:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"71112fcd-6971-451b-9af1-148ea7c70e2b","html_url":"https://github.com/cltk/non_models_cltk","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cltk/non_models_cltk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Fnon_models_cltk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Fnon_models_cltk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Fnon_models_cltk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Fnon_models_cltk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cltk",
"download_url":"https://codeload.github.com/cltk/non_models_cltk/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Fnon_models_cltk/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29412670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-13T06:24:03.484Z","status":"ssl_error","status_checked_at":"2026-02-13T06:23:12.830Z","response_time":78,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cltk","norse","norse-models-cltk","taggers-trained","tnt"],"created_at":"2024-11-06T20:06:26.707Z","updated_at":"2026-02-13T17:04:31.316Z","avatar_url":"https://github.com/cltk.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# CLTK models for Old Norse\nTrained taggers for the CLTK.\n\n## POS tagging\n\n### Choice of the corpus\nThe training texts come from the [Icelandic Parsed Historical Corpus](http://www.linguist.is/icelandic_treebank/Download) (version 0.9, license: LGPL). They were selected because researchers had already annotated them and because they were written in Old Norse between the 12th and 14th centuries, the Golden Age of Old Norse texts.\nThe paths to the annotated texts are built as follows:\n``` python\n    \u003e\u003e\u003e selected_files = [\"1150.firstgrammar.sci-lin.tagged\", \"1150.homiliubok.rel-ser.tagged\",\n    ...                   
\"1210.jartein.rel-sag.tagged\", \"1210.thorlakur.rel-sag.tagged\",\n    ...                   \"1250.sturlunga.nar-sag.tagged\", \"1250.thetubrot.nar-sag.tagged\",\n    ...                   \"1260.jomsvikingar.nar-sag.tagged\", \"1270.gragas.law-law.tagged\",\n    ...                   \"1275.morkin.nar-his.tagged\", \"1300.alexander.nar-sag.tagged\",\n    ...                   \"1310.grettir.nar-sag.tagged\", \"1325.arni.nar-sag.tagged\",\n    ...                   \"1350.bandamennM.nar-sag.tagged\", \"1350.finnbogi.nar-sag.tagged\",\n    ...                   \"1350.marta.rel-sag.tagged\"]\n    \u003e\u003e\u003e selected_data = [\"icepahc-v0.9/tagged/\" + selected_file for selected_file in selected_files]\n```\n### Extraction of words and tags\n``` python\n    \u003e\u003e\u003e # extract_word_and_tags(filename) is the helper described below\n    \u003e\u003e\u003e words_tags = []\n    \u003e\u003e\u003e for filename in selected_data:\n    ...     # TnT.train expects a list of tagged sequences, so each whole text is added as one sequence\n    ...     words_tags.append(extract_word_and_tags(filename))\n```\nThe function `extract_word_and_tags` takes a filename and returns the list of `(word, tag)` pairs for the whole text.\nThe texts were not segmented into sentences. Each text therefore reaches TnT as one long sequence, and the tagger never sees sentence boundaries within a text. This is not how TnT is meant to be trained, but the resulting tagger still does the job.\n\n### Tagger trained with TnT\n``` python\n    \u003e\u003e\u003e import os\n    \u003e\u003e\u003e import pickle\n    \u003e\u003e\u003e from nltk.tag import tnt\n    \u003e\u003e\u003e tagger = tnt.TnT()\n    \u003e\u003e\u003e tagger.train(words_tags)\n    \u003e\u003e\u003e os.makedirs(os.path.join(\"taggers\", \"pos\"), exist_ok=True)  # make sure the output folder exists\n    \u003e\u003e\u003e with open(os.path.join(\"taggers\", \"pos\", \"tnt.pickle\"), \"wb\") as f:\n    ...     pickle.dump(tagger, f)\n```\nThe trained TnT model can later be reloaded from `tnt.pickle` with `pickle.load`.\n\n### Tagset\n\nhttp://nlp.cs.ru.is/pdf/Tagset.pdf\n\n### Complete description of the corpus used\n\nhttp://www.linguist.is/icelandic_treebank/Icelandic_Parsed_Historical_Corpus_(IcePaHC)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcltk%2Fnon_models_cltk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcltk%2Fnon_models_cltk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcltk%2Fnon_models_cltk/lists"}