{"id":13722978,"url":"https://github.com/KurdishBLARK/KTC-Segmented","last_synced_at":"2025-05-07T16:31:39.374Z","repository":{"id":113708551,"uuid":"248229233","full_name":"KurdishBLARK/KTC-Segmented","owner":"KurdishBLARK","description":"A segmented version of KTC","archived":false,"fork":false,"pushed_at":"2020-05-01T11:02:37.000Z","size":2185,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-01-28T23:09:08.666Z","etag":null,"topics":["corpus","kurdish","kurdish-language-processing","natural-language-processing"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KurdishBLARK.png","metadata":{"files":{"readme":"README.md","changelog":"history.zip","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-03-18T12:45:12.000Z","updated_at":"2024-01-05T21:40:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"5b1c0546-5e4f-45d2-92b8-4607f21733a9","html_url":"https://github.com/KurdishBLARK/KTC-Segmented","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurdishBLARK%2FKTC-Segmented","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurdishBLARK%2FKTC-Segmented/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurdishBLARK%2FKTC-Segmented/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurdishBLARK%2FKTC-Segmented/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Kurd
ishBLARK","download_url":"https://codeload.github.com/KurdishBLARK/KTC-Segmented/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252915355,"owners_count":21824548,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corpus","kurdish","kurdish-language-processing","natural-language-processing"],"created_at":"2024-08-03T01:01:35.119Z","updated_at":"2025-05-07T16:31:36.684Z","avatar_url":"https://github.com/KurdishBLARK.png","language":null,"funding_links":[],"categories":["Development"],"sub_categories":["Benchmarks"],"readme":"# KTC-Segmented\nThis repository contains the sentence-segmented version of KTC.\nIt follows the same structure as KTC.
\nEach file is the line-segmented form of its counterpart in the raw corpus, with one sentence per line.\nThe segmentation process and related discussion are presented in the paper\n\"Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts\".\nThe paper appeared at the \u003ca href=\"https://africanlp-workshop.github.io/program.html\" target=\"_blank\"\u003eAfricaNLP Workshop at ICLR 2020\u003c/a\u003e.\nSee the presentation of the paper \u003ca href=\"https://slideslive.com/38926588/using-punkt-for-sentence-segmentation-in-nonlatin-scripts-experiments-on-kurdish-sorani-texts\" target=\"_blank\"\u003ehere\u003c/a\u003e.\nSee the related poster \u003ca href=\"https://drive.google.com/file/d/10DbS9j05wYawN8elVGZfK69UcdHSQmT6/view\" target=\"_blank\"\u003ehere\u003c/a\u003e.\n\nIf you use this data or refer to it or its related paper, please cite it as follows:\n\n~~~\n@inproceedings{abdulrahman2020using,\n    title = \"Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts\",\n    author = \"Abdulrahman, Roshna Omer and Hassani, Hossein\",\n    booktitle = \"Proceedings of the AfricaNLP Workshop at ICLR 2020\",\n    month = \"4\",\n    year = \"2020\",\n    address = \"Virtual\",\n    url = \"http://export.arxiv.org/pdf/2004.14134\",\n    eprint = \"2004.14134\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.CL\",\n}\n~~~\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKurdishBLARK%2FKTC-Segmented","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FKurdishBLARK%2FKTC-Segmented","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKurdishBLARK%2FKTC-Segmented/lists"}