{"id":13562627,"url":"https://github.com/datquocnguyen/VnDT","last_synced_at":"2025-04-03T18:34:11.430Z","repository":{"id":97972755,"uuid":"177048464","full_name":"datquocnguyen/VnDT","owner":"datquocnguyen","description":"VnDT: A Vietnamese Dependency Treebank","archived":false,"fork":false,"pushed_at":"2021-11-06T02:52:56.000Z","size":2147,"stargazers_count":20,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-04T14:45:39.280Z","etag":null,"topics":["dependency-parse-trees","dependency-parsing","vietnamese-nlp"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datquocnguyen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LicenseVnDT.pdf","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-03-22T01:09:47.000Z","updated_at":"2024-11-04T06:19:18.000Z","dependencies_parsed_at":"2024-01-14T03:47:39.513Z","dependency_job_id":"9a393973-74d8-4cb8-90bd-d1610649dd1d","html_url":"https://github.com/datquocnguyen/VnDT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datquocnguyen%2FVnDT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datquocnguyen%2FVnDT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datquocnguyen%2FVnDT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datquocnguyen%2FVnDT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datquocnguyen","download_url":"https://code
load.github.com/datquocnguyen/VnDT/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247057197,"owners_count":20876533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dependency-parse-trees","dependency-parsing","vietnamese-nlp"],"created_at":"2024-08-01T13:01:10.520Z","updated_at":"2025-04-03T18:34:06.415Z","avatar_url":"https://github.com/datquocnguyen.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# VnDT: A Vietnamese dependency treebank\n\nVnDT is a Vietnamese dependency treebank, consisting of 10K+ sentences. 
The construction of VnDT is detailed in our [following paper](https://github.com/datquocnguyen/VnDT/blob/master/VnDT-paper-CameraReadyVersion.pdf):\n\n    @InProceedings{Nguyen2014NLDB,\n      author = {Nguyen, Dat Quoc  and  Nguyen, Dai Quoc  and  Pham, Son Bao and Nguyen, Phuong-Thai and Nguyen, Minh Le},\n      title = {{From Treebank Conversion to Automatic Dependency Parsing for Vietnamese}},\n      booktitle = {{Proceedings of 19th International Conference on Application of Natural Language to Information Systems}},\n      year = {2014},\n      pages = {196-207}\n    }\n\n\u003e **By downloading the VnDT treebank, USER agrees:**\n\u003e - to use VnDT for research or educational purposes only.\n\u003e - to not distribute VnDT or part of VnDT in any original or modified form.\n\u003e - and to cite our paper whenever VnDT is used to help produce published results.\n\n## Data split\n\n#### With gold-standard POS tags\n\nThe VnDT treebank with gold-standard POS tags is split as follows:\n\n- VnDTv1.1-gold-POS-tags-train.conll: 8977 sentences\n- VnDTv1.1-gold-POS-tags-dev.conll: 200 sentences\n- VnDTv1.1-gold-POS-tags-test.conll: 1020 sentences\n\nThese gold-standard POS tags are defined and extracted from the corresponding [Vietnamese constituent treebank](https://www.aclweb.org/anthology/W09-3035/).\n\n#### With predicted POS tags\n\nThe VnDT treebank with automatically predicted POS tags is split as follows:\n\n- VnDTv1.1-predicted-POS-tags-train.conll: 8977 sentences\n- VnDTv1.1-predicted-POS-tags-dev.conll: 200 sentences\n- VnDTv1.1-predicted-POS-tags-test.conll: 1020 sentences\n\nThe automatically predicted POS tags result from handling a data leakage issue, as detailed in the [following paper](http://arxiv.org/abs/2101.01476):\n\n    @inproceedings{phonlp,\n    title     = {{PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing}},\n    author    = {Linh The Nguyen and Dat Quoc 
Nguyen},\n    booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations},\n    year      = {2021}\n    }\n\nPlease additionally cite this paper whenever the VnDT variant with automatically predicted POS tags is used to help produce published results.\n\n## Versions:\n\n- 04/2014: Released VnDT version 1.0 (VnDTv1.0).\n- 12/2018: Released VnDT version 1.1 (VnDTv1.1), fixing several conversion errors that are caused by annotation inconsistencies in the Vietnamese constituent treebank.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatquocnguyen%2FVnDT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatquocnguyen%2FVnDT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatquocnguyen%2FVnDT/lists"}