{"id":33422224,"url":"https://github.com/gpizzorno/conllu_tools","last_synced_at":"2026-04-08T19:32:35.687Z","repository":{"id":322499879,"uuid":"994945220","full_name":"gpizzorno/conllu_tools","owner":"gpizzorno","description":"A Python toolkit for working with CoNLL-U files, Universal Dependencies treebanks, and annotated corpora.","archived":false,"fork":false,"pushed_at":"2025-11-29T17:04:34.000Z","size":6575,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-06T13:43:55.989Z","etag":null,"topics":["brat","conllu","conllu-evaluation","conllu-validation","latin","natural-language-processing","nlp","tag-conversion","tag-normalization","text-annotation","ud","universal-dependencies"],"latest_commit_sha":null,"homepage":"https://gpizzorno.github.io/conllu_tools/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gpizzorno.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-02T18:16:48.000Z","updated_at":"2025-12-09T16:17:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gpizzorno/conllu_tools","commit_stats":null,"previous_names":["gpizzorno/latin-nlp-utilities"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/gpizzorno/conllu_tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gpizzorno%2Fconllu_tools","tags_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/gpizzorno%2Fconllu_tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gpizzorno%2Fconllu_tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gpizzorno%2Fconllu_tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gpizzorno","download_url":"https://codeload.github.com/gpizzorno/conllu_tools/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gpizzorno%2Fconllu_tools/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28382911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-13T10:34:27.190Z","status":"ssl_error","status_checked_at":"2026-01-13T10:34:26.289Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["brat","conllu","conllu-evaluation","conllu-validation","latin","natural-language-processing","nlp","tag-conversion","tag-normalization","text-annotation","ud","universal-dependencies"],"created_at":"2025-11-24T02:05:08.075Z","updated_at":"2026-01-13T10:48:59.841Z","avatar_url":"https://github.com/gpizzorno.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Latin NLP 
Utilities\n\n[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n[![Python](https://img.shields.io/badge/Language-Python-blue.svg)](https://www.python.org)\n[![Tests](https://github.com/gpizzorno/latin-nlp-utilities/actions/workflows/tests.yml/badge.svg)](https://github.com/gpizzorno/latin-nlp-utilities/actions/workflows/tests.yml)\n[![Documentation](https://img.shields.io/badge/Docs-latest-blue.svg)](https://gpizzorno.github.io/latin-nlp-utilities/)\n\n**Latin NLP Utilities** is a set of convenience tools for working with Latin treebanks and annotated corpora. It provides converters, evaluation scripts, validation tools, and utilities for transforming, validating, and comparing Latin linguistic data in [CoNLL-U](https://universaldependencies.org/format.html) and [brat](https://brat.nlplab.org) standoff formats.\n\n[Read the documentation](https://gpizzorno.github.io/latin-nlp-utilities/)\n\n## Features\n\n- **brat/CoNLL-U Interoperability**: Convert between brat [standoff](https://brat.nlplab.org/standoff.html) and [CoNLL-U](https://universaldependencies.org/format.html)\n- **Morphological Feature Utilities**: Normalize and map features across tagsets ([Perseus](https://universaldependencies.org/treebanks/la_perseus/index.html), [ITTB](https://universaldependencies.org/treebanks/la_ittb/index.html), [PROIEL](https://universaldependencies.org/treebanks/la_proiel/index.html), [DALME](https://dalme.org))\n- **Validation**: Check CoNLL-U files for format and annotation guideline compliance\n- **Evaluation**: Score system outputs against gold-standard CoNLL-U files, including enhanced dependencies\n- **Extensible**: Easily add new tagset converters or feature mappings\n\nFor detailed information about each feature, see the [User Guide](https://gpizzorno.github.io/latin-nlp-utilities/user_guide/index.html).\n\n## Installation\n\n### Quick Install\n\n```sh\npip install latin-nlp-utilities\n```\n\nFor detailed 
installation instructions, including platform-specific guidance and troubleshooting, see the [Installation Guide](https://gpizzorno.github.io/latin-nlp-utilities/installation.html).\n\n## Quick Start\n\n### Convert CoNLL-U to brat\n\n```python\nfrom nlp_utilities.brat import conllu_to_brat\n\nconllu_to_brat(\n    conllu_filename='path/to/conllu/yourfile.conllu',\n    output_directory='path/to/brat/files',\n    sents_per_doc=10,\n    output_root=True,\n)\n\n# Outputs .ann and .txt files to 'path/to/brat/files', alongside\n# annotation.conf, tools.conf, visual.conf, and metadata.json\n\n```\n\n### Convert brat to CoNLL-U\n\n```python\nfrom nlp_utilities.brat import brat_to_conllu\nfrom nlp_utilities.loaders import load_language_data\n\nfeature_set = load_language_data('feats', language='la')\nbrat_to_conllu(\n    input_directory='path/to/brat/files',\n    output_directory='path/to/conllu',\n    ref_conllu='yourfile.conllu',\n    feature_set=feature_set,\n    output_root=True\n)\n\n# Outputs yourfile-from_brat.conllu to 'path/to/conllu'\n```\n\n### Validate CoNLL-U Files\n\n```python\nfrom nlp_utilities.conllu import ConlluValidator\n\nvalidator = ConlluValidator(lang='la', level=2)\nreporter = validator.validate_file('path/to/yourfile.conllu')\n\n# Print error count\nprint(f'Errors found: {reporter.get_error_count()}')\n\n# Inspect first error\nsent_id, order, testlevel, error = reporter.errors[0]\nprint(f'Sentence ID: {sent_id}')  # e.g. 34\nprint(f'Testing at level: {testlevel}')  # e.g. 2\nprint(f'Error test level: {error.testlevel}')  # e.g. 1\nprint(f'Error type: {error.error_type}')  # e.g. \"Metadata\"\nprint(f'Test ID: {error.testid}')  # e.g. 
\"text-mismatch\"\nprint(f'Error message: {error.msg}')  # Full error message (see below)\n\n# Print all errors formatted as strings\nfor error in reporter.format_errors():\n    print(error)\n\n# Example output:\n# Sentence 34:\n# [L2 Metadata text-mismatch] The text attribute does not match the text \n# implied by the FORM and SpaceAfter=No values. Expected: 'Una scala....' \n# Reconstructed: 'Una scala ....' (first diff at position 9)\n```\n\n### Evaluate CoNLL-U Files\n\n```python\nfrom nlp_utilities.conllu import ConlluEvaluator\n\nevaluator = ConlluEvaluator(eval_deprels=True, treebank_type='0')\nscores = evaluator.evaluate_files(\n    gold_path='path/to/gold_standard.conllu',\n    system_path='path/to/system_output.conllu',\n)\n\nprint(f'UAS: {scores[\"UAS\"].f1:.2%}')\nprint(f'LAS: {scores[\"LAS\"].f1:.2%}')\n\n# Example output:\n# UAS: 64.82%\n# LAS: 48.16%\n```\n\n### Convert Between Tagsets\n\n```python\nfrom nlp_utilities.converters.upos import dalme_to_upos, upos_to_perseus\nfrom nlp_utilities.converters.xpos import ittb_to_perseus, llct_to_perseus\nfrom nlp_utilities.converters.features import feature_string_to_dict, feature_dict_to_string\n\nprint(dalme_to_upos('adjective'))\n# Returns 'ADJ'\n\nprint(upos_to_perseus('NOUN'))\n# Returns 'n'\n\nprint(ittb_to_perseus('VERB', 'gen4|tem1|mod1'))  \n# Returns 'v1sp-----'\n\nprint(llct_to_perseus('VERB', 'v|v|3|s|p|i|a|-|-|-', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|Voice=Act'))\n# Returns 'v3spia---'\n\nfeat_dict = feature_string_to_dict('Case=Nom|Gender=Neut|Number=Sing')\n# Returns a dictionary:\n# {'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing'}\n\nprint(feature_dict_to_string(feat_dict)) \n# Returns 'Case=Nom|Gender=Neut|Number=Sing'\n```\n\n### Normalize Morphology\n\n```python\nfrom nlp_utilities.loaders import load_language_data\nfrom nlp_utilities.normalizers import normalize_morphology\n\nfeature_set = load_language_data('feats', language='la')\n\n# Normalize morphology with feature 
reconciliation\n# VerbForm is missing from feats but present in ref_feats\nxpos, feats = normalize_morphology(\n    upos='VERB',\n    xpos='v-s-ga-g-',\n    feats='Aspect=Perf|Case=Gen|Degree=Pos|Number=Sing|Voice=Act',\n    feature_set=feature_set,\n    ref_features='Aspect=Perf|Case=Gen|Degree=Pos|Number=Sing|VerbForm=Ger|Voice=Act'\n)\n\nprint(xpos)\n# Returns 'v-stga-g-' (normalized and validated)\n\nprint(feats)\n# Returns {'Aspect': 'Perf', 'Case': 'Gen', 'Degree': 'Pos', 'Number': 'Sing', 'VerbForm': 'Ger', 'Voice': 'Act'}\n```\n\nFor more examples and detailed usage, see the [Quickstart Guide](https://gpizzorno.github.io/latin-nlp-utilities/quickstart.html).\n\n## Documentation\n\nThe full documentation includes:\n\n- **[Installation Guide](https://gpizzorno.github.io/latin-nlp-utilities/installation.html)**: Detailed installation instructions and troubleshooting\n- **[Quickstart Guide](https://gpizzorno.github.io/latin-nlp-utilities/quickstart.html)**: Get started quickly with common tasks\n- **[User Guide](https://gpizzorno.github.io/latin-nlp-utilities/user_guide/index.html)**: Comprehensive guides for all features\n  - [brat Conversion](https://gpizzorno.github.io/latin-nlp-utilities/user_guide/brat_conversion.html): CoNLL-U ↔ brat conversion\n  - [Validation](https://gpizzorno.github.io/latin-nlp-utilities/user_guide/validation.html): Validation framework and recipes\n  - [Evaluation](https://gpizzorno.github.io/latin-nlp-utilities/user_guide/evaluation.html): Metrics and evaluation workflows\n  - [Converters](https://gpizzorno.github.io/latin-nlp-utilities/user_guide/converters.html): Tagset conversions\n  - [Normalization](https://gpizzorno.github.io/latin-nlp-utilities/user_guide/normalization.html): Feature normalization\n- **[API Reference](https://gpizzorno.github.io/latin-nlp-utilities/api_reference/index.html)**: Complete API documentation\n- **[Developer Guide](https://gpizzorno.github.io/latin-nlp-utilities/developer_guide/index.html)**: 
Architecture and testing guides for contributors\n\n\n## Acknowledgments\n\nThis toolkit builds upon and extends code from several sources:\n\n- CoNLL-U/brat conversion logic is based on the [tools](https://github.com/nlplab/brat/tree/master/tools) made available by the [brat team](https://brat.nlplab.org/about.html).\n- CoNLL-U evaluation is based on the work of Milan Straka and Martin Popel for the [CoNLL 2018 UD shared task](https://universaldependencies.org/conll18/), and Gosse Bouma for the [IWPT 2020 shared task](https://universaldependencies.org/iwpt20/task_and_evaluation.html).\n- CoNLL-U validation is based on [work](https://github.com/UniversalDependencies/tools/blob/b3925718ba7205976d80eda7628687218474b541/validate.py) by Filip Ginter and Sampo Pyysalo.\n\n## License\n\nThe project is licensed under the [MIT License](LICENSE), allowing free use, modification, and distribution.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgpizzorno%2Fconllu_tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgpizzorno%2Fconllu_tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgpizzorno%2Fconllu_tools/lists"}