{"id":15519080,"url":"https://github.com/danieldk/conllx-utils","last_synced_at":"2025-10-07T23:41:53.697Z","repository":{"id":137794160,"uuid":"51837616","full_name":"danieldk/conllx-utils","owner":"danieldk","description":"CoNLL-X utilities","archived":false,"fork":false,"pushed_at":"2020-04-20T12:39:47.000Z","size":126,"stargazers_count":7,"open_issues_count":3,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-23T04:17:07.771Z","etag":null,"topics":["conll","corpora","cycle","partitioning","treebanks"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieldk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-16T13:27:13.000Z","updated_at":"2021-07-27T04:19:47.000Z","dependencies_parsed_at":"2023-05-22T14:00:10.161Z","dependency_job_id":null,"html_url":"https://github.com/danieldk/conllx-utils","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/danieldk/conllx-utils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fconllx-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fconllx-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fconllx-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fconllx-utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieldk","download_url":"https://codeload.github.com/danieldk/conllx-utils/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fconllx-utils/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278866745,"owners_count":26059669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conll","corpora","cycle","partitioning","treebanks"],"created_at":"2024-10-02T10:19:59.077Z","updated_at":"2025-10-07T23:41:53.674Z","avatar_url":"https://github.com/danieldk.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CoNLL-X Utilities\n\n## Introduction\n\nThis is a set of utilities for working with files in the tabular\nCoNLL-X format. The package contains the following programs:\n\n* conllx-cleanup: replace most Unicode punctuation characters\n  with ASCII equivalents.\n* conllx-compare: compare sentences on particular columns.\n* conllx-cycle: find dependency trees with (non-self) cycles.\n* conllx-grep: print sentences that have a token matching a pattern.\n* conllx-merge: merge CoNLL-X files.\n* conllx-partition: partition a CoNLL-X file into N files.\n* conllx-sample: take a random sample from a CoNLL-X file.\n* conllx-shuffle: shuffle sentences in a CoNLL-X file.\n* conllx-text: convert a CoNLL-X file to plain text.\n\n## Download\n\nDownloads are available on the [release\npage](https://github.com/danieldk/conllx-utils/releases).\n\n## Recent changes\n\n* `conllx-tdz-expandmorph` has moved to the\n  [TüBa-D/DP](https://github.com/sfb833-a3/tueba-ddp/tree/master/tools/general)\n  tools, since it is corpus-specific.\n\n## Usage\n\nEach command prints usage information when `--help` is given\nas an argument.\n\n## Todo\n\nA lot, including:\n\n* Partitioning is currently interleaved. Also support chunked partitioning.\n* Test with problematic inputs.\n* Merge specific columns from two CoNLL files.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Fconllx-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieldk%2Fconllx-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Fconllx-utils/lists"}