{"id":29017951,"url":"https://github.com/akarazniewicz/cocosplit","last_synced_at":"2025-06-25T23:08:51.100Z","repository":{"id":37759812,"uuid":"206797058","full_name":"akarazniewicz/cocosplit","owner":"akarazniewicz","description":"Simple tool to split COCO annotations into train/test datasets.","archived":false,"fork":false,"pushed_at":"2023-08-15T15:00:04.000Z","size":10,"stargazers_count":215,"open_issues_count":15,"forks_count":87,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-11-07T19:29:29.833Z","etag":null,"topics":["coco","datapreprocessing","deeplearning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akarazniewicz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-06T13:13:33.000Z","updated_at":"2023-11-06T06:08:36.000Z","dependencies_parsed_at":"2022-07-15T21:48:21.040Z","dependency_job_id":null,"html_url":"https://github.com/akarazniewicz/cocosplit","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/akarazniewicz/cocosplit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akarazniewicz%2Fcocosplit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akarazniewicz%2Fcocosplit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akarazniewicz%2Fcocosplit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akarazniewicz%2Fcocosplit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akarazniewicz","download_url":"https://codeload.github.com/akarazniewicz/cocosplit/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akarazniewicz%2Fcocosplit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261967130,"owners_count":23237665,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coco","datapreprocessing","deeplearning"],"created_at":"2025-06-25T23:08:40.116Z","updated_at":"2025-06-25T23:08:51.081Z","avatar_url":"https://github.com/akarazniewicz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Simple tool to split a multi-label COCO annotation dataset while preserving class distributions across the train and test sets.\n\nThe code is an updated version of the original [akarazniewicz/cocosplit](https://github.com/akarazniewicz/cocosplit) repo that adds the ability to split multi-class data while preserving class distributions.\n\n\n## Installation\n\n``cocosplit`` requires Python 3 and a basic set of dependencies.\n\nIn addition to the requirements of the original repo, ``scikit-multilearn`` is required; it is included in the ``requirements.txt`` file:\n\n```\npip install -r requirements.txt\n```\n\n\n## Usage\n\nUsage is the same as in the original repo, with one added argument, ``--multi-class``, that preserves class distributions.\nThe argument is optional, so existing invocations keep working unchanged.\n\n```\n$ python cocosplit.py -h\nusage: cocosplit.py [-h] -s SPLIT [--having-annotations]\n                    coco_annotations train test\n\nSplits COCO annotations file into training and test sets.\n\npositional arguments:\n  coco_annotations      Path to COCO annotations file.\n  train                 Where to store COCO training annotations\n  test                  Where to store COCO test annotations\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -s SPLIT              A percentage of a split; a number in (0, 1)\n  --having-annotations  Ignore all images without annotations. Keep only these\n                        with at least one annotation\n  --multi-class         Split a multi-class dataset while preserving class\n                        distributions in train and test sets\n```\n\n## Running\n\n```\n$ python cocosplit.py --having-annotations --multi-class -s 0.8 /path/to/your/coco_annotations.json train.json test.json\n```\n\nwill split ``coco_annotations.json`` into ``train.json`` and ``test.json`` with an 80%/20% ratio, respectively. It skips all images without annotations (``--having-annotations``) and preserves class distributions in both sets (``--multi-class``).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakarazniewicz%2Fcocosplit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakarazniewicz%2Fcocosplit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakarazniewicz%2Fcocosplit/lists"}