{"id":13578370,"url":"https://github.com/airctic/icedata","last_synced_at":"2025-04-10T02:15:03.345Z","repository":{"id":46020031,"uuid":"293939172","full_name":"airctic/icedata","owner":"airctic","description":"IceData: Datasets Hub for the *IceVision* Framework","archived":false,"fork":false,"pushed_at":"2022-03-15T14:23:08.000Z","size":167033,"stargazers_count":49,"open_issues_count":16,"forks_count":13,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-10T02:14:45.299Z","etag":null,"topics":["annotation-parsers","annotations-formats","coco","coco-dataset","coco-parser","computer-vision-datasets","custom-parser","dataset","deep-learning","fastai","object-detection","pycoco","pycocotools","pytorch","pytorch-lightning","voc-dataset","voc-parser"],"latest_commit_sha":null,"homepage":"https://airctic.github.io/icedata/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/airctic.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-08T22:14:47.000Z","updated_at":"2024-03-19T13:22:29.000Z","dependencies_parsed_at":"2022-08-28T10:23:38.742Z","dependency_job_id":null,"html_url":"https://github.com/airctic/icedata","commit_stats":null,"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airctic%2Ficedata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airctic%2Ficedata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airctic%2Ficedata/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/airctic%2Ficedata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/airctic","download_url":"https://codeload.github.com/airctic/icedata/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248142902,"owners_count":21054671,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotation-parsers","annotations-formats","coco","coco-dataset","coco-parser","computer-vision-datasets","custom-parser","dataset","deep-learning","fastai","object-detection","pycoco","pycocotools","pytorch","pytorch-lightning","voc-dataset","voc-parser"],"created_at":"2024-08-01T15:01:29.910Z","updated_at":"2025-04-10T02:15:03.305Z","avatar_url":"https://github.com/airctic.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"images/icedata-logo-slogan.png\" alt=\"logo\" width=\"400px\" style=\"display: block; margin-left: auto; margin-right: auto\"/\u003e\n  \u003ch2\u003e\u003cb\u003eDatasets Hub for the IceVision Framework\u003c/b\u003e\u003c/h2\u003e\n\u003c/div\u003e\n\n* * * * *\n\u003e**Note: We Need Your Help**\n    If you find this work useful, please let other people know by **starring** it,\n    and sharing it. 
\n    Thank you!\n    \n\u003cdiv align=\"center\"\u003e\n  \n[![tests](https://github.com/airctic/icedata/workflows/tests/badge.svg?event=push)](https://github.com/airctic/icedata/actions?query=workflow%3Atests)\n[![docs](https://github.com/airctic/icedata/workflows/docs/badge.svg)](https://airctic.github.io/icedata/)\n[![codecov](https://codecov.io/gh/airctic/icedata/branch/master/graph/badge.svg)](https://codecov.io/gh/airctic/icedata)\n[![PyPI version](https://badge.fury.io/py/icedata.svg)](https://badge.fury.io/py/icedata)\n[![black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/airctic/icevision/blob/master/LICENSE)  \n\n[![Discord](https://img.shields.io/discord/735877944085446747?label=Discord\u0026logo=Discord)](https://discord.gg/2jqrwrQ)\n\n\u003c/div\u003e\n\n\n* * * * *\n\n\n\u003c!-- Not included in docs - start --\u003e\n## **Contributors**\n\n[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/0)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/0)[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/1)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/1)[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/2)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/2)[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/3)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/3)[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/4)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/4)[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/5)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/5)[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/6)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/6)[![](https://sourcerer.io/fame/lgvaz/airctic/icedata/images/7)](https://sourcerer.io/fame/lgvaz/airctic/icedata/links/7)\n\n![](images/docs.png) [ 
**Documentation**](https://airctic.github.io/icedata/)\n\n## Installation\n\n```bash\npip install icedata\n```\n\nFor more installation options, check our extensive [documentation](https://airctic.github.io/icevdata/install/).\n\n**Important:** We currently only support Linux/MacOS.\n\u003c!-- Not included in docs - end --\u003e\n\n## Why IceData?\n\n- IceData is a dataset hub for the [IceVision](https://github.com/airctic/icevision) Framework\n\n- It includes community-maintained datasets and parsers and has out-of-the-box support for common annotation formats (COCO, VOC, etc.)\n\n- It provides an overview of each included dataset with a description, an annotation example, and other helpful information\n\n- It makes end-to-end training straightforward thanks to IceVision's unified API\n\n- It enables practitioners to get moving with object detection technology quickly\n\n## Datasets\n\n[**Source**](https://github.com/airctic/icedata/tree/master/icedata/datasets)\n\nThe `Datasets` class is designed to simplify loading and parsing a wide range of computer vision datasets.\n\n**Main Features:**\n\n- Caches data so you don't need to download it over and over\n\n- Lightweight and fast\n\n- Transparent and Pythonic API\n\n- Out-of-the-box parsers convert common dataset annotation formats into the unified IceVision Data Format\n\nIceData provides several ready-to-use datasets. Some use common annotation formats such as COCO and VOC; others use custom annotation formats, such as the one handled by the [WheatParser](https://airctic.github.io/icevision/custom_parser/) for the [Kaggle Global Wheat Competition](https://www.kaggle.com/c/global-wheat-detection).\n\n## Usage\n\nObject detection datasets use multiple annotation formats (COCO, VOC, and others). 
IceVision makes it easy to work across all of them with parsers that are easy to use and extend.\n\n\n### COCO and VOC compatible datasets\nFor COCO- or VOC-compatible datasets, especially ones that are not included in IceData, it is easiest to use the predefined\nCOCO or VOC parser.\n\n**Example:** Raccoon, a dataset using the VOC parser\n\n```python\n# Imports\nfrom icevision.all import *\nimport icedata\n\n\n# WARNING: Make sure you have already cloned the raccoon dataset into the raccoon_dataset directory\n# Set images and annotations directories\ndata_dir = Path(\"raccoon_dataset\")\nimages_dir = data_dir / \"images\"\nannotations_dir = data_dir / \"annotations\"\n\n# Define the class_map\nclass_map = ClassMap([\"raccoon\"])\n\n# Create a parser for the dataset using the predefined icevision VOC parser\nparser = parsers.voc(\n    annotations_dir=annotations_dir, images_dir=images_dir, class_map=class_map\n)\n\n# Parse the annotations to create the train and validation records\ntrain_records, valid_records = parser.parse()\nshow_records(train_records[:3], ncols=3, class_map=class_map)\n```\n\n!!! info \"Note\" \n    Notice how we use the predefined [parsers.voc()](https://github.com/airctic/icevision/blob/master/icevision/parsers/voc_parser.py) function:\n    \n    **parser = parsers.voc(\n    annotations_dir=annotations_dir, images_dir=images_dir, class_map=class_map\n    )**\n\n\n### Datasets included in IceData\nDatasets included in IceData always have their own parser. 
It can be invoked with `icedata.\u003cdatasetname\u003e.parser(...)`.\n\n**Example:** The IceData Fridge dataset\n\nPlease check out the [fridge folder](https://github.com/airctic/icedata/tree/master/icedata/datasets/fridge) for more information on how this dataset is structured.\n\n```python\n# Imports\nfrom icevision.all import *\nimport icedata\n\n# Load the Fridge Objects dataset\ndata_dir = icedata.fridge.load()\n\n# Get the class_map\nclass_map = icedata.fridge.class_map()\n\n# Parse the annotations\nparser = icedata.fridge.parser(data_dir, class_map)\ntrain_records, valid_records = parser.parse()\n\n# Show images with their boxes and labels\nshow_records(train_records[:3], ncols=3, class_map=class_map)\n```\n\n!!! info \"Note\" \n    Notice how we use the parser associated with the fridge dataset [icedata.fridge.parser()](https://github.com/airctic/icedata/blob/master/icedata/datasets/fridge/parsers.py):\n    \n    **parser = icedata.fridge.parser(data_dir, class_map)**\n\n\n### Datasets with a new annotation format\n\nSometimes, you will need to define a new annotation format for your dataset. In this case, we strongly recommend following the file structure and naming conventions used in examples such as the [Fridge dataset](https://github.com/airctic/icedata/tree/master/icedata/datasets/fridge) or the [PETS dataset](https://github.com/airctic/icedata/tree/master/icedata/datasets/pets). Additional information can be found in the [documentation](https://airctic.com/custom_parser/).\n\n![image](https://airctic.github.io/icedata/images/datasets-folder-structure.png)\n\n# Disclaimer\n\nInspired by the excellent HuggingFace [Datasets](https://github.com/huggingface/datasets) project, IceData is a utility library that downloads and prepares computer vision datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the dataset. 
It is your responsibility to determine whether you have permission to use the dataset under its license.\n\nIf you are a dataset owner and wish to update any of the information in IceData (description, citation, etc.), or do not want your dataset to be included, please get in touch through a [GitHub issue](https://github.com/airctic/icedata/issues). Thanks for your contribution to the ML community!\n\nIf you are interested in learning more about responsible AI practices, including fairness, please see [Google AI's Responsible AI Practices](https://ai.google/responsibilities/responsible-ai-practices/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairctic%2Ficedata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fairctic%2Ficedata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairctic%2Ficedata/lists"}