{"id":13419259,"url":"https://github.com/cleanlab/cleanvision","last_synced_at":"2026-01-06T01:17:51.878Z","repository":{"id":65847285,"uuid":"496519322","full_name":"cleanlab/cleanvision","owner":"cleanlab","description":"Automatically find issues in image datasets and practice data-centric computer vision.","archived":false,"fork":false,"pushed_at":"2025-04-03T05:19:12.000Z","size":2227,"stargazers_count":1068,"open_issues_count":30,"forks_count":73,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-04-09T22:09:26.453Z","etag":null,"topics":["computer-vision","data-centric-ai","data-exploration","data-profiling","data-quality","data-science","data-validation","deep-learning","exploratory-data-analysis","image-analysis","image-classification","image-generation","image-quality","image-segmentation"],"latest_commit_sha":null,"homepage":"https://cleanvision.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cleanlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-26T07:14:11.000Z","updated_at":"2025-04-09T10:54:12.000Z","dependencies_parsed_at":"2024-01-11T23:22:41.986Z","dependency_job_id":"4f43a7e9-c1f8-424b-976a-71c2230f8553","html_url":"https://github.com/cleanlab/cleanvision","commit_stats":{"total_commits":280,"total_committers":24,"mean_commits":"11.666666666666666","dds":0.6964285714285714,"last_synced_commit":"af3fc3ffbf062693d32630f3a2e3adf3efad6fe0"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"
repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleanlab%2Fcleanvision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleanlab%2Fcleanvision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleanlab%2Fcleanvision/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cleanlab%2Fcleanvision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cleanlab","download_url":"https://codeload.github.com/cleanlab/cleanvision/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248119294,"owners_count":21050755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","data-centric-ai","data-exploration","data-profiling","data-quality","data-science","data-validation","deep-learning","exploratory-data-analysis","image-analysis","image-classification","image-generation","image-quality","image-segmentation"],"created_at":"2024-07-30T22:01:13.497Z","updated_at":"2026-01-06T01:17:51.857Z","avatar_url":"https://github.com/cleanlab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/cleanlab/assets/master/cleanlab/cleanvision_logo_open_source_transparent.png\" width=50% height=50%\u003e\n\u003c/p\u003e\n\n\u003cimg width=\"1200\" alt=\"Screen Shot 2023-03-10 at 10 23 33 AM\" 
src=\"https://user-images.githubusercontent.com/10901697/224394144-bb0e1c85-6851-4828-bcd2-4ed234270a78.png\"\u003e\n\nCleanVision automatically detects potential issues in image datasets, such as images that are blurry, under/over-exposed, or (near) duplicates.\nThis data-centric AI package is a quick first step for any computer vision project: it finds problems in the dataset that you want to address before applying machine learning.\nCleanVision is super simple -- run the same couple lines of Python code to audit any image dataset!\n\n[![Read the Docs](https://readthedocs.org/projects/cleanvision/badge/?version=latest)](https://cleanvision.readthedocs.io/en/latest/)\n[![pypi](https://img.shields.io/pypi/v/cleanvision?color=blue)](https://pypi.org/pypi/cleanvision/)\n[![os](https://img.shields.io/badge/platform-noarch-lightgrey)](https://pypi.org/pypi/cleanvision/)\n[![py\\_versions](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/pypi/cleanvision/)\n[![codecov](https://codecov.io/github/cleanlab/cleanvision/branch/main/graph/badge.svg?token=y1N6MluN9H)](https://codecov.io/gh/cleanlab/cleanvision)\n\n## Installation\n```shell\npip install cleanvision\n```\n\n## Quickstart\n\nDownload an example dataset (optional), or just use any collection of image files you have.\n\n```shell\nwget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'\n```\n\n1. Run CleanVision to audit the images.\n\n```python\nfrom cleanvision import Imagelab\n\n# Specify path to folder containing the image files in your dataset\nimagelab = Imagelab(data_path=\"FOLDER_WITH_IMAGES/\")\n\n# Automatically check for a predefined list of issues within your dataset\nimagelab.find_issues()\n\n# Produce a neat report of the issues found in your dataset\nimagelab.report()\n```\n\n2. 
CleanVision diagnoses many types of issues, but you can also check for only specific issues.\n\n```python\nissue_types = {\"dark\": {}, \"blurry\": {}}\n\nimagelab.find_issues(issue_types=issue_types)\n\n# Produce a report with only the specified issue_types\nimagelab.report(issue_types=issue_types)\n```\n\n\n## More resources\n\n- [Tutorial](https://cleanvision.readthedocs.io/en/latest/tutorials/tutorial.html)\n- [Documentation](https://cleanvision.readthedocs.io/)\n- [Blog](https://cleanlab.ai/blog/cleanvision/)\n- [Run CleanVision on a HuggingFace dataset](https://cleanvision.readthedocs.io/en/latest/tutorials/huggingface_dataset.html)\n- [Run CleanVision on a Torchvision dataset](https://cleanvision.readthedocs.io/en/latest/tutorials/torchvision_dataset.html)\n- [Example script](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/run.py) that can be run with: `python docs/source/tutorials/run.py --path \u003cFOLDER_WITH_IMAGES\u003e`\n- [Additional example notebooks](https://github.com/cleanlab/cleanvision-examples)\n- [FAQ](https://cleanvision.readthedocs.io/en/latest/faq.html)\n\n## *Clean* your data for better Computer *Vision*\n\nThe quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. 
CleanVision helps you automatically identify common types of data issues lurking in image datasets.\n\nThis package currently detects issues in the raw images themselves, making it a useful tool for any computer vision\ntask such as: classification, segmentation, object detection, pose estimation, keypoint detection, [generative modeling](https://openai.com/research/dall-e-2-pre-training-mitigations), etc.\nTo detect issues in the labels of your image data, you can instead\nuse the [cleanlab](https://github.com/cleanlab/cleanlab/) package.\n\nIn any collection of image files (most [formats](https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html) supported), CleanVision can detect the following types of issues:\n\n|   | Issue Type       | Description                                                     | Issue Key        | Example                                                                                                                                 |\n|---|------------------|-----------------------------------------------------------------|------------------|-----------------------------------------------------------------------------------------------------------------------------------------|\n| 1 | Exact Duplicates | Images that are identical to each other                         | exact_duplicates | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/exact_duplicates.png)                     |\n| 2 | Near Duplicates  | Images that are visually almost identical                       | near_duplicates  | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/near_duplicates.png)                      |\n| 3 | Blurry           | Images where details are fuzzy (out of focus)                   | blurry           | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/blurry.png)                               |\n| 4 | Low 
Information  | Images lacking content (little entropy in pixel values)         | low_information  | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/low_information.png)                      |\n| 5 | Dark             | Irregularly dark images (*under*exposed)                        | dark             | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/dark.jpg)                                 |\n| 6 | Light            | Irregularly bright images (*over*exposed)                       | light            | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/light.jpg)                                |\n| 7 | Grayscale        | Images lacking color                                            | grayscale        | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/grayscale.jpg)                            |\n| 8 | Odd Aspect Ratio | Images with an unusual aspect ratio (overly skinny/wide)        | odd_aspect_ratio | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_aspect_ratio.jpg)                     |\n| 9 | Odd Size         | Images that are abnormally large or small compared to the rest of the dataset | odd_size         | \u003cimg src=\"https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_size.png\" width=20% height=20%\u003e |\n\nCleanVision supports Linux, macOS, and Windows and runs on Python 3.10+. Learn more from our [blog](https://cleanlab.ai/blog/cleanvision/).\n\n## Community\n\n* Interested in contributing? See the [contributing guide](CONTRIBUTING.md). An easy starting point is to\n  consider [issues](https://github.com/cleanlab/cleanvision/labels/good%20first%20issue) marked `good first issue`.\n\n* Ready to start adding your own code? 
See the [development guide](DEVELOPMENT.md).\n\n* Have an issue? [Search existing issues](https://github.com/cleanlab/cleanvision/issues?q=is%3Aissue)\n  or [submit a new issue](https://github.com/cleanlab/cleanvision/issues/new/choose).\n\n\n[issue]: https://github.com/cleanlab/cleanvision/issues/new\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcleanlab%2Fcleanvision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcleanlab%2Fcleanvision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcleanlab%2Fcleanvision/lists"}