{"id":18936128,"url":"https://github.com/collabora/mlfix","last_synced_at":"2025-10-25T10:07:29.344Z","repository":{"id":37974504,"uuid":"485530321","full_name":"collabora/MLfix","owner":"collabora","description":"Annotation QA backend","archived":false,"fork":false,"pushed_at":"2023-01-17T22:05:26.000Z","size":37681,"stargazers_count":5,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-15T20:02:24.324Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/collabora.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-25T20:48:37.000Z","updated_at":"2023-05-09T10:27:10.000Z","dependencies_parsed_at":"2023-02-10T12:15:31.559Z","dependency_job_id":null,"html_url":"https://github.com/collabora/MLfix","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/collabora/MLfix","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FMLfix","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FMLfix/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FMLfix/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FMLfix/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/collabora","download_url":"https://codeload.github.com/collabora/MLfix/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FMLfix/sbom
","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280936639,"owners_count":26416603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-25T02:00:06.499Z","response_time":81,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T12:06:07.059Z","updated_at":"2025-10-25T10:07:29.306Z","avatar_url":"https://github.com/collabora.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MLfix – using AI and UI to explore and fix datasets\n\n\n\n[![Join the chat at https://gitter.im/MLfix/community](https://badges.gitter.im/MLfix/community.svg)](https://gitter.im/MLfix/community?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\nThis repository contains tools which can help you find mistakes in your labels. It helps if you have some image dataset (for example an object detection dataset with bounding boxes) and you:\n\n1. want to make sure the objects are assigned to the correct class and bounding boxes are drawn\n2. wish to explore it and discover the different variations occuring in the data\n\nThe tools work by sorting the images by visual similarity and then showing them in a streamlined user interface. The interface allows you to mark the photos so you can perform the QA process. 
The visual similarity sorting is based on a model trained in an unsupervised manner, so it's not limited to ImageNet-like data.\n\nWe are still working on the documentation and examples (they will be coming in a few weeks). In the meantime you can check the presentation we did at [OSS NA 2022](OSS%20NA%202022%20presentation.pdf).\n\n![A futuristic robot cleaning streets of New York that are overflowing with papers.](banner.jpg)\nIs your dataset overflowing with low-quality samples? Our highly skilled robots can help you! (generated by [Centipede Diffusion](https://github.com/Zalring/Centipede_Diffusion/) based on an image prompt composed manually from two other generated images)\n\n## How to use\n\nThis library contains command line tools to process the images. Right now it's easiest to start with any dataset in the ImageNet format (one folder per class) or with just a folder of unsorted pictures. For example, if you download [the DeepFashion2 dataset](https://github.com/switchablenorms/DeepFashion2) you can run the following commands:\n\n\n```\ngit clone https://github.com/collabora/MLfix.git\ncd MLfix\npip install -e .\nqa_backend_downsize_images ./deepfashion2 ./deepfashion2-256\nqa_backend_pretrain --pretrained ./deepfashion2-256  # trains a model (starting from ImageNet weights)\n                                                     # and generates the BoVW features\nqa_backend_sort_images ./deepfashion2-256            # creates a JSON with all images sorted by similarity\n```\n\nAfterwards you can run `python -m http.server` and go to the URL:\n`http://localhost:8000/mlfix-ui/#../deepfashion2-256/barlow-twins-resnet18-pretrained-224-5e-proj2048-lr0.5e-3-sample-1024vw.json` to load the MLfix web 
app.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcollabora%2Fmlfix","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcollabora%2Fmlfix","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcollabora%2Fmlfix/lists"}