{"id":27957601,"url":"https://github.com/laion-ai/image-deduplication-testset","last_synced_at":"2026-01-24T15:33:52.152Z","repository":{"id":103624761,"uuid":"560034413","full_name":"LAION-AI/image-deduplication-testset","owner":"LAION-AI","description":null,"archived":false,"fork":false,"pushed_at":"2022-11-21T16:46:50.000Z","size":252,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-07T18:13:46.685Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LAION-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-10-31T15:49:50.000Z","updated_at":"2024-01-04T17:13:03.000Z","dependencies_parsed_at":"2023-05-24T01:15:16.560Z","dependency_job_id":null,"html_url":"https://github.com/LAION-AI/image-deduplication-testset","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LAION-AI/image-deduplication-testset","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2Fimage-deduplication-testset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2Fimage-deduplication-testset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2Fimage-deduplication-testset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2Fimage-deduplication-testset/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/LAION-AI","download_url":"https://codeload.github.com/LAION-AI/image-deduplication-testset/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2Fimage-deduplication-testset/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28730320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T10:24:43.181Z","status":"ssl_error","status_checked_at":"2026-01-24T10:24:36.112Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-07T18:13:43.605Z","updated_at":"2026-01-24T15:33:52.140Z","avatar_url":"https://github.com/LAION-AI.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Image Deduplication - Testset\n\nThis is a test set for finding duplicate images.\nIt consists of 177 query images, each with 12 nearest neighbors from our LAION-5B CLIP image embedding index.\nThe images of a query \u0026 its 12 neighbors are in a folder named \"1\" for the first query, \"2\" for the second query, ... 
up until \"177\".\n\n\nThe downloaded and zipped image folders can be found here: [https://drive.google.com/file/d/1XidNj35sUuDestW39iOyiXk6P03ReQqs/view?usp=sharing](https://drive.google.com/file/d/1d0hxUFr4EAGGJlzzVCpfTy05CgZrrlgW/view?usp=share_link)\n\nSome of the files are missing from the Google Drive zip file because downloading them failed repeatedly. We advise running all benchmarks on the zip file for consistency. Please make your scripts robust to the case that some folders (42, 47, 75, 159) don't contain \"query.zip\" or that some neighbors are missing from a folder.\n\nVisualizations of the samples can be found here:\nhttp://captions.christoph-schuhmann.de/visualizations-of-the-samples.html\n\n\nThe annotations are in a list of lists, where the first element is e.g. [0,0,1,1,0,0,0,1,1,1,1,1]; 0 means that the image is not a duplicate and 1 that it is a duplicate.\n\nImages are considered duplicates if they show the same content from the same perspective, possibly with slight crops, unimportant additional text that doesn't change the meaning of the image, compression artifacts or slight blur, ... 
- If the same content is shown from a different camera angle, it is not considered a duplicate.\n\nHere is a Colab that benchmarks an ensemble based on CLIP L 14 \u0026 ResNet50 features: https://colab.research.google.com/drive/1uVZXaaG7clj_fYkMWTpkn9pp5t29Zm8-?usp=sharing\n\nResults with imagededup (https://github.com/idealo/imagededup):\n\n\u003cimg src=\"https://user-images.githubusercontent.com/22318853/200183001-6fc032ad-1f91-449c-b128-b848deef9180.png\" alt=\"\" width=\"300\" \u003e\nVarious results (CNN (MobileNetV3) with embedding similarity threshold 0.90):\n\u003cimg src=\"https://user-images.githubusercontent.com/22318853/200182960-bebc9999-191a-4cf0-8d7b-ae207d68cae8.png\" alt=\"\" width=\"200\" \u003e\n\n\u003cimg src=\"https://user-images.githubusercontent.com/22318853/200381374-92d2300c-fa81-4d5c-bb44-0af35b66225f.png\" alt=\"\" width=\"200\" \u003e\n\nFailure cases with PHash (doesn't recognize colors):\n\u003cimg src=\"https://user-images.githubusercontent.com/22318853/200382019-89328b0a-f4a7-4afe-8696-f5aa75655bca.png\" alt=\"\" width=\"800\" \u003e\n\nFailure cases with MobileNetV3 (much better with colors):\n\u003cimg src=\"https://user-images.githubusercontent.com/22318853/200381903-8808f5ef-dca3-4363-b18e-f192b8979bc2.png\" alt=\"\" width=\"800\" \u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaion-ai%2Fimage-deduplication-testset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaion-ai%2Fimage-deduplication-testset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaion-ai%2Fimage-deduplication-testset/lists"}