## Image Deduplication Plugin

This is a Python plugin that streamlines image deduplication workflows in FiftyOne!

With this plugin, you can:

- Find _exact_ duplicate images using a hash function
- Find _near_ duplicate images using an embedding model and a similarity threshold
- View and interact with duplicate images in the FiftyOne App
- Remove all duplicates, or keep a representative image from each duplicate set

## Watch on YouTube
[![Video Thumbnail](https://img.youtube.com/vi/aingeh0KdPw/0.jpg)](https://www.youtube.com/watch?v=aingeh0KdPw&list=PLuREAXoPgT0RZrUaT0UpX_HzwKkoB-S9j&index=5)

## Installation

```shell
fiftyone plugins download https://github.com/jacobmarks/image-deduplication-plugin
```

## Operators

### `find_approximate_duplicate_images`
![find_approx_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/8cf44a01-505d-4942-8a24-2c2d65365894)

This operator finds near-duplicate images in a dataset using a specified similarity index, paired with either a distance threshold or the fraction of samples to mark as duplicates.

### `find_exact_duplicate_images`
![find_exact_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/27c12f82-bd8f-45d7-9213-d5b9ceb99bcb)

This operator finds exact duplicate images in a dataset using a hash function.

### `display_approximate_duplicate_groups`
![display_approx_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/07fefbd4-9df7-4ff5-8433-091629c2a040)

This operator displays the near-duplicate images in a dataset, with each set of near-duplicates grouped together.

### `display_exact_duplicate_groups`
![display_exact_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/19fec753-52d1-4237-9e24-78bc89a40af0)

This operator displays the exact duplicate images in a dataset, with each set of duplicates grouped together.

### `remove_all_approximate_duplicates`
![remove_approx_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/1a23d1c1-3441-4286-b308-be99fb5f0a4a)

This operator removes all near-duplicate images from a dataset, without keeping a representative from each duplicate set.

### `remove_all_exact_duplicates`
![remove_exact_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/59b26da7-9064-4da0-8fa8-85488e99b57c)

This operator removes all exact duplicate images from a dataset, without keeping a representative from each duplicate set.

### `deduplicate_approximate_duplicates`
![dedup_approx_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/f5661c6c-ebe9-41c6-9de8-a2c8048176f8)

This operator removes near-duplicate images from a dataset, _keeping a representative image_ from each duplicate set.

### `deduplicate_exact_duplicates`
![dedup_exact_dups](https://github.com/jacobmarks/image-deduplication-plugin/assets/12500356/30abc333-0f60-4a7a-a461-1b9dd6eb8331)

This operator removes exact duplicate images from a dataset, _keeping a representative image_ from each duplicate set.
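The hash-based grouping behind the exact-duplicate operators can be sketched with the standard library alone. This is a minimal illustration, not the plugin's code. It assumes SHA-256 over each file's raw bytes, since the plugin's actual hash function is not specified here. `file_hash` and `group_exact_duplicates` are hypothetical helper names.

```python
import hashlib
from collections import defaultdict


def file_hash(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file's raw bytes (assumed hash choice)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large images need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_exact_duplicates(paths):
    """Hypothetical helper: map each digest shared by 2+ files to those paths."""
    by_hash = defaultdict(list)
    for p in paths:
        by_hash[file_hash(p)].append(p)
    # A digest seen only once has no duplicate, so drop it
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}
```

The digest covers raw bytes, so a re-encoded or resized copy of an image hashes differently. That gap is what the approximate-duplicate operators address.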
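The threshold-or-fraction idea behind `find_approximate_duplicate_images` can be illustrated without FiftyOne. This sketch rests on several assumptions. It takes precomputed embeddings and does not specify which model produced them. It uses cosine distance and brute-force pairwise comparison in place of a similarity index. Any pair closer than `thresh` is merged into one group with union-find. `group_near_duplicates` and `thresh_for_fraction` are hypothetical names, and the fraction rule shown is one simple interpretation, not necessarily the plugin's.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def group_near_duplicates(embeddings, thresh):
    """Hypothetical helper: group IDs whose embeddings are closer than `thresh`.

    `embeddings` maps sample ID -> vector. Close pairs are merged transitively
    with union-find, so if A~B and B~C then A, B and C share one group.
    """
    parent = {k: k for k in embeddings}

    def find(x):
        # Walk to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Brute-force O(n^2) comparison; a similarity index avoids this at scale
    for a, b in combinations(embeddings, 2):
        if cosine_distance(embeddings[a], embeddings[b]) < thresh:
            parent[find(a)] = find(b)

    groups = {}
    for k in embeddings:
        groups.setdefault(find(k), []).append(k)
    # Singleton groups have no near-duplicate, so drop them
    return [g for g in groups.values() if len(g) > 1]


def thresh_for_fraction(embeddings, fraction):
    """Hypothetical helper: choose a threshold so that roughly `fraction` of
    samples have a neighbor strictly closer than it (needs 2+ samples)."""
    keys = list(embeddings)
    nn = sorted(
        min(cosine_distance(embeddings[a], embeddings[b]) for b in keys if b != a)
        for a in keys
    )
    k = round(fraction * len(nn))
    # Only the k smallest nearest-neighbor distances fall below nn[k]
    # (ties can make the actual count smaller)
    return math.inf if k >= len(nn) else nn[k]
```

Transitive merging is a design choice. It can chain loosely related images into one group, whereas a stricter scheme would only group images that lie within `thresh` of a common anchor image.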
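The `remove_all_*` and `deduplicate_*` operators differ in which members of each duplicate set are deleted. The following minimal sketch assumes that duplicate sets arrive as lists of sample IDs. It also assumes the representative is simply the first ID in each set, since this README does not say how the plugin chooses one. `ids_to_delete` is a hypothetical helper.

```python
def ids_to_delete(groups, keep_representative):
    """Hypothetical helper: return the sample IDs to delete from a dataset.

    groups: iterable of lists of sample IDs, one list per duplicate set.
    keep_representative=False mirrors `remove_all_*` (drop every member);
    True mirrors `deduplicate_*` (keep the first member of each set).
    """
    doomed = []
    for group in groups:
        doomed.extend(group[1:] if keep_representative else group)
    return doomed
```

In FiftyOne, the resulting IDs could then be passed to `Dataset.delete_samples()`.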