{"id":20061657,"url":"https://github.com/tylerrick/activerecord-find_duplicates","last_synced_at":"2025-08-11T17:42:14.275Z","repository":{"id":66442454,"uuid":"198327413","full_name":"TylerRick/activerecord-find_duplicates","owner":"TylerRick","description":"Easily find all duplicate records","archived":false,"fork":false,"pushed_at":"2019-07-23T05:05:06.000Z","size":7,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-12T22:28:15.662Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TylerRick.png","metadata":{"files":{"readme":"Readme.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":"License","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-23T01:23:44.000Z","updated_at":"2019-07-23T05:05:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"43d0dea6-e847-499c-a353-ee039186edf9","html_url":"https://github.com/TylerRick/activerecord-find_duplicates","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Factiverecord-find_duplicates","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Factiverecord-find_duplicates/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Factiverecord-find_duplicates/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Factiverecord-find_duplicates/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TylerRick","download_url":"https://codeload.github.com/TylerRick/activerecord-find_duplicates/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241488198,"owners_count":19970829,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T13:21:20.144Z","updated_at":"2025-03-02T10:17:19.137Z","avatar_url":"https://github.com/TylerRick.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Activerecord::FindDuplicates\n\n## Installation\n\nAdd this line to your application's `Gemfile`:\n\n```ruby\ngem 'activerecord-find_duplicates'\n```\n\n## Usage\n\nGeneral usage is:\n```ruby\nModel.find_duplicates(on: attr_name)\n```\n\nYou can pass a minimum number to be considered a duplicate (default is 2) with `min:`.\n\nExample: To find all user records that have a duplicate email address:\n```ruby\nUser.find_duplicates(on: :email)\n# =\u003e [#\u003cUser:0x000055e7916ff3c8 id: 1, email: \"a@a.com\"\u003e,\n      #\u003cUser:0x000055e7916ff1e8 id: 2, email: \"a@a.com\"\u003e]\n```\n\nOften it is useful to group by the duplicate value, making the value the key and the set of records sharing that key as the value:\n```ruby\nUser.find_duplicates(on: :email).group_by(\u0026:email)\n# =\u003e {\"a@a.com\"=\u003e\n  [#\u003cUser:0x000055cc1915f0c8 id: 1, email: \"a@a.com\"\u003e,\n   #\u003cUser:0x000055cc1915ef38 id: 2, email: \"a@a.com\"\u003e]}\n```\n\nYou can also chain it on other relations. 
For example, to find all duplicates *except* those with a null value:\n```ruby\nUser.where('email is not null').find_duplicates(on: :email)\n```\n\n## Possible use: clean up data before adding a unique index\n\nYou realize that a certain column should be unique but actually contains duplicate values, even though you had a uniqueness validation on the model:\n```ruby\nvalidates :email, uniqueness: true\n```\nThat validation is subject to race conditions. The only sure way to prevent duplicate values on a column is to add a unique index/constraint and let your *database* engine enforce the constraint.\n\nBut before you can add a migration that adds that index, you have to remove all duplicates or you will get:\n```\nPG::UniqueViolation: ERROR:  could not create unique index \"index_users_on_email\"\nDETAIL:  Key (email)=(user@example.com) is duplicated.\n```\n\nYou might do something like this to delete all but the most recent record for each distinct value:\n\n```ruby\n    User.where('email is not null').find_duplicates(on: :email).group_by(\u0026:email).each do |email, users|\n      users.sort_by(\u0026:created_at).each.with_index do |user, i|\n        user.destroy unless i == users.length - 1\n      end\n    end\n```\n\nand something like this to prevent such duplicates from being added again in the future:\n\n```ruby\n    change_table :users do |t|\n      t.remove_index name: :index_users_on_email\n      t.index :email, unique: true\n    end\n```\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. 
To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/TylerRick/activerecord-find_duplicates.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftylerrick%2Factiverecord-find_duplicates","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftylerrick%2Factiverecord-find_duplicates","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftylerrick%2Factiverecord-find_duplicates/lists"}