{"id":33914855,"url":"https://github.com/coderhs/active_storage_dedup","last_synced_at":"2026-04-06T07:01:59.534Z","repository":{"id":325742603,"uuid":"1102231739","full_name":"coderhs/active_storage_dedup","owner":"coderhs","description":"ActiveStorageDedup","archived":false,"fork":false,"pushed_at":"2026-01-14T03:28:52.000Z","size":188,"stargazers_count":15,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-14T05:48:19.167Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coderhs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-23T04:03:45.000Z","updated_at":"2026-01-14T03:28:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/coderhs/active_storage_dedup","commit_stats":null,"previous_names":["coderhs/active_storage_dedup"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/coderhs/active_storage_dedup","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Factive_storage_dedup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Factive_storage_dedup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Factive_storage_dedup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/coderhs%2Factive_storage_dedup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coderhs","download_url":"https://codeload.github.com/coderhs/active_storage_dedup/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Factive_storage_dedup/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31436304,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T08:13:15.228Z","status":"ssl_error","status_checked_at":"2026-04-05T08:13:11.839Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-12T06:43:02.512Z","updated_at":"2026-04-06T07:01:59.528Z","avatar_url":"https://github.com/coderhs.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ActiveStorageDedup\n\nAutomatic deduplication for Rails Active Storage. 
Prevents duplicate file uploads by reusing existing blobs with matching checksums, saving storage space and bandwidth.\n\n## Features\n\n- **Automatic Deduplication**: Reuses existing blobs when identical files are uploaded\n- **All Upload Methods Supported**: Works with form uploads, direct uploads, and programmatic attachments\n- **Service-Aware**: Properly handles multiple storage services (local, S3, etc.)\n- **Reference Counting**: Tracks blob usage with an automatic counter cache\n- **Three-Level Configuration**: Master switch, global default, and per-attachment control\n- **Sanity Check Job**: Periodic job to clean up any duplicates that slip through\n- **Auto-Purge Orphans**: Automatically removes blobs when no attachments reference them\n- **Zero Dependencies**: Works with standard Rails Active Storage\n\n## Demo/Implementation App\n\n[GitHub repo](https://github.com/coderhs/rails-storage-example)\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'active_storage_dedup'\n```\n\nThen execute:\n\n```bash\nbundle install\n```\n\nRun the install generator to create the migration:\n\n```bash\nrails generate active_storage_dedup:install\n```\n\nThis will:\n- Create the migration that adds a `reference_count` column to `active_storage_blobs`\n- Create a composite index on `[checksum, service_name]` for fast duplicate lookups\n- Create an initializer at `config/initializers/active_storage_dedup.rb` with default configuration\n\nFinally, run the migration:\n\n```bash\nrails db:migrate\n```\n\n## Usage\n\n### Basic Setup\n\nActiveStorageDedup works automatically with all Active Storage attachments. 
No code changes required!\n\n```ruby\nclass User \u003c ApplicationRecord\n  has_one_attached :avatar\n  has_many_attached :documents\nend\n\n# Upload a file\nuser.avatar.attach(io: File.open('photo.jpg'), filename: 'photo.jpg')\n\n# Upload the same file to another user - reuses existing blob!\nanother_user.avatar.attach(io: File.open('photo.jpg'), filename: 'photo.jpg')\n```\n\n### Configuration\n\nThe install generator creates `config/initializers/active_storage_dedup.rb` with these options:\n\n```ruby\nActiveStorageDedup.configure do |config|\n  # Master switch to enable/disable the entire gem (default: true)\n  # Set to false to completely disable all deduplication and lifecycle management\n  config.enabled = true\n\n  # Default deduplication setting for all attachments (default: true)\n  # Controls whether attachments deduplicate by default when gem is enabled\n  # Can be overridden per-attachment\n  config.deduplicate_by_default = true\n\n  # Auto-purge blobs when reference_count reaches 0 (default: true)\n  config.auto_purge_orphans = true\nend\n```\n\n### Three-Level Control\n\n#### Level 1: Master Switch (`enabled`)\nCompletely enable or disable the gem:\n\n```ruby\nconfig.enabled = false  # Gem does nothing - Active Storage works normally\n```\n\n#### Level 2: Global Default (`deduplicate_by_default`)\nSet the default behavior for all attachments:\n\n```ruby\n# Opt-out pattern: deduplicate by default, disable selectively\nconfig.enabled = true\nconfig.deduplicate_by_default = true\n\nclass Product \u003c ApplicationRecord\n  has_many_attached :images              # ✅ Deduplicates\n  has_one_attached :badge, deduplicate: false  # ❌ Doesn't deduplicate\nend\n```\n\n```ruby\n# Opt-in pattern: don't deduplicate by default, enable selectively\nconfig.enabled = true\nconfig.deduplicate_by_default = false\n\nclass Product \u003c ApplicationRecord\n  has_many_attached :images, deduplicate: true  # ✅ Deduplicates\n  has_one_attached :avatar                      # 
❌ Doesn't deduplicate\nend\n```\n\n#### Level 3: Per-Attachment Override\nOverride the global default for specific attachments:\n\n```ruby\nclass Product \u003c ApplicationRecord\n  # Uses config.deduplicate_by_default\n  has_one_attached :image\n\n  # Explicit override: always deduplicate\n  has_many_attached :photos, deduplicate: true\n\n  # Explicit override: never deduplicate\n  has_one_attached :unique_badge, deduplicate: false\nend\n```\n\n### Rake Tasks\n\n#### Report Duplicates\n\nSee all duplicate blobs (dry run):\n\n```bash\nrails active_storage_dedup:report_duplicates\n```\n\nOutput:\n```\nChecksum: abc123def456...\nService: local\nFilename: photo.jpg\nTotal blobs: 3\nKeeper blob ID: 42 (1 attachments)\nDuplicate blob IDs: 43, 44\nTotal attachments across duplicates: 2\nWasted storage: 2.5 MB\n```\n\n#### Clean Up All Duplicates\n\nRun the sanity check job to find and merge all duplicate blobs:\n\n```bash\nrails active_storage_dedup:cleanup_all\n```\n\nOr run the job directly:\n\n```ruby\nActiveStorageDedup::DeduplicationJob.perform_now\n```\n\n#### Backfill Reference Counts\n\nRecalculate reference counts for existing blobs:\n\n```bash\nrails active_storage_dedup:backfill_reference_count\n```\n\n### Scheduled Cleanup (Recommended)\n\nDue to race conditions during concurrent uploads, duplicates may occasionally slip through. 
Run the sanity check job periodically to clean them up:\n\n**With the whenever gem:**\n\n```ruby\n# config/schedule.rb\nevery 1.week, at: '2:00 am' do\n  runner \"ActiveStorageDedup::DeduplicationJob.perform_later\"\nend\n```\n\n**With sidekiq-cron:**\n\n```ruby\n# config/initializers/sidekiq.rb\nSidekiq::Cron::Job.create(\n  name: 'Active Storage Dedup - weekly cleanup',\n  cron: '0 2 * * 0',  # 2 AM every Sunday\n  class: 'ActiveStorageDedup::DeduplicationJob'\n)\n```\n\n**With a recurring-task scheduler (Solid Queue, Good Job, etc.):**\n\n```yaml\n# config/recurring.yml\nactive_storage_dedup_cleanup:\n  class: ActiveStorageDedup::DeduplicationJob\n  schedule: \"weekly on sunday at 2am\"\n```\n\n**With cron:**\n\n```bash\n# Weekly cleanup every Sunday at 2 AM\n0 2 * * 0 cd /app \u0026\u0026 bin/rails runner \"ActiveStorageDedup::DeduplicationJob.perform_now\"\n```\n\n## How It Works\n\n### Deduplication Strategy\n\nActiveStorageDedup uses `[checksum, service_name]` as the deduplication key:\n\n- **Checksum**: Active Storage's built-in MD5 checksum\n- **Service Name**: Storage service (local, S3, etc.)\n\nWhen a file is uploaded:\n\n1. The checksum is calculated\n2. An existing blob with the same checksum and service is looked up\n3. If one is found, the existing blob is reused\n4. If none is found, a new blob is created\n\n### Three Interception Points\n\nThe gem patches three Active Storage methods to cover all upload flows:\n\n1. **`build_after_unfurling`**: Form uploads (Rails 6.1+)\n2. **`create_before_direct_upload!`**: Direct uploads to cloud storage\n3. **`create_after_unfurling!`**: Programmatic attachments via `attach()`\n\n### Reference Counting\n\nUses Rails' built-in counter cache:\n\n```ruby\n# Automatically incremented when an attachment is created\nbelongs_to :blob, counter_cache: :reference_count\n\n# Check references\nblob.reference_count  # =\u003e 3\nblob.attachments.count  # =\u003e 3\n```\n\n### Auto-Purge Orphans\n\nWhen an attachment is destroyed:\n\n1. 
Counter cache automatically decrements\n2. If `reference_count` reaches 0, blob is purged\n3. Physical file is deleted from storage\n\n## Advanced Usage\n\n### Direct Uploads\n\nWorks seamlessly with Active Storage's direct upload feature:\n\n```javascript\n// Client-side - no changes needed!\n// ActiveStorageDedup automatically deduplicates on the server\n```\n\n### Multiple Services\n\nBlobs are service-specific. Same file on different services = separate blobs:\n\n```ruby\nuser.avatar.attach(\n  io: File.open('photo.jpg'),\n  filename: 'photo.jpg',\n  service_name: :local  # Uses local storage\n)\n\nuser.documents.attach(\n  io: File.open('photo.jpg'),\n  filename: 'photo.jpg',\n  service_name: :amazon  # Creates separate blob on S3\n)\n```\n\n### Manual Sanity Check\n\nRun the sanity check job manually to clean up all duplicates:\n\n```ruby\n# Run synchronously (blocks until complete)\nActiveStorageDedup::DeduplicationJob.perform_now\n\n# Run asynchronously (queues the job)\nActiveStorageDedup::DeduplicationJob.perform_later\n```\n\nThe job will:\n1. Scan the database for all duplicate blob groups (same checksum + service)\n2. For each group, keep the oldest blob and merge duplicates into it\n3. Move all attachments from duplicate blobs to the keeper\n4. Update reference counts\n5. 
Delete duplicate blob records\n\n## Examples\n\n### Reference Counting\n\n```ruby\nblob = ActiveStorage::Blob.create_and_upload!(\n  io: File.open('shared.jpg'),\n  filename: 'shared.jpg'\n)\n\nuser1.avatar.attach(blob)\nblob.reference_count  # =\u003e 1\n\nuser2.avatar.attach(blob)\nblob.reference_count  # =\u003e 2\n\nuser1.avatar.purge\nblob.reference_count  # =\u003e 1\n\nuser2.avatar.purge\n# =\u003e Blob automatically purged (reference_count = 0)\n```\n\n### Environment-Specific Configuration\n\n```ruby\n# config/environments/development.rb\nRails.application.configure do\n  # Disable in development for faster uploads during testing\n  ActiveStorageDedup.configure do |config|\n    config.enabled = false\n  end\nend\n\n# config/environments/production.rb\nRails.application.configure do\n  # Enable in production to save storage\n  ActiveStorageDedup.configure do |config|\n    config.enabled = true\n    config.deduplicate_by_default = true\n  end\nend\n```\n\n## Quick Reference\n\n### Configuration Options\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `enabled` | `true` | Master switch - disables the entire gem when `false` |\n| `deduplicate_by_default` | `true` | Default behavior for attachments (can be overridden) |\n| `auto_purge_orphans` | `true` | Automatically delete blobs when `reference_count` reaches 0 |\n\n### Model Options\n\n```ruby\nhas_one_attached :avatar                    # Uses deduplicate_by_default\nhas_many_attached :docs, deduplicate: true  # Always deduplicate\nhas_one_attached :badge, deduplicate: false # Never deduplicate\n```\n\n### Rake Tasks\n\n| Task | Description |\n|------|-------------|\n| `rails active_storage_dedup:report_duplicates` | Show all duplicate blobs (dry run) |\n| `rails active_storage_dedup:cleanup_all` | Run sanity check to merge duplicates |\n| `rails active_storage_dedup:backfill_reference_count` | Recalculate reference counts |\n\n### Jobs\n\n```ruby\n# Run sanity check 
manually\nActiveStorageDedup::DeduplicationJob.perform_now\n\n# Queue sanity check\nActiveStorageDedup::DeduplicationJob.perform_later\n```\n\n## Requirements\n\n- Rails 6.0+\n- Active Storage configured\n- ActiveJob (for background cleanup)\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/coderhs/active_storage_dedup. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/coderhs/active_storage_dedup/blob/main/CODE_OF_CONDUCT.md).\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n\n## Code of Conduct\n\nEveryone interacting in the ActiveStorageDedup project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/coderhs/active_storage_dedup/blob/main/CODE_OF_CONDUCT.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoderhs%2Factive_storage_dedup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoderhs%2Factive_storage_dedup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoderhs%2Factive_storage_dedup/lists"}