{"id":13880044,"url":"https://github.com/salsify/offline-sort","last_synced_at":"2025-06-10T15:39:10.320Z","repository":{"id":2043293,"uuid":"44139739","full_name":"salsify/offline-sort","owner":"salsify","description":"A Ruby gem to sort large amounts of data using a predictable amount of memory.","archived":false,"fork":false,"pushed_at":"2023-11-02T21:09:10.000Z","size":42,"stargazers_count":84,"open_issues_count":0,"forks_count":3,"subscribers_count":60,"default_branch":"master","last_synced_at":"2025-05-12T04:32:41.221Z","etag":null,"topics":["gem"],"latest_commit_sha":null,"homepage":"http://blog.salsify.com/engineering/ruby-scalable-offline-sort","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/salsify.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-10-12T23:28:24.000Z","updated_at":"2024-02-28T09:07:57.000Z","dependencies_parsed_at":"2023-02-18T02:15:18.265Z","dependency_job_id":"4db59e0c-c192-4662-9cac-c53ad3a61a56","html_url":"https://github.com/salsify/offline-sort","commit_stats":{"total_commits":19,"total_committers":6,"mean_commits":"3.1666666666666665","dds":0.631578947368421,"last_synced_commit":"c932fc2d0d2a8a176967c289c34c9671a8ada4cb"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salsify%2Foffline-sort","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salsify%2Foffline-sort/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/salsify%2Foffline-sort/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salsify%2Foffline-sort/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/salsify","download_url":"https://codeload.github.com/salsify/offline-sort/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salsify%2Foffline-sort/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259104092,"owners_count":22805807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gem"],"created_at":"2024-08-06T08:02:44.870Z","updated_at":"2025-06-10T15:39:10.284Z","avatar_url":"https://github.com/salsify.png","language":"Ruby","funding_links":[],"categories":["Ruby"],"sub_categories":[],"readme":"# offline-sort\n\nSort arbitrarily large collections of data with limited memory usage. Given an enumerable and a `sort_by` proc, this gem will break the input data into sorted chunks, persist the chunks, and return an `Enumerator`. Data read from this enumerator will be in its final sorted order.\n\nThe size of the chunks and the strategy for serializing and deserializing the data are configurable. 
The gem comes with built-in strategies for `Marshal`, `MessagePack` and `YAML`.\n\nThe development of this gem is documented in this [post](http://blog.salsify.com/engineering/ruby-scalable-offline-sort) from the Salsify Engineering Blog.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n    gem 'offline-sort'\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install offline-sort\n\n## Usage\n```ruby\n  arrays = [ [4,5,6], [7,8,9], [1,2,3] ]\n  \n  # Create a sorted enumerator\n  sorted = OfflineSort.sort(arrays, chunk_size: 1) do |array|\n    array.first\n  end\n  \n  # Stream results in sorted order\n  sorted.each do |entry|\n    # e.g. write to a file\n  end\n```\nThe example above creates 3 files with 1 array each, then yields the entries in sorted order. Try different values of `chunk_size` to find the best speed/memory combination for your use case. In general, larger chunk sizes use more memory but run faster.\n\nSorting is not limited to arrays. You can use anything that can be expressed in an `Enumerable#sort_by` block.\n\n## Using MessagePack\n\nMessagePack serialization is faster than the default Ruby `Marshal` strategy. To enable MessagePack serialization, follow these steps.\n\n`gem install msgpack`\n\n`require 'msgpack'`\n\nRequiring MessagePack before you require `offline_sort` will automatically enable MessagePack serialization in the gem.\n\n### Limitations\n\nThe MessagePack serialize/deserialize process stringifies hash keys, so write your `sort_by` block in terms of string keys.\n\n## Contributing\n\n1. Fork it\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. 
Create new Pull Request\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalsify%2Foffline-sort","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsalsify%2Foffline-sort","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalsify%2Foffline-sort/lists"}