{"id":13411806,"url":"https://github.com/grosser/parallel","last_synced_at":"2025-05-12T13:05:29.516Z","repository":{"id":38917726,"uuid":"275303","full_name":"grosser/parallel","owner":"grosser","description":"Ruby: parallel processing made simple and fast","archived":false,"fork":false,"pushed_at":"2025-04-14T20:13:34.000Z","size":830,"stargazers_count":4198,"open_issues_count":35,"forks_count":254,"subscribers_count":73,"default_branch":"master","last_synced_at":"2025-05-12T13:05:12.089Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grosser.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"MIT-LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2009-08-11T18:54:16.000Z","updated_at":"2025-05-09T16:44:28.000Z","dependencies_parsed_at":"2023-02-19T01:31:15.352Z","dependency_job_id":"74dc995d-2551-41a0-b2d9-729403eb8425","html_url":"https://github.com/grosser/parallel","commit_stats":{"total_commits":671,"total_committers":78,"mean_commits":8.602564102564102,"dds":0.5678092399403875,"last_synced_commit":"ab06d9e6d79a8b1375debfc348ecc19cd54e390e"},"previous_names":[],"tags_count":122,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grosser%2Fparallel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grosser%2Fparallel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grosser%2Fparallel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/grosser%2Fparallel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grosser","download_url":"https://codeload.github.com/grosser/parallel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253745150,"owners_count":21957317,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:01:17.112Z","updated_at":"2025-05-12T13:05:29.471Z","avatar_url":"https://github.com/grosser.png","language":"Ruby","readme":"Parallel\n==============\n[![Gem Version](https://badge.fury.io/rb/parallel.svg)](https://rubygems.org/gems/parallel)\n[![Build Status](https://github.com/grosser/parallel/actions/workflows/actions.yml/badge.svg)](https://github.com/grosser/parallel/actions/workflows/actions.yml)\n\n\nRun any code in parallel Processes (\u003e use all CPUs), Threads (\u003e speed up blocking operations), or Ractors (\u003e use all CPUs).\u003cbr/\u003e\nBest suited for map-reduce or, e.g., 
parallel downloads/uploads.\n\nInstall\n=======\n\n```Bash\ngem install parallel\n```\n\nUsage\n=====\n\n```Ruby\n# 2 CPUs -\u003e work in 2 processes (a,b + c)\nresults = Parallel.map(['a','b','c']) do |one_letter|\n  SomeClass.expensive_calculation(one_letter)\nend\n\n# 3 Processes -\u003e finished after 1 run\nresults = Parallel.map(['a','b','c'], in_processes: 3) { |one_letter| SomeClass.expensive_calculation(one_letter) }\n\n# 3 Threads -\u003e finished after 1 run\nresults = Parallel.map(['a','b','c'], in_threads: 3) { |one_letter| SomeClass.expensive_calculation(one_letter) }\n\n# 3 Ractors -\u003e finished after 1 run\nresults = Parallel.map(['a','b','c'], in_ractors: 3, ractor: [SomeClass, :expensive_calculation])\n```\n\nThe same can be done with `each`\n```Ruby\nParallel.each(['a','b','c']) { |one_letter| ... }\n```\nor with `each_with_index`, `map_with_index`, or `flat_map`.\n\nProduce one item at a time with a `lambda` (anything that responds to `.call`) or a `Queue`.\n\n```Ruby\nitems = [1,2,3]\nParallel.each( -\u003e { items.pop || Parallel::Stop }) { |number| ... 
}\n```\n\nAlso supports `any?` and `all?`\n\n```Ruby\nParallel.any?([1,2,3,4,5,6,7]) { |number| number == 4 }\n# =\u003e true\n\nParallel.all?([1,2,nil,4,5]) { |number| number != nil }\n# =\u003e false\n```\n\nProcesses/Threads are workers: they grab the next piece of work when they finish.\n\n### Processes\n - Speedup through multiple CPUs\n - Speedup for blocking operations\n - Variables are protected from change\n - Extra memory used\n - Child processes are killed when your main process is killed through Ctrl+C or kill -2\n\n### Threads\n - Speedup for blocking operations\n - Variables can be shared/modified\n - No extra memory used\n\n### Ractors\n - Ruby 3.0+ only\n - Speedup for blocking operations\n - No extra memory used\n - Very fast to spawn\n - Experimental and unstable\n - `start` and `finish` hooks are called on the main thread\n - Variables must be passed in `Parallel.map([1,2,3].map { |i| [i, ARGV, local_var] }, ...`\n - Use `Ractor.make_shareable` to pass in global objects\n\n### ActiveRecord\n\n#### Connection Lost\n\n - Multithreading needs connection pooling; forks need reconnects\n - Adjust the connection pool size in `config/database.yml` when multithreading\n\n```Ruby\n# reproducibly fixes things (spec/cases/map_with_ar.rb)\nParallel.each(User.all, in_processes: 8) do |user|\n  user.update_attribute(:some_attribute, some_value)\nend\nUser.connection.reconnect!\n\n# maybe helps: explicitly use connection pool\nParallel.each(User.all, in_threads: 8) do |user|\n  ActiveRecord::Base.connection_pool.with_connection do\n    user.update_attribute(:some_attribute, some_value)\n  end\nend\n\n# maybe helps: reconnect once inside every fork\nParallel.each(User.all, in_processes: 8) do |user|\n  @reconnected ||= User.connection.reconnect! 
|| true\n  user.update_attribute(:some_attribute, some_value)\nend\n```\n\n#### NameError: uninitialized constant\n\nA race condition occurs when ActiveRecord models are autoloaded inside parallel threads\nin environments that lazy-load, like development, test, or migrations.\n\nTo fix this, load the affected classes before the parallel block with either `require '\u003cmodelname\u003e'` or `ModelName.class`.\n\n### Break\n\n```Ruby\nParallel.map([1, 2, 3]) do |i|\n  raise Parallel::Break # -\u003e stops after all current items are finished\nend\n```\n\n```Ruby\n# the value passed to Parallel::Break becomes the return value of map\nParallel.map([1, 2, 3]) { |i| raise Parallel::Break, i if i == 2 } == 2\n```\n\n### Kill\n\nOnly use this if whatever is executing in the sub-command is safe to kill at any point.\n\n```Ruby\nParallel.map([1,2,3]) do |x|\n  raise Parallel::Kill if x == 1 # -\u003e stop all sub-processes, killing them instantly\n  sleep 100 # Do stuff\nend\n```\n\n### Progress / ETA\n\n```Ruby\n# gem install ruby-progressbar\n\nParallel.map(1..50, progress: \"Doing stuff\") { sleep 1 }\n\n# Doing stuff | ETA: 00:00:02 | ====================               | Time: 00:00:10\n```\n\nUse the `:finish` or `:start` hook to get progress information.\n - `:start` receives the item and index\n - `:finish` receives the item, index, and result\n\nThey are called on the main process and protected with a mutex.\n(To get just the index, use the more performant `Parallel.each_with_index`.)\n\n```Ruby\nParallel.map(1..100, finish: -\u003e (item, i, result) { ... do something ... 
}) { sleep 1 }\n```\n\nSet `finish_in_order: true` to call the `:finish` hook in the order of the input (the first output will take longer to appear).\n\n```Ruby\nParallel.map(1..9, finish: -\u003e (item, i, result) { puts \"#{item} ok\" }, finish_in_order: true) { sleep rand }\n```\n\n### Worker number\n\nUse `Parallel.worker_number` to determine the worker slot in which your\ntask is running.\n\n```Ruby\nParallel.each(1..5, in_processes: 2) { |i| puts \"Item: #{i}, Worker: #{Parallel.worker_number}\" }\n# Example output (the assignment of items to workers can vary between runs):\n# Item: 1, Worker: 1\n# Item: 2, Worker: 0\n# Item: 3, Worker: 1\n# Item: 4, Worker: 0\n# Item: 5, Worker: 1\n```\n\n### Dynamically generating jobs\n\nExample: workers wait for work to arrive and sleep while the queue is empty\n\n```ruby\nqueue = []\nThread.new { loop { queue \u003c\u003c rand(100); sleep 2 } } # job producer\nParallel.map(Proc.new { queue.pop }, in_processes: 3) { |f| f ? puts(\"#{f} received\") : sleep(1) }\n```\n\nTips\n====\n\n - [Benchmark/Test] Disable threading/forking with `in_threads: 0` or `in_processes: 0` to run the same code with different setups\n - [Isolation] Do not reuse previous worker processes: `isolation: true`\n - [Stop all processes with an alternate interrupt signal] `'INT'` (from `Ctrl+C`) is caught by default. Catch `'TERM'` (from `kill`) with `interrupt_signal: 'TERM'`\n - [Process count via ENV] `PARALLEL_PROCESSOR_COUNT=16` will use `16` instead of the number of processors detected. This is useful for reconfiguring a tool that uses `parallel` without inserting custom logic.\n - [Process count] By default, `parallel` uses the number of processors seen by the OS as the process count. 
If you want to use a value that respects the CPU quota, add `concurrent-ruby` to your `Gemfile`.\n\nTODO\n====\n - Replace signal trapping with a simple `rescue Interrupt` handler\n\nAuthors\n=======\n\n### [Contributors](https://github.com/grosser/parallel/graphs/contributors)\n - [Przemyslaw Wroblewski](https://github.com/lowang)\n - [TJ Holowaychuk](http://vision-media.ca/)\n - [Masatomo Nakano](https://github.com/masatomo)\n - [Fred Wu](http://fredwu.me)\n - [mikezter](https://github.com/mikezter)\n - [Jeremy Durham](http://www.jeremydurham.com)\n - [Nick Gauthier](http://www.ngauthier.com)\n - [Andrew Bowerman](http://andrewbowerman.com)\n - [Byron Bowerman](http://blog.bm5k.com/)\n - [Mikko Kokkonen](https://github.com/mikian)\n - [brian p o'rourke](https://github.com/bpo)\n - [Norio Sato]\n - [Neal Stewart](https://github.com/n-time)\n - [Jurriaan Pruis](https://github.com/jurriaan)\n - [Rob Worley](https://github.com/robworley)\n - [Tasveer Singh](https://github.com/tazsingh)\n - [Joachim](https://github.com/jmozmoz)\n - [yaoguai](https://github.com/yaoguai)\n - [Bartosz Dziewoński](https://github.com/MatmaRex)\n - [Guillaume Hain](https://github.com/zedtux)\n - [Adam Wróbel](https://github.com/amw)\n - [Matthew Brennan](https://github.com/mattyb)\n - [Brendan Dougherty](https://github.com/brendar)\n - [Daniel Finnie](https://github.com/danfinnie)\n - [Philip M. 
White](https://github.com/philipmw)\n - [Arlan Jaska](https://github.com/ajaska)\n - [Sean Walbran](https://github.com/seanwalbran)\n - [Nathan Broadbent](https://github.com/ndbroadbent)\n - [Yuki Inoue](https://github.com/Yuki-Inoue)\n - [Takumasa Ochi](https://github.com/aeroastro)\n - [Shai Coleman](https://github.com/shaicoleman)\n - [Earlopain](https://github.com/Earlopain)\n\n[Michael Grosser](http://grosser.it)\u003cbr/\u003e\nmichael@grosser.it\u003cbr/\u003e\nLicense: MIT\u003cbr/\u003e\n","funding_links":[],"categories":["Ruby","Advanced Ruby and Rails","Web 后端","NLP Pipeline Subtasks","Processes and Threads","Gems","Concurrency and Parallelism"],"sub_categories":["Advanced Ruby","Pipeline Generation","Multithreading"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrosser%2Fparallel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrosser%2Fparallel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrosser%2Fparallel/lists"}