{"id":33914752,"url":"https://github.com/sinaru/rapidflow","last_synced_at":"2026-03-17T22:11:47.783Z","repository":{"id":322615051,"uuid":"1090247135","full_name":"sinaru/rapidflow","owner":"sinaru","description":"🌊 A Ruby library for concurrent batch data processing through lightweight, composable flows.","archived":false,"fork":false,"pushed_at":"2025-11-11T14:00:57.000Z","size":64,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-11T22:07:08.167Z","etag":null,"topics":["batch-processing","concurrent","ruby"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sinaru.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-05T12:15:07.000Z","updated_at":"2025-12-08T02:25:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sinaru/rapidflow","commit_stats":null,"previous_names":["sinaru/rapidflow"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sinaru/rapidflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sinaru%2Frapidflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sinaru%2Frapidflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sinaru%2Frapidflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/sinaru%2Frapidflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sinaru","download_url":"https://codeload.github.com/sinaru/rapidflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sinaru%2Frapidflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30633241,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T17:32:55.572Z","status":"ssl_error","status_checked_at":"2026-03-17T17:32:38.732Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch-processing","concurrent","ruby"],"created_at":"2025-12-12T06:38:32.181Z","updated_at":"2026-03-17T22:11:47.758Z","avatar_url":"https://github.com/sinaru.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🌊 RapidFlow\n\n⚙️💎➡️📦💨🔁🌊\n\u003e A Ruby library for concurrent batch data processing through lightweight, composable flows.\n\n[![Gem Version](https://badge.fury.io/rb/rapidflow.svg)](https://badge.fury.io/rb/rapidflow)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n\n\u003e Note: ⚠️ This library is at a very early stage of development. 
The interfaces and APIs \n\u003e may change without backward compatibility guarantees in minor versions (0.[minor version].[patch]).\n\nRapidFlow is a lightweight, concurrent pipeline processor for Ruby that transforms data through multiple stages using Ruby Threads. \nPerfect for I/O-bound operations like web scraping, API calls, and data processing.\n\n## Features\n\n- 🚀 **Concurrent Processing** - Multiple workers per stage process items concurrently\n- 🔄 **True Pipelining** - Different stages process different items simultaneously\n- 📦 **Order Preservation** - Results returned in the same order items were pushed\n- 🛡️ **Error Handling** - Captures exceptions without stopping the flow\n- 🎯 **Simple API** - Easy to use, no complex configuration\n- 🪶 **Zero Dependencies** - Uses only Ruby's standard library\n\n## Requirements\n\n- Ruby \u003e= 3.2\n- No external dependencies\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'rapidflow'\n```\n\nAnd then execute:\n\n```bash\n$ bundle install\n```\n\nOr install it yourself as:\n\n```bash\n$ gem install rapidflow\n```\n\n## Quick Start\n\nCreate a batch instance.\n\n```ruby\nrequire 'rapidflow'\n\n# Create a 3-stage processing batch. 
Workers can be configured on a per-stage basis; the default count is used if omitted.\nscraper = RapidFlow::Batch.build do\n  stage -\u003e(url) { fetch_html(url) }, workers: 8 # Stage 1: Fetch HTML\n  stage -\u003e(html) { parse_data(html) }, workers: 2 # Stage 2: Parse data\n  stage -\u003e(data) { save_to_db(data) } # Stage 3: Save to a database\nend\n```\n\nAlternatively, you can also initialize the batch with the following syntax:\n\n```ruby\nbatch = RapidFlow::Batch.new(\n  { fn: -\u003e(url) { fetch_html(url) }, workers: 8 }, # Stage 1: Fetch HTML\n  { fn: -\u003e(html) { parse_data(html) }, workers: 2 }, # Stage 2: Parse data\n  { fn: -\u003e(data) { save_to_db(data) } } # Stage 3: Save to a database\n)\nbatch.start # need to explicitly start\n```\n\nPush items onto the batch, and it will start processing them concurrently.\n\n```ruby\nurls = ['http://example.com/1', 'http://example.com/2', 'http://example.com/3']\nurls.each { |url| scraper.push(url) }\n```\n\nOnce you have pushed all the items to the batch, you can get the results.\n\n```ruby\nresults = scraper.results\n```\n\nNote that once you call `Batch#results`, it blocks until all processing completes. After that, you can no \nlonger push items to the batch instance.\n\nThe results are returned in the same order as the original items were pushed. Each result is an array of\n`[data, error]`. 
No error means the item was successfully processed through all the stages.\n\n```ruby\nresults.each_with_index do |(data, error), index|\n  if error\n    puts \"Item #{index} failed: #{error.message}\"\n  else\n    puts \"Item #{index} succeeded: #{data}\"\n  end\nend\n```\n\n## Error Handling\n\nRapidFlow continues running even when errors occur, instead of stopping the entire pipeline.\n\nWhen an item encounters an error at any stage, RapidFlow captures that error and moves the item to the \nfinal results, skipping all remaining stages for that particular item.\n\nEach result comes as a pair: `[data, error]`.\n- If processing failed: `error` contains the Error instance, and `data` holds whatever transformed data existed \nfrom the last successful stage (or original input data if the error occurred at the first stage).\n- If processing succeeded: `data` contains the fully processed result, and `error` is `nil`.\n\n```ruby\nbatch = RapidFlow::Batch.new(\n  { fn: -\u003e(url) { HTTP.get(url).body } }, # May raise network errors\n  { fn: -\u003e(body) { JSON.parse(body) } } # May raise JSON parse errors\n)\nbatch.start # a batch created with Batch.new must be started explicitly\n\nurls.each { |url| batch.push(url) }\nresults = batch.results\n\nresults.each_with_index do |(data, error), index|\n  if error\n    # Original input if error happened at first stage. 
Otherwise, transformed data from the previous stage before the error happened\n    # It is preserved in 'data' for debugging if needed.\n    puts \"Data state before error #{data}\"\n    \n    puts \"Failed to process #{urls[index]}: #{error.class} - #{error.message}\"\n    # Log error, retry, or handle gracefully\n    \n    puts \"Error backtrace: \"\n    pp error.backtrace\n    # As any Exception contains the backtrace(https://docs.ruby-lang.org/en/master/Exception.html#method-i-backtrace),\n    # for further debugging, you can look into backtrace.\n  else\n    puts \"Success: #{data}\"\n  end\nend\n```\n\n**Error behavior:**\n- Exceptions are caught and returned with results\n- The transformed data from the previous stage is preserved when an error occurs\n- Errors in early stages skip remaining stages until they reach the result queue\n- Other items continue processing (errors don't stop the batch)\n\n## Usage Examples\n\n### Web Scraping Pipeline\n\n```ruby\nscraper = RapidFlow::Batch.build do\n  stage -\u003e(url) {\n    # Fetch HTML (may take 1-2 seconds per URL)\n    HTTP.get(url).to_s\n  }, workers: 8 # 8 workers can fetch 8 URLs simultaneously\n\n  stage -\u003e(html) {\n    # Parse HTML\n    Nokogiri::HTML(html).css('.product')\n  }\n\n  stage -\u003e(products) {\n    # Extract and transform data\n    products.map { |p| { name: p.css('.name').text, price: p.css('.price').text } }\n  }\n\n  stage -\u003e(data) {\n    # Save to a database\n    Product.insert_all(data)\n    data\n  }, workers: 2 # low count to reduce concurrent DB connections at this stage\nend\n\nurls.each { |url| scraper.push(url) }\nresults = scraper.results\n```\n\n### Image Processing Pipeline\n\n```ruby\nprocessor = RapidFlow::Batch.build do\n  stage -\u003e(path) { MiniMagick::Image.open(path) }, workers: 4 # Stage 1: Load image\n  stage -\u003e(img) { img.resize('800x600'); img }, workers: 4 # Stage 2: Resize\n  stage -\u003e(img) { img.colorspace('Gray'); img }, workers: 4 # Stage 
3: Convert to grayscale\n  stage -\u003e(img) { img.write(\"output/#{img.path}\"); img }, workers: 4 # Stage 4: Save\nend\n\nDir.glob('images/*.jpg').each { |path| processor.push(path) }\nresults = processor.results\n\nputs \"Processed #{results.count { |_, err| err.nil? }} images successfully\"\n```\n\n### API Data Enrichment\n\n```ruby\nenricher = RapidFlow::Batch.build do\n  stage -\u003e(user_id) {\n    # Fetch user data from API\n    api_client.get(\"/users/#{user_id}\").parse\n  }, workers: 10 # Handle 10 API calls concurrently\n\n  stage -\u003e(user) {\n    # Fetch user's posts\n    user[:posts] = api_client.get(\"/users/#{user[:id]}/posts\").parse\n    user\n  }\n\n  stage -\u003e(user) {\n    # Add sentiment analysis to posts\n    user[:posts].each do |post|\n      post[:sentiment] = sentiment_analyzer.analyze(post[:content])\n    end\n    user\n  }\nend\n\nuser_ids.each { |id| enricher.push(id) }\nenriched_users = enricher.results\n```\n\n### ETL Pipeline\n\n```ruby\n# Extract, Transform, Load\netl = RapidFlow::Batch.build do\n  stage -\u003e(filename) {\n    # Extract: Read CSV file\n    CSV.read(filename, headers: true).map(\u0026:to_h)\n  }, workers: 3\n\n  stage -\u003e(rows) {\n    # Transform: Clean and validate data\n    rows.select { |row| valid?(row) }.map { |row| transform(row) }\n  }, workers: 3\n\n  stage -\u003e(rows) {\n    # Load: Insert into database\n    database.insert_all(rows)\n    rows.size\n  }, workers: 3\nend\n\ncsv_files.each { |file| etl.push(file) }\nresults = etl.results\n\ntotal_records = results.sum { |count, _| count || 0 }\nputs \"Loaded #{total_records} records\"\n```\n\n### Single Stage (Parallel Map)\n\n```ruby\n# Sometimes you just need parallel processing without multiple stages\n# Fetch 20 URLs concurrently\nfetcher = RapidFlow::Batch.new({ fn: -\u003e(url) { HTTP.get(url).body }, workers: 20 })\n\nurls.each { |url| fetcher.push(url) }\npages = fetcher.results\n```\n\n## Architecture\n\nRapidFlow uses a multi-stage 
pipeline architecture with concurrent workers at each stage.\n\n### Pipeline Flow\n\n```mermaid\ngraph LR\n    subgraph Input\n        P[push items]\n    end\n\n    subgraph Queue0[Input Queue]\n        Q0[Item 1\u003cbr/\u003eItem 2\u003cbr/\u003eItem 3\u003cbr/\u003e...]\n    end\n\n    subgraph Stage1[Stage 1 Lambda]\n        W1A[Worker 1A]\n        W1B[Worker 1B]\n        W1C[Worker 1C]\n        W1D[Worker 1D]\n    end\n\n    subgraph Queue1[Queue 1]\n        Q1[Result 1\u003cbr/\u003eResult 2\u003cbr/\u003eResult 3\u003cbr/\u003e...]\n    end\n\n    subgraph Stage2[Stage 2 Lambda]\n        W2A[Worker 2A]\n        W2B[Worker 2B]\n        W2C[Worker 2C]\n        W2D[Worker 2D]\n    end\n\n    subgraph Queue2[Queue 2]\n        Q2[Result 1\u003cbr/\u003eResult 2\u003cbr/\u003eResult 3\u003cbr/\u003e...]\n    end\n\n    subgraph Stage3[Stage 3 Lambda]\n        W3A[Worker 3A]\n        W3B[Worker 3B]\n        W3C[Worker 3C]\n        W3D[Worker 3D]\n    end\n\n    subgraph Output[Results Queue]\n        QF[Final 1\u003cbr/\u003eFinal 2\u003cbr/\u003eFinal 3\u003cbr/\u003e...]\n    end\n\n    subgraph Result\n        R[results method\u003cbr/\u003esorts by index]\n    end\n\n    P --\u003e Q0\n    Q0 --\u003e W1A \u0026 W1B \u0026 W1C \u0026 W1D\n    W1A \u0026 W1B \u0026 W1C \u0026 W1D --\u003e Q1\n    Q1 --\u003e W2A \u0026 W2B \u0026 W2C \u0026 W2D\n    W2A \u0026 W2B \u0026 W2C \u0026 W2D --\u003e Q2\n    Q2 --\u003e W3A \u0026 W3B \u0026 W3C \u0026 W3D\n    W3A \u0026 W3B \u0026 W3C \u0026 W3D --\u003e QF\n    QF --\u003e R\n\n    style P fill:#e1f5ff,color:#000\n    style R fill:#e1f5ff,color:#000\n    style Stage1 fill:#fff4e1,stroke:#000000,color:#000\n    style Stage2 fill:#fff4e1,stroke:#000000,color:#000\n    style Stage3 fill:#fff4e1,stroke:#000000,color:#000\n    style Queue0 fill:#f0f0f0,stroke:#000000,color:#000\n    style Queue1 fill:#f0f0f0,stroke:#000000,color:#000\n    style Queue2 fill:#f0f0f0,stroke:#000000,color:#000\n    style Output 
fill:#e8f5e9,stroke:#000000,color:#000\n```\n\n### How It Works\n\n1. **Items are indexed**: Each item pushed gets a sequential index for order preservation\n2. **Queues between stages**: Ruby `Queue` objects connect stages (thread-safe)\n3. **Worker threads**: Each stage has N worker threads pulling from input queue\n4. **Concurrent processing**:\n    - Workers at the same stage process different items in parallel (data parallelism)\n    - Different stages process different items simultaneously (pipeline parallelism)\n5. **Error propagation**: Errors are captured and passed through remaining stages\n6. **Result collection**: Final queue accumulates results, sorted by index before returning\n\n### Concurrency Model\n\n- **Thread-based**: Uses Ruby threads (not processes or fibers)\n- **GIL-aware**: Best for I/O-bound work; CPU-bound work limited by GIL\n- **Queue-based**: Thread-safe Ruby `Queue` for inter-stage communication\n- **Backpressure**: Queues naturally slow fast producers when consumers are slow\n- **Bounded workers**: Fixed thread pool per stage (no thread explosion)\n\n## Performance Tuning\n\n### Workers Per Stage\n\nChoose based on your workload:\n\n| Workload Type                                 | Recommended Workers | Reasoning                                   |\n|-----------------------------------------------|---------------------|---------------------------------------------|\n| **I/O-bound** (API calls, file I/O, database) | 4-20                | Can handle many concurrent I/O operations   |\n| **CPU-bound** (calculations, parsing)         | 1-2                 | Limited by Ruby's GIL                       |\n| **Mixed**                                     | 2-8                 | Balance between I/O wait and CPU contention |\n\n```ruby\n# High I/O workload - many workers\nRapidFlow::Batch.new({ fn: lambda1, workers: 100 }, { fn: lambda2, workers: 50 })\n\n# CPU-intensive - fewer workers\nRapidFlow::Batch.new({ fn: lambda1, workers: 2 }, { fn: 
lambda2, workers: 2 })\n```\n\n### Balancing Workers for Stages\n\nFor the best throughput, workers should be assigned based on the I/O-bound workload of each stage:\n\n```ruby\n# ❌ Same number of workers even though stages have different I/O load\nRapidFlow::Batch.build do\n  stage -\u003e(x) { sleep(10); x }, workers: 4 # 10 seconds - SLOW! (Assume a heavy or long-running I/O task)\n  stage -\u003e(x) { sleep(0.1); x }, workers: 4 # 0.1 seconds - fast\n  stage -\u003e(x) { sleep(0.1); x }, workers: 4 # 0.1 seconds - fast\n  stage -\u003e(x) { x }, workers: 4 # No I/O-bound work\nend\n\n# ✅ Balanced - workers are assigned based on I/O load\nRapidFlow::Batch.build do\n  stage -\u003e(x) { sleep(10); x }, workers: 16 # 10 seconds - SLOW!\n  stage -\u003e(x) { sleep(0.1); x }, workers: 2 # 0.1 seconds - fast\n  stage -\u003e(x) { sleep(0.1); x }, workers: 2 # 0.1 seconds - fast\n  stage -\u003e(x) { x }, workers: 1 # No I/O-bound work\nend\n```\n\n### Memory Considerations\n\n- Each queue can grow unbounded, so don't push millions of items without consuming results\n- Workers hold items in memory during processing\n- Memory usage ≈ (items in queues + items being processed) × item size\n\n## Best Practices\n\n### ✅ Do\n\n- Use for I/O-bound operations (API calls, file operations, database queries)\n- Keep stages independent (avoid shared mutable state)\n- Handle errors gracefully in your application code\n- Use appropriate worker counts for your workload\n- Process items in batches for very large datasets\n\n### ❌ Don't\n\n- Use for CPU-bound operations (Ruby's GIL limits parallelism)\n- Share mutable state between workers without synchronization\n- Push millions of items without processing results (memory issue)\n- Create dependencies between items (order of execution not guaranteed)\n- Nest RapidFlow instances (use a single multi-stage batch instead)\n\n## Comparison with Alternatives\n\n| Feature                  | RapidFlow | Thread Pool   | Sidekiq        | 
Concurrent-Ruby |\n|--------------------------|-----------|---------------|----------------|-----------------|\n| **Multi-stage pipeline** | ✅        | ❌            | ⚠️ (manual)    | ❌              |\n| **Order preservation**   | ✅        | ❌            | ❌             | ❌              |\n| **In-memory**            | ✅        | ✅            | ❌ (Redis)     | ✅              |\n| **Dependencies**         | Zero      | Zero          | Redis          | Zero            |\n| **Synchronous results**  | ✅        | ⚠️ (manual)   | ❌             | ⚠️ (manual)     |\n| **Error handling**       | Built-in  | Manual        | Built-in       | Manual          |\n| **Setup complexity**     | Low       | Low           | High           | Medium          |\n\n## Sample benchmark results\n\nThe following result is taken from a benchmark run of [./scripts/benchmark/benchmark_api_request_process_and_storing.rb](./scripts/benchmark/benchmark_api_request_process_and_storing.rb)\n\n```bash\n/scripts/benchmark$ ruby benchmark_api_request_process_and_storing.rb 40 32\n================================================================================\nRapidFlow API Request, Process \u0026 Store Benchmark\n================================================================================\n\nConfiguration:\n  API: dummyjson.com\n  User IDs to process: 1 to 40\n  Workers per stage (RapidFlow): 32\n  Stages: Fetch User → Fetch Product → Merge Data → Save to File\n\nProcessing 40 user IDs...\n\n--------------------------------------------------------------------------------\n1. SYNCHRONOUS PROCESSING (No threads)\n--------------------------------------------------------------------------------\n                                     user     system      total        real\nSynchronous:                     0.356016   0.120360   0.476376 ( 13.180568)\n\nResults: 40 successful, 0 failed\n\n--------------------------------------------------------------------------------\n2. 
RAPIDFLOW CONCURRENT PROCESSING\n--------------------------------------------------------------------------------\n                                     user     system      total        real\nRapidFlow (32 workers):          0.217776   0.084002   0.301778 (  0.612455)\n\nResults: 40 successful, 0 failed\n\n================================================================================\nSUMMARY\n================================================================================\n\nSynchronous time:     13.18s\nRapidFlow time:       0.61s\n\nSpeedup:              21.52x faster\nTime saved:           12.57s\nPerformance gain:     2052.1%\n\n--------------------------------------------------------------------------------\nFILE VERIFICATION\n--------------------------------------------------------------------------------\nSynchronous output:   40 files created\nRapidFlow output:     40 files created\n\nSample output file: data_1.json\n  User ID: 1\n  User Name: Emily Johnson\n  Has product data: true\n  Product ID: 1\n  Product Title: Essence Mascara Lash Princess\n\n--------------------------------------------------------------------------------\nPERFORMANCE ANALYSIS\n--------------------------------------------------------------------------------\n\nAverage time per item:\n  Synchronous:  329.51ms\n  RapidFlow:    15.31ms\n\nThroughput (items/second):\n  Synchronous:  3.03 items/sec\n  RapidFlow:    65.31 items/sec\n```\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an\ninteractive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. 
To release a new version, update the\nversion number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version,\npush git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/sinaru/rapidflow. This project is intended \nto be a safe, welcoming space for collaboration, and contributors are expected to adhere to the \n[code of conduct](https://github.com/sinaru/rapidflow/blob/main/CODE_OF_CONDUCT.md).\n\n## Code of Conduct\n\nEveryone interacting in the RapidFlow project's codebases, issue trackers, chat rooms and mailing lists is expected \nto follow the [code of conduct](https://github.com/sinaru/rapidflow/blob/main/CODE_OF_CONDUCT.md).\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsinaru%2Frapidflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsinaru%2Frapidflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsinaru%2Frapidflow/lists"}