{"id":16701452,"url":"https://github.com/ferd/batchio","last_synced_at":"2025-03-23T14:31:46.934Z","repository":{"id":141683231,"uuid":"12471207","full_name":"ferd/batchio","owner":"ferd","description":"io:format middle-man that buffers and batches output sent to the io server for better throughput","archived":false,"fork":false,"pushed_at":"2013-09-05T16:12:50.000Z","size":188,"stargazers_count":36,"open_issues_count":0,"forks_count":9,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-02T01:36:27.639Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ferd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-08-29T21:04:23.000Z","updated_at":"2024-02-21T06:55:58.000Z","dependencies_parsed_at":"2023-03-12T11:45:19.881Z","dependency_job_id":null,"html_url":"https://github.com/ferd/batchio","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferd%2Fbatchio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferd%2Fbatchio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferd%2Fbatchio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ferd%2Fbatchio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ferd","download_url":"https://codeload.github.com/ferd/batchio/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244306141,"owners_count
":20431747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T18:44:07.348Z","updated_at":"2025-03-23T14:31:46.610Z","avatar_url":"https://github.com/ferd.png","language":"Erlang","funding_links":[],"categories":[],"sub_categories":[],"readme":"# batchio #\n\nExperimental project to batch calls to io:format/2 in order to save time over\ncommunication and whatnot whenever the central io server of an Erlang node\nbecomes a bottleneck\n\n## building ##\n\n`rebar get-deps compile`\n\n## running tests ##\n\n`rebar get-deps compile \u0026\u0026 rebar ct skip_deps=true`\n\n## using ##\n\n```\napplication:start(pobox).\napplication:start(batchio).\nbatchio:format(\"abc: ~p~n\", [myreq]).\n```\n\nNote that if batchio isn't started, `batchio:format/1-2` will redirect calls\nto `io:format/1-2`.\n\n## configuration ##\n\nAll configurations can be done using Erlang's OTP applications' usual `env`\nvariables.\n\n- `buffer`: Before starting batchio, you can define a maximal buffer size by\n  setting the env variable `buffer` to an integer. As many slots will be\n  allocated in batchio's queue buffer. Note that this size cannot be changed\n  dynamically at this time, but is on the roadmap for whenever someone will\n  need it.\n- `page_size`: This variable allows to define how many bytes will be sent on\n  each batch for the messages. This value can be modified dynamically, on a\n  global level (which is fine because batchio exists globally only)\n- `leader`: determines where to send the IO data. 
By default, batchio will use\n  its original `group_leader()` value for that (its application controller,\n  forwarding to either the local `user` process, or a remote one depending on\n  how the node has been started; see\n  http://ferd.ca/repl-a-bit-more-and-less-than-that.html for details).\n  This value cannot be changed dynamically yet.\n\n## Benchmarks\n\nBenchmarks are run sequentially to provide an alright snapshot of performance,\nwhile avoiding complex issues of synchronizing concurrent pieces of code to know\nwhen we're done. Benchmarks are run with:\n\n```\nbatchio_bench:run(40000, 4096, [noop], 10000, 25).\n```\n\nThis means batchio has 40k elements available in its buffer, will send data in\npages of 4096 bytes, will use a fake IO server that doesn't actually do output,\nand will send 10000 random 25-byte messages as fast as possible. The messages\nare sent as lists (so we get a good picture of message passing overhead) called\nwith both `io:format/3` (directly to the fake IO Server) and\n`batchio:format/1`.\n\nThe results are returned in the form:\n\n```\n{ok, [{regular,{MicroSecs,ok}},\n      {batchio,{Microsecs,[{total,Handled}, {dropped, MessagesNotDelivered}]}}]}\n```\n\nAll benchmarks here are run on a MBP running OS X 10.8.4, on a 2 GHz Intel Core\ni7, with 8 GB 1600 MHz DDR3 memory, running Erlang R16B.\n\n### Multiple runs, noop ###\n\nRaw results:\n\n```\n1\u003e batchio_bench:run(40000, 4096, [noop], 10000, 25).\n{ok,[{regular,{147307,ok}},\n     {batchio,{82214,[{total,10000},{dropped,0}]}}]}\n2\u003e batchio_bench:run(40000, 4096, [noop], 100000, 25).\n{ok,[{regular,{1556331,ok}},\n     {batchio,{682564,[{total,100000},{dropped,0}]}}]}\n3\u003e batchio_bench:run(40000, 4096, [noop], 1000000, 25).\n{ok,[{regular,{15140558,ok}},\n     {batchio,{7104603,[{total,1000000},{dropped,0}]}}]}\n4\u003e batchio_bench:run(40000, 4096, [noop], 1000000, 100).\n{ok,[{regular,{27556709,ok}},\n     {batchio,{21181553,[{total,1000000},{dropped,0}]}}]}\n5\u003e 
batchio_bench:run(40000, 4096*2, [noop], 1000000, 100).\n{ok,[{regular,{27357568,ok}},\n     {batchio,{21349706,[{total,1000000},{dropped,0}]}}]}\n```\n\nTable:\n\n- Buffer size: 40k entries\n- Page size: 4096 bytes\n- Io device: noop\n- Message bytes: (N*Size)\n\n| total bytes   | regular (µs)  | batchio (µs)  | speedup |\n|---------------|---------------|---------------|---------|\n| 250000        | 147307        | 82214         | 1.79    |\n| 2500000       | 1556331       | 682564        | 2.28    |\n| 25000000      | 15140558      | 7104603       | 2.13    |\n| 100000000     | 27556709      | 21181553      | 1.30    |\n\nThe speedup remains somewhat good the entire time through, and appears to scale\nsomewhat linearly for both IO methods. The difference between the two could\njust be overhead in message passing between the two, and a parallel/concurrent\ntest could reveal different results given all the sending of messages could be\ndone at once by many parties, rather than sequentially doing a request/response\npattern.\n\n\n### Multiple runs, {passthrough, user} ###\n\nRaw results (garbage output omitted):\n\n```\n6\u003e batchio_bench:run(40000, 4096, [{passthrough, user}], 10000, 25).\n{ok,[{regular,{663627,ok}},\n     {batchio,{116301,[{total,10000},{dropped,0}]}}]}\n7\u003e batchio_bench:run(40000, 4096, [{passthrough, user}], 100000, 25).\n{ok,[{regular,{7152307,ok}},\n     {batchio,{1193573,[{total,100000},{dropped,0}]}}]}\n8\u003e batchio_bench:run(40000, 4096, [{passthrough, user}], 100000, 100).\n{ok,[{regular,{6918967,ok}},\n     {batchio,{3158728,[{total,100000},{dropped,0}]}}]}\n9\u003e batchio_bench:run(40000, 4096*2, [{passthrough, user}], 100000, 100).\n{ok,[{regular,{7537927,ok}},\n     {batchio,{3248407,[{total,100000},{dropped,0}]}}]}\n10\u003e batchio_bench:run(40000, 4096 div 2, [{passthrough, user}], 100000, 100).\n{ok,[{regular,{6385057,ok}},\n     {batchio,{3027430,[{total,100000},{dropped,0}]}}]}\n```\n\nTable:\n\n- Buffer size: 40k 
entries\n- Page size: 4096 bytes except where noted\n- Io device: {passthrough, user}\n- Message bytes: (N*Size)\n\n| total bytes   | regular (µs)  | batchio (µs)  | speedup |\n|---------------|---------------|---------------|---------|\n| 250000        | 663627        | 116301        | 5.71    |\n| 2500000       | 7152307       | 1193573       | 5.99    |\n| 10000000      | 6918967       | 3158728       | 2.19    |\n| 10000000*     | 7537927       | 3248407       | 2.32    |\n| 10000000**    | 6385057       | 3027430       | 2.10    |\n\n\\* page size of 8192 bytes\n\\*\\* page size of 2048 bytes\n\nOn smaller output sizes, batchio is much faster there, telling us there might\nbe a blocking component further away than the dummy IO server, past the real IO\nsystem of the node (`user` outputs for real). At around 10000000 bytes, both\nthe regular and batchio times seem to go somewhat stable on the speedup despite\nthe page size, possibly pointing to the system's limit for IO, although more\nresults might be needed to confirm that\n\n### Making the buffer size smaller to check for load-shedding ###\n\nRaw results (garbage output omitted):\n\n```\n11\u003e  batchio_bench:run(5000, 4096, [{passthrough, user}], 100000, 100).\n{ok,[{regular,{7074854,ok}},\n     {batchio,{3132005,[{total,100000},{dropped,0}]}}]}\n12\u003e batchio_bench:run(100, 4096, [{passthrough, user}], 100000, 100).\n{ok,[{regular,{6269205,ok}},\n     {batchio,{3057970,[{total,100000},{dropped,4834}]}}]}\n```\n\nEven if we're sending 100k messages rather fast, even on a 5k elements buffer,\nno message gets dropped (and we have a 2.25 speedup, on par with the previous\nresults). However, turning down the buffer size to 100 entries at most ends up\nshedding 4.8% of the messages sent, keeping a similar speedup.\n\nThe lossiness of batchio has to be tweaked to choose a lossiness vs. 
storage\nspace requirement adequate for the target node.\n\n\n### Possible improvements ###\n\n- allow for concurrent operations\n- report on memory usage or \"messages in flight\"\n- automate the runs of multiple batch sizes and reports, table building\n- simulate tests on a busy node\n\n## Introspection\n\n`batchio` doesn't have any official introspection API, but the `batchio_serv`\nprocess will store incremental data in the process dictionary under the keys\n`total`, `sent`, and `dropped`. Using `erlang:process_info/2` on that process\nwill allow someone to inspect the lossiness of items in flight, in absolute terms.\n\n## Roadmap ##\n\n- Allow dynamic configuration changes in a proper API\n- Test configuration changes\n- Figure out multiple-encoding support. Right now, batchio assumes that all\n  output will use the same final encoding by just calling `io_lib:format/2`\n  at the call site, rather than at the IO server (which is faster, but possibly\n  wrong).\n\n## Contributing ##\n\nSend in a pull request including the changes, tests, and a description of what\nthe changes do (and why they are necessary).\n\nThe test suite is currently a bit complex, as it sets up middlemen group leaders\nto forward IO properly.\n\n## Changelog ##\n\n- 1.0.0: Proper `page_size` overflow handling. In 0.1.0, a message larger than\n the page size would never have made it through and would have needed to be\n dropped from the buffer (which would have needed to fill up) before message\n output could resume. This version makes it so that when a message that is too\n large is found, the current page is sent, and then the message larger than the\n `page_size` is sent alone.\n- 0.1.0: Initial commit\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fferd%2Fbatchio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fferd%2Fbatchio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fferd%2Fbatchio/lists"}