{"id":13507474,"url":"https://github.com/Nebo15/bsoneach","last_synced_at":"2025-03-30T09:32:57.256Z","repository":{"id":57481034,"uuid":"65019938","full_name":"Nebo15/bsoneach","owner":"Nebo15","description":"Elixir package that applies a function to each document in a BSON file.","archived":false,"fork":false,"pushed_at":"2020-03-04T16:56:26.000Z","size":4759,"stargazers_count":9,"open_issues_count":1,"forks_count":6,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-03-21T12:18:18.851Z","etag":null,"topics":["bson","elixir","elixir-lang","hex","package","parse","stream"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nebo15.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-08-05T13:19:13.000Z","updated_at":"2023-09-05T12:52:11.000Z","dependencies_parsed_at":"2022-09-26T17:41:26.655Z","dependency_job_id":null,"html_url":"https://github.com/Nebo15/bsoneach","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nebo15%2Fbsoneach","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nebo15%2Fbsoneach/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nebo15%2Fbsoneach/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nebo15%2Fbsoneach/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nebo15","download_url":"https://codeload.github.com/Nebo15/bsoneach/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"http
s://github.com","kind":"github","repositories_count":246301963,"owners_count":20755512,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bson","elixir","elixir-lang","hex","package","parse","stream"],"created_at":"2024-08-01T02:00:34.551Z","updated_at":"2025-03-30T09:32:55.787Z","avatar_url":"https://github.com/Nebo15.png","language":"Elixir","funding_links":[],"categories":["BSON"],"sub_categories":[],"readme":"# BSONEach\n\n[![Deps Status](https://beta.hexfaktor.org/badge/all/github/Nebo15/bsoneach.svg)](https://beta.hexfaktor.org/github/Nebo15/bsoneach) [![Hex.pm Downloads](https://img.shields.io/hexpm/dw/bsoneach.svg?maxAge=3600)](https://hex.pm/packages/bsoneach) [![Latest Version](https://img.shields.io/hexpm/v/bsoneach.svg?maxAge=3600)](https://hex.pm/packages/bsoneach) [![License](https://img.shields.io/hexpm/l/bsoneach.svg?maxAge=3600)](https://hex.pm/packages/bsoneach) [![Build Status](https://travis-ci.org/Nebo15/bsoneach.svg?branch=master)](https://travis-ci.org/Nebo15/bsoneach) [![Coverage Status](https://coveralls.io/repos/github/Nebo15/bsoneach/badge.svg?branch=master)](https://coveralls.io/github/Nebo15/bsoneach?branch=master) [![Ebert](https://ebertapp.io/github/Nebo15/bsoneach.svg)](https://ebertapp.io/github/Nebo15/bsoneach)\n\nThis module aims to read large BSON files with low memory consumption. 
It provides a single ```BSONEach.each(func)``` function that reads a BSON file and applies the callback function ```func``` to each parsed document.\n\nThe file is read in 4096-byte chunks, and BSONEach iterates over all documents until the end of the file is reached.\n\nYou can also use ```BSONEach.stream(path)``` if you want to read the file as an IO stream, which is useful when you use the GenStage behavior.\n\n## Performance\n\n  * This module achieves low memory usage (in my test environment it constantly consumes 28.1 MB on a 1.47 GB fixture with 1 000 000 BSON documents).\n  * The correlation between file size and parse time is linear (you can check this by running ```mix bench```).\n\n    ```\n    $ mix bench\n    Settings:\n      duration:      1.0 s\n\n    ## IterativeBench\n    [17:36:14] 1/8: read and iterate 1 document\n    [17:36:16] 2/8: read and iterate 30 documents\n    [17:36:18] 3/8: read and iterate 300 documents\n    [17:36:20] 4/8: read and iterate 30_000 documents\n    [17:36:21] 5/8: read and iterate 3_000 documents\n    ## StreamBench\n    [17:36:22] 6/8: stream and iterate 300 documents\n    [17:36:24] 7/8: stream and iterate 30_000 documents\n    [17:36:25] 8/8: stream and iterate 3_000 documents\n\n    Finished in 13.19 seconds\n\n    ## IterativeBench\n    benchmark name                       iterations   average time\n    read and iterate 1 document              100000   15.54 µs/op\n    read and iterate 30 documents             50000   22.63 µs/op\n    read and iterate 300 documents              100   13672.39 µs/op\n    read and iterate 3_000 documents             10   127238.70 µs/op\n    read and iterate 30_000 documents             1   1303975.00 µs/op\n    ## StreamBench\n    benchmark name                       iterations   average time\n    stream and iterate 300 documents            100   14111.38 µs/op\n    stream and iterate 3_000 documents           10   142093.60 µs/op\n    stream and iterate 30_000 documents           1   1429789.00 µs/op\n    ```\n\n  * 
It's better to pass a file to BSONEach instead of a stream, since the streamed implementation is slower (by roughly 3-12% in the ```mix bench``` results).\n  * BSONEach is CPU-bound: it consumes 98% of CPU resources in my test environment.\n  * (```time``` is not the best way to test this, but...) on large files BSONEach works almost 2 times faster compared to loading the whole file into memory and iterating over it:\n\n    Generate a fixture:\n\n    ```bash\n    $ mix generate_fixture 1000000 test/fixtures/1000000.bson\n    ```\n\n    Run different task types:\n\n    ```bash\n    $ time mix count_read test/fixtures/1000000.bson\n    Compiling 2 files (.ex)\n    \"Done parsing 1000000 documents.\"\n    mix print_read test/fixtures/1000000.bson  59.95s user 5.69s system 99% cpu 1:05.74 total\n    ```\n\n    ```bash\n    $ time mix count_stream test/fixtures/1000000.bson\n    Compiling 2 files (.ex)\n    Generated bsoneach app\n    \"Done parsing 1000000 documents.\"\n    mix count_stream test/fixtures/1000000.bson  45.37s user 2.74s system 102% cpu 46.876 total\n    ```\n\n  * This implementation works faster than the [timkuijsten/node-bson-stream](https://github.com/timkuijsten/node-bson-stream) NPM package (we compared it with Node.js on a file with 30k documents):\n\n    ```bash\n    $ time mix count_stream test/fixtures/30000.bson\n    \"Done parsing 30000 documents.\"\n    mix count_stream test/fixtures/30000.bson  1.75s user 0.35s system 114% cpu 1.839 total\n    ```\n\n    ```bash\n    $ time node index.js\n    Read 30000 documents.\n    node index.js  2.09s user 0.05s system 100% cpu 2.139 total\n    ```\n\n## Installation\n\nIt's available on [hex.pm](https://hex.pm/packages/bsoneach) and can be installed as a project dependency:\n\n  1. Add `bsoneach` to your list of dependencies in `mix.exs`:\n\n    ```elixir\n    def deps do\n      [{:bsoneach, \"~\u003e 0.4.1\"}]\n    end\n    ```\n\n  2. 
Ensure `bsoneach` is started before your application:\n\n    ```elixir\n    def application do\n      [applications: [:bsoneach]]\n    end\n    ```\n\n## How to use\n\n  1. Open a file and pass the IO device to the ```BSONEach.each(func)``` function:\n\n    ```elixir\n    \"test/fixtures/300.bson\" # File path\n    |\u003e BSONEach.File.open # Open file in :binary, :raw, :read_ahead modes\n    |\u003e BSONEach.each(\u0026process_bson_document/1) # Send IO.device to BSONEach.each function and pass a callback\n    |\u003e File.close # Don't forget to close referenced file\n    ```\n\n  2. The callback function should accept a map:\n\n    ```elixir\n    def process_bson_document(%{} = document) do\n      # Do stuff with a document\n      IO.inspect document\n    end\n    ```\n\nWhen you process large files, it's a good idea to process documents asynchronously; you can find more info [here](http://elixir-lang.org/docs/stable/elixir/Task.html).\n\n## Thanks\n\nI want to thank @ericmj for his MongoDB driver. All code that encodes and decodes BSON was taken from his repo.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNebo15%2Fbsoneach","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNebo15%2Fbsoneach","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNebo15%2Fbsoneach/lists"}