{"id":49271595,"url":"https://github.com/nao1215/datastream","last_synced_at":"2026-05-16T10:02:07.503Z","repository":{"id":353330202,"uuid":"1218772355","full_name":"nao1215/datastream","owner":"nao1215","description":"Compositional, resource-safe streams for Gleam — runs on Erlang and JavaScript targets","archived":false,"fork":false,"pushed_at":"2026-05-07T07:52:40.000Z","size":389,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-07T09:40:02.750Z","etag":null,"topics":["gleam","library","stream"],"latest_commit_sha":null,"homepage":"https://hexdocs.pm/datastream","language":"Gleam","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nao1215.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"nao1215"}},"created_at":"2026-04-23T07:47:50.000Z","updated_at":"2026-05-07T07:52:14.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nao1215/datastream","commit_stats":null,"previous_names":["nao1215/datastream"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/nao1215/datastream","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fdatastream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fdatastream/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/nao1215%2Fdatastream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fdatastream/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nao1215","download_url":"https://codeload.github.com/nao1215/datastream/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nao1215%2Fdatastream/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32843850,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-10T00:25:13.032Z","status":"online","status_checked_at":"2026-05-10T02:00:06.698Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gleam","library","stream"],"created_at":"2026-04-25T14:02:32.358Z","updated_at":"2026-05-10T04:01:19.025Z","avatar_url":"https://github.com/nao1215.png","language":"Gleam","funding_links":["https://github.com/sponsors/nao1215"],"categories":[],"sub_categories":[],"readme":"# datastream\n\ndatastream is a pull-based stream library for Gleam.\n\nA `Stream(a)` is a pipeline definition, not a materialized collection.\nEach terminal operation runs the source again from the beginning, so the\nlibrary fits work that should stay lazy, repeatable, and explicit about\neffects.\n\n## Install\n\n```sh\ngleam add datastream\n```\n\nAPI reference: \u003chttps://hexdocs.pm/datastream\u003e\n\n## When to use it\n\nUse `datastream` when you need one 
or more of these:\n\n- the input is large or unbounded and should not be loaded all at once\n- the pipeline owns a real resource such as a file handle, socket, or cursor\n- the work is naturally chunked text or bytes\n- the Erlang target needs bounded parallel work or time-based stream operators\n\nFor a hands-on tour, jump to the runnable\n[Log-ingest example](#log-ingest-example-bytes--lines--per-level-counts)\nor the [NDJSON example](#ndjson-example-chunked-bytes--typed-records).\nThe full catalog of compile-checked end-to-end pipelines lives under\n[Example pipelines](#example-pipelines).\n\nStay with `gleam/list` when the whole input is already in memory and you\ndo not need lazy pulls, replayable pipelines, or deterministic cleanup.\n\n## Quick start\n\n```gleam\nimport datastream/sink\nimport datastream/source\nimport datastream/stream\nimport gleam/io\n\npub fn main() {\n  let result =\n    source.iterate(from: 1, with: fn(x) { x + 1 })\n    |\u003e stream.map(with: fn(x) { x * 2 })\n    |\u003e stream.take(up_to: 5)\n    |\u003e sink.to_list\n\n  io.debug(result)\n  // [2, 4, 6, 8, 10]\n}\n```\n\n## Log-ingest example: bytes -\u003e lines -\u003e per-level counts\n\nA common shape: bytes arrive in arbitrary chunks (file, socket), the\npipeline reassembles them into lines, drops malformed rows, and produces\nper-level counts in a single pass. `text.lines` owns the line buffering,\nso the chunk boundaries shown below never split a record across two\nemitted strings.\n\n```gleam\nimport datastream/fold\nimport datastream/source\nimport datastream/stream\nimport datastream/text\nimport gleam/dict\nimport gleam/io\nimport gleam/option.{type Option, None, Some}\nimport gleam/string\n\npub type Level {\n  Info\n  Warn\n  LogError\n}\n\npub fn main() {\n  // Real-world input arrives in arbitrary chunks. 
Note how the `WARN`,\n  // `ERROR`, and final `INFO` records straddle chunk boundaries; `text.lines`\n  // reassembles them.\n  let chunks = [\n    \"INFO  user_id=42 logged in\\nWARN  \", \"user_id=42 retry\\nERROR \",\n    \"user_id=99 timeout\\nbogus line with no level\\nINFO \",\n    \"user_id=42 ok\\n\",\n  ]\n\n  source.from_list(chunks)\n  |\u003e text.lines\n  |\u003e stream.filter_map(with: parse_line)\n  |\u003e fold.fold(from: dict.new(), with: bump)\n  |\u003e io.debug\n  // dict.from_list([#(Info, 2), #(Warn, 1), #(LogError, 1)])\n}\n\nfn parse_line(line: String) -\u003e Option(Level) {\n  case string.split_once(line, on: \" \") {\n    Ok(#(\"INFO\", _)) -\u003e Some(Info)\n    Ok(#(\"WARN\", _)) -\u003e Some(Warn)\n    Ok(#(\"ERROR\", _)) -\u003e Some(LogError)\n    _ -\u003e None\n  }\n}\n\nfn bump(acc: dict.Dict(Level, Int), level: Level) -\u003e dict.Dict(Level, Int) {\n  let current = case dict.get(acc, level) {\n    Ok(n) -\u003e n\n    Error(Nil) -\u003e 0\n  }\n  dict.insert(acc, level, current + 1)\n}\n```\n\nProduction callers swap `source.from_list(chunks)` for a\n`source.resource` over a real file handle or socket — the rest of the\npipeline is unchanged. 
Full source:\n[`test/examples/log_pipeline_example.gleam`](https://github.com/nao1215/datastream/blob/main/test/examples/log_pipeline_example.gleam).\n\n## NDJSON example: chunked bytes -\u003e typed records\n\nNewline-delimited JSON (one record per line) over a chunked byte source.\nThe same lazy pass does UTF-8 decode, line framing, and per-record\nparsing without holding the whole payload in memory.\n\n```gleam\nimport datastream/fold\nimport datastream/source\nimport datastream/stream\nimport datastream/text\nimport gleam/int\nimport gleam/io\nimport gleam/string\n\npub type Record {\n  Record(id: Int, body: String)\n}\n\npub type ParseError {\n  EmptyLine\n  MissingId(line: String)\n  BadId(raw: String)\n}\n\npub fn main() {\n  // The first chunk ends mid-line; the second completes that record.\n  let chunks =\n    source.from_list([\n      \u003c\u003c\"1 first record\\n2 second\"\u003e\u003e,\n      \u003c\u003c\" record\\n3 third record\\n\"\u003e\u003e,\n    ])\n\n  chunks\n  |\u003e text.utf8_decode_lossy\n  |\u003e text.lines\n  |\u003e stream.map(with: parse_record)\n  |\u003e fold.collect_result\n  |\u003e io.debug\n  // Ok([Record(1, \"first record\"), Record(2, \"second record\"), ...])\n}\n\nfn parse_record(line: String) -\u003e Result(Record, ParseError) {\n  case line {\n    \"\" -\u003e Error(EmptyLine)\n    _ -\u003e\n      case string.split_once(line, on: \" \") {\n        Error(_) -\u003e Error(MissingId(line: line))\n        Ok(#(raw_id, body)) -\u003e\n          case int.parse(raw_id) {\n            Ok(id) -\u003e Ok(Record(id: id, body: body))\n            Error(_) -\u003e Error(BadId(raw: raw_id))\n          }\n      }\n  }\n}\n```\n\n`fold.collect_result` short-circuits at the first parse failure; swap in\n`fold.partition_result` to surface every successful record alongside\nevery error. 
Drop in your JSON decoder of choice at `parse_record`.\nFull source:\n[`test/examples/ndjson_pipeline_example.gleam`](https://github.com/nao1215/datastream/blob/main/test/examples/ndjson_pipeline_example.gleam).\n\n## Resource-backed streams\n\n`source.resource` opens lazily on the first pull, yields values one by\none, and closes exactly once on normal completion and on the early-stop\npaths the library controls.\n\n```gleam\nimport datastream.{Done, Next}\nimport datastream/fold\nimport datastream/source\nimport datastream/stream\nimport gleam/io\n\npub fn main() {\n  let numbers =\n    source.resource(\n      open: fn() { 1 },\n      next: fn(state) {\n        case state \u003c= 3 {\n          True -\u003e Next(element: state, state: state + 1)\n          False -\u003e Done\n        }\n      },\n      close: fn(_state) { Nil },\n    )\n\n  numbers\n  |\u003e stream.take(up_to: 2)\n  |\u003e fold.to_list\n  |\u003e io.debug\n  // [1, 2]\n}\n```\n\nUse `source.try_resource` when opening or reading can fail and you want\nthe failure on the typed path as `Result`.\n\n## JavaScript async I/O\n\nThe core `Stream(a)` stays synchronous on both targets. That is a\ndeliberate design choice: each pull either returns the next element\nimmediately or reports `Done`.\n\nBecause of that, the official JavaScript boundary is:\n\n- async input stays in host JavaScript until it has been reduced to a bounded value or batch, then enters `datastream` through `source.once`, `source.from_list`, `source.from_bit_array`, or a resource constructor\n- synchronous `Stream(a)` pipelines leave `datastream` through `datastream/javascript/async.to_async_iterable`\n\nThis means `datastream` does not pretend that a host-side\n`AsyncIterable` is the same thing as a pure `Stream(a)`. 
The adapter is\nhonest about where `await` lives.\n\nGleam:\n\n```gleam\nimport datastream/javascript/async as js_async\nimport datastream/source\nimport datastream/stream\nimport gleam/string\n\npub fn lines_for_host() -\u003e js_async.AsyncIterable(String) {\n  source.from_list([\"a\", \"b\", \"c\"])\n  |\u003e stream.map(with: string.uppercase)\n  |\u003e js_async.to_async_iterable\n}\n```\n\nHost JavaScript:\n\n```js\nimport { lines_for_host } from \"./build/dev/javascript/app/app.mjs\";\n\nfor await (const line of lines_for_host()) {\n  console.log(line);\n  if (line === \"B\") break;\n}\n```\n\nBreaking out of the `for await` loop closes the underlying stream once,\nso resource-backed pipelines still release handles promptly.\n\n## Example pipelines\n\nCompile-checked examples live under `test/examples/` and run in CI. The\nlog-ingest and NDJSON entries are inlined above; the rest are linked\nstraight to source.\n\n| Example | Shape |\n| --- | --- |\n| [`log_pipeline_example.gleam`](https://github.com/nao1215/datastream/blob/main/test/examples/log_pipeline_example.gleam) | bytes -\u003e `text.lines` -\u003e validation -\u003e per-level counts |\n| [`ndjson_pipeline_example.gleam`](https://github.com/nao1215/datastream/blob/main/test/examples/ndjson_pipeline_example.gleam) | bytes -\u003e UTF-8 decode -\u003e lines -\u003e per-record parse |\n| [`length_prefixed_pipeline_example.gleam`](https://github.com/nao1215/datastream/blob/main/test/examples/length_prefixed_pipeline_example.gleam) | chunked bytes -\u003e `binary.length_prefixed_with` -\u003e collection |\n| [`parallel_pipeline_example.gleam`](https://github.com/nao1215/datastream/blob/main/test/examples/parallel_pipeline_example.gleam) | BEAM-only bounded parallel map |\n| [`dataprep_pipeline_example.gleam`](https://github.com/nao1215/datastream/blob/main/test/examples/dataprep_pipeline_example.gleam) | per-row validation with accumulated errors |\n\n## Module guide\n\n- `datastream`: defines `Stream(a)` 
and `Step(a, state)`\n- `datastream/source`: constructors for list-backed, generated, and resource-backed streams\n- `datastream/stream`: lazy combinators such as `map`, `filter`, `flat_map`, `zip`, `take`, and `chunks_of`\n- `datastream/sink`: every terminal — pure reductions (`to_list`, `count`, `fold`, `first`, `find`, `collect_result`, …) and side-effecting consumers (`each`, `try_each`, `println`)\n- `datastream/fold`: legacy alias for the pure-reduction subset of `sink`. Re-exports the same functions for backward compatibility; new code should reach for `datastream/sink` instead\n- `datastream/chunk`: opaque finite batches\n- `datastream/text`: chunk-aware UTF-8 decode and line splitting\n- `datastream/binary`: byte and framing helpers\n- `datastream/erlang/source`: BEAM-only subject, timer, and timeout helpers\n- `datastream/erlang/sink`: BEAM-only subject sink\n- `datastream/erlang/par`: BEAM-only bounded parallel combinators and `race`\n- `datastream/erlang/time`: BEAM-only time-based combinators\n- `datastream/javascript/async`: JavaScript-only async iterable adapter for leaving the synchronous core\n\n## Target support\n\n- Erlang target: the full package\n- JavaScript target: the cross-target core and `datastream/javascript/async`\n- `datastream/erlang/*` modules are BEAM-only\n\n## Checked constructors\n\nSome constructors reject invalid numeric arguments at construction time.\nUse the panicking variants when the value is a trusted constant in your\nown code. 
Use the matching `*_checked` variant when the value comes from\nCLI flags, config files, request parameters, or any other dynamic input.\n\nThe main checked families are:\n\n- `stream.take_checked`, `stream.drop_checked`\n- `stream.buffer_checked`, `stream.chunks_of_checked`\n- `binary.length_prefixed_checked`, `binary.length_prefixed_with_checked`, `binary.fixed_size_checked`\n- `datastream/erlang/par.*_checked`\n\n## Backpressure\n\nThree combinators interact with backpressure between a producer and one\nor more consumers:\n\n- **`stream.buffer(stream, prefetch:)`** prefetches up to `prefetch`\n  elements ahead of the consumer, so a latency-bound upstream (HTTP body\n  bytes, slow disk reads) does not serialise the consumer's per-element\n  work. The buffer is bounded by `prefetch`; a slow consumer cannot\n  blow it past that capacity.\n- **`stream.broadcast(stream, n)`** fans one source out to `n`\n  independent consumers, each pulling at its own pace. Per-consumer\n  queues are **unbounded**: if one consumer pulls aggressively while\n  another pauses, the slow consumer's queue grows by the per-consumer\n  pull-distance. The worst-case memory footprint is\n  `O(max_pull_distance × n)`. For cardinality-unbounded sources\n  (`source.iterate`, `source.repeat`) this is a silent memory hazard.\n- **`stream.broadcast_bounded(stream, n, max_queue:)`** is the\n  bounded variant: any consumer queue exceeding `max_queue` triggers\n  a structured panic on the next upstream pull. Use it in production\n  fan-outs (HTTP multicast, websocket pub/sub, Kafka producer tees,\n  ...) where a stalled slow consumer must surface as a crash instead\n  of an OOM.\n\nIf you only have one consumer and just want to overlap producer work\nwith consumer work, reach for `buffer`. 
If you genuinely need fan-out\nto multiple consumers and your source is bounded, `broadcast` is fine.\nIf the source is unbounded or production-critical, default to\n`broadcast_bounded`.\n\n## Web framework compatibility\n\n`datastream` depends on `gleam_erlang \u003e= 1.3.0` and\n`gleam_stdlib \u003e= 0.44.0`. Older releases of some web packages pin\n`gleam_erlang \u003c 1.0.0`, which conflicts with `datastream`.\n\nUse these versions or newer when combining `datastream` with a web stack:\n\n| Package | Minimum compatible version |\n| --- | --- |\n| `wisp` | `\u003e= 2.0.0` |\n| `mist` | `\u003e= 6.0.0` |\n| `gleam_httpc` | `\u003e= 5.0.0` |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnao1215%2Fdatastream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnao1215%2Fdatastream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnao1215%2Fdatastream/lists"}