{"id":43768817,"url":"https://github.com/smartpricing/alyxstream","last_synced_at":"2026-02-05T16:09:11.408Z","repository":{"id":188891070,"uuid":"676082863","full_name":"smartpricing/alyxstream","owner":"smartpricing","description":"Alyxstream is a library that simplify stream processing in Node.js. We use it in production to make real time logs analysis, errors detection, parallel job processing, using Kafka, Redis and Nats as sources","archived":false,"fork":false,"pushed_at":"2025-06-18T08:52:14.000Z","size":1620,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-10-21T03:45:59.233Z","etag":null,"topics":["library"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smartpricing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-08T11:50:57.000Z","updated_at":"2025-06-18T08:52:18.000Z","dependencies_parsed_at":"2024-01-16T11:39:19.195Z","dependency_job_id":"aaad1176-c034-48dd-8e35-924afc983e94","html_url":"https://github.com/smartpricing/alyxstream","commit_stats":null,"previous_names":["smartpricing/alyxstream"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/smartpricing/alyxstream","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartpricing%2Falyxstream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartpricing%2Falyxstream/tags","releases_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/smartpricing%2Falyxstream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartpricing%2Falyxstream/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smartpricing","download_url":"https://codeload.github.com/smartpricing/alyxstream/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartpricing%2Falyxstream/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29125132,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-05T14:05:12.718Z","status":"ssl_error","status_checked_at":"2026-02-05T14:03:53.078Z","response_time":65,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["library"],"created_at":"2026-02-05T16:09:10.410Z","updated_at":"2026-02-05T16:09:11.393Z","avatar_url":"https://github.com/smartpricing.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Alyxstream\n\nAlyxstream is a library that simplifies stream processing in Node.js. We use it in production for real-time log analysis, error detection, and parallel job processing, using mainly Kafka as a source and Cassandra and Redis as sinks. 
Although it's not perfect and still under active development, this library could help you to solve a lot of processing problems, with a nice dataflow syntax.\n\nOut-of-the-box sources/sinks:\n\n- Kafka\n- Redis\n- Cassandra\n- Nats Jetstream\n- Etcd\n\nWorking usage examples are in the *usage-examples* folder.\n\n## Table of contents\n\n1. [Introduction](#introduction)\n2. [Stream/Batch sources](#sources)\n3. [Operators](#operators)\n4. [Custom functions](#custom)\n5. [Window processing](#windows)\n6. [Kafka source and sink](#kafka)\n7. [Redis queues](#redisqueue)\n8. [Storage](#storage)\n9. [Extend the library](#extend)\n10. [Kafka Exchange Mode](#exchange)\n11. [Multiprocess/Parallel Mode](#parallel)\n12. [Nats JetStream](#jetstream)\n13. [Distributed locks](#locks)\n\n## Introduction \u003ca name=\"introduction\"\u003e\u003c/a\u003e\n\nInstall it:\n\n```sh\nnpm install @dev.smartpricing/alyxstream\n```\n\n```js\nimport { \n\tTask, \n\tMakeStorage, \n\tStorageKind, \n\tKafkaClient, \n\tKafkaSource \n} from '@dev.smartpricing/alyxstream'\n\nconst kafkaSource = KafkaSource(KafkaClient())\n\nawait Task()\n.withLocalKVStorage()\n.withStorage(MakeStorage(StorageKind.Cassandra, null, 'hotel.errors.count'))\n.fromKafka(kafkaSource)\n.setLocalKV('kafka-mex', x =\u003e x)\n.withEventTime(x =\u003e x.eventTime)\n.keyBy(x =\u003e x.partition)\n.map(x =\u003e x.value)\n.filter(x =\u003e x.warningLevel == 'error')\n.slidingWindowTime(MakeStorage(StorageKind.Redis, null, 'my.window'), 5 * 60 * 1000, 60 * 1000)\n.groupBy(x =\u003e x.hotelName)\n.sumMap()\n.toStorage(x =\u003e 'result', x =\u003e x)\n.getLocalKV('kafka-mex')\n.kafkaCommit(kafkaSource)\n.close()\n```\n\n## Stream/batch sources  \u003ca name=\"sources\"\u003e\u003c/a\u003e\n\nAlyxstream supports multiple sources by default, both for streaming and batch processing. 
It's also very easy to build your own custom sources.\n\nAlways import Task, and Kafka Client/Source if you need Kafka support.\n\n\n```js\nimport { \n\tTask, \t\n\tKafkaClient, \n\tKafkaSource\n} from '@dev.smartpricing/alyxstream'\n```\n\n*For every Task, remember to call the **close** method at the end of the task pipeline. The close method signals the Task that it can start processing the data stream.*\n\nArray source, the downstream pipeline is called for every element of the array:\n\n```js\nawait Task().fromArray([1,2,3]).close()\n```\n\nObject source, the downstream pipeline is called once:\n\n```js\nawait Task().fromObject([1,2,3]).close()\nawait Task().fromObject({name: 'alice'}).close()\n```\n\nString source:\n\n```js\nawait Task().fromString('Alice').close()\n```\n\nFrom readable stream:\n\n```js\nawait Task().fromReadableStream('/path/to/file.csv').close()\n// With integrated unzip\nawait Task().fromReadableStream('/path/to/file.csv.gz', true).close()\n```\n\nFrom Kafka (the Kafka client/source/sink is explained in detail below):\n\n```js\nconst kafkaSource = await KafkaSource(KafkaClient({\n\tclientId: 'clientId'\n}), {\n\tgroupId: 'groupId',\n\ttopics: ['mytopic']\n})\nawait Task().fromKafka(kafkaSource).close()\n```\n\nYou can also define a Task without a source, and then inject the payload.\n\n*For the inject source, the **close** function is not needed.*\n\n```js\nconst task = await Task().print('\u003e')\n\nfor (var i = 0; i \u003c 10; i += 1) {\n\tawait task.inject(i)\n}\n```\n\nTo get back the last result of a Task, use the finalize method:\n\n```js\nconst t = await Task().fromString('Alice').close()\n\nconst result = await t.finalize()\n```\n\n## Operators \u003ca name=\"operators\"\u003e\u003c/a\u003e\n\n### Base operators\n\nAlyxstream supports keyed stream processing. 
In case you don't need it, you can set the key to *default* with the *withDefaultKey()* operator.\n\n```js\nawait Task().withDefaultKey()\n```\n\nIf instead you need keyed processing (as with windows), you have to set the key with the *keyBy* operator,\nusually after the source operator. \n\n```js\nawait Task().keyBy(x =\u003e x.myKey)\nawait Task().keyBy(x =\u003e 'customKey')\n```\n\nIf you want to use event time based processing, you have to specify it, usually after the source operator.\nIf not specified, processing time is used.\n\n```js\nawait Task().withEventTime(x =\u003e new Date(x.originalDate))\n```\n\nYou can stop the pipeline execution for a message that does not meet your needs:\n\n```js\nawait Task().filter(x =\u003e x % 2 == 0)\n```\n\nAt every pipeline step, you can print the current operator state:\n\n```js\nawait Task().print('1 \u003e')\n```\n\n### Branch operator\n\nThe branch operator helps you build DAG flow processing (the syntax will be improved):\n\n```js\nawait Task()\n.fromArray([1,2,3])\n.branch([\n\tasync () =\u003e { return await Task().print('1 \u003e') },\n\tasync () =\u003e { return await Task().print('2 \u003e') }\n])\n.close()\n```\n\n### Array operators\n\nArray operators help you to transform array data:\n\n*map* takes an input array and applies the map function to every element:\n\n```js\nawait Task()\n.fromObject([1,2,3])\n.map(x =\u003e x * 2)\n.print() // [2,4,6] \n.close()\n```\n\n*each* calls the downstream pipeline steps for every array element (the same as the *fromArray* source operator):\n\n```js\nawait Task()\n.fromObject([1,2,3])\n.each()\n.print() \n.close()\n// 1\n// 2\n// 3\n```\n\n*groupBy* takes an array and returns an object keyed by the result of the grouping function:\n\n```js\nawait Task()\n.fromObject([ {name: 'Alice'}, {name: 'Andrea'}, {name: 'Paolo'}, {name: 'Alice'} ])\n.groupBy(x =\u003e x.name)\n.print() \n.close()\n// {\n// \tAlice: [{name: 'Alice'}, {name: 'Alice'}],\n// \tPaolo: [{name: 'Paolo'}],\n// \tAndrea: 
'Andrea'}]\n// }\n```\n\n### Object operators\n\n*sumMap* takes an object and counts the array elements for every key.\n\n```js\nawait Task()\n.fromObject([ {name: 'Alice'}, {name: 'Andrea'}, {name: 'Paolo'}, {name: 'Alice'} ])\n.groupBy(x =\u003e x.name)\n.sumMap()\n.print() \n.close()\n// {\n// \tAlice: 2,\n// \tPaolo: 1,\n// \tAndrea: 1\n// }\n```\n\n### Aggregate\n\n```js\nconst storage = MakeStorage(StorageKind.Memory, null, 'example')\n\nconst t = await Task()\n.fromArray(['hello', 'hello', 'alice'])\n.aggregate(storage, 'ex', x =\u003e x)\n.sumMap()\n.print('\u003e step result:')\n```\n\n## Custom functions \u003ca name=\"custom\"\u003e\u003c/a\u003e\n\nYou can of course use any custom JS function to process your data. Wrap your code inside a function (or an async function).\n\nWith synchronous functions:\n\n```js\nawait Task()\n.fromArray([1,2,3])\n.fn(x =\u003e {\n\t// x will be 1, then 2, then 3 \n\treturn x * 2\n})\n.close()\n```\n\nor asynchronous:\n\n```js\nawait Task()\n.fromArray([1,2,3])\n.fn(async x =\u003e {\n\treturn await asyncFunctionYouHaveToCall(x)\n})\n.close()\n```\n\nThe **x** callback variable is the data flowing in the pipeline (your payload).\n\nAlyxstream wraps your payload inside an internal data structure, which is an object:\n\n```js\nMessage = {\n\tpayload: 'YOUR_PAYLOAD',\n\tmetadata: {}, // Where keyBy and eventTime metadata are kept in memory\n\tglobalState: {} // Here you can place state that must persist across steps\n}\n```\n\nIn case you need to access the raw message with your custom functions, use the *fnRaw* variant:\n\n```js\nawait Task()\n.fromArray([1,2,3])\n.fnRaw(x =\u003e {\n\t// x = {payload: 1, metadata: {}, globalState: {}}\n\tx.payload *= 2\n\treturn x\n})\n.close()\n```\n\n## Window processing \u003ca name=\"windows\"\u003e\u003c/a\u003e\n\nAlyxstream has five kinds of windows:\n\n- tumblingWindowCount: (Storage, WindowElementsLength, MaxInactivityMilliseconds [optional])\n- tumblingWindowTime: 
(Storage, WindowTimeMsLength, MaxInactivityMilliseconds [optional])\n- slidingWindowCount: (Storage, WindowElementsLength, SlideLength, MaxInactivityMilliseconds [optional])\n- slidingWindowTime: (Storage, WindowTimeMsLength, SlideMs, MaxInactivityMilliseconds [optional])\n- sessionWindowTime: (Storage, MaxInactivityMilliseconds)\n\nWindow state can be stored in memory or in a Redis instance (or a Redis-compatible DB such as KVRocks). Every window has an *inactivity* time period, after which the window result is emitted even if no new event triggers it.\n\nTime-based windows use a watermark, so when using event time processing, late records are not allowed (a grace period will be implemented soon).\n\nWindows split the stream based on the key you have defined (*keyBy* operator).\nThey flush the storage every time a window is ready to be emitted.\n\nBase config:\n\n```js\nimport { \n\tTask, \n\tMakeStorage, \n\tStorageKind\n} from '@dev.smartpricing/alyxstream'\n\nconst redisConfig = {} // defaults to localhost:6379\nconst exampleWindowStorage = MakeStorage(StorageKind.Redis, redisConfig, 'windowStorageId')\n```\n\n*tumblingWindowCount*, with 100 elements length and 10 seconds max inactivity time:\n\n```js\nawait Task()\n.fromKafka(...)\n.tumblingWindowCount(exampleWindowStorage, 100, 10000)\n.close()\n```\n\n*tumblingWindowTime*, 1 minute length and 10 seconds max inactivity time:\n\n```js\nawait Task()\n.fromKafka(...)\n.tumblingWindowTime(exampleWindowStorage, 60000, 10000)\n.close()\n```\n\n*slidingWindowCount*, with 100 elements length, 25 elements slide, and 10 seconds max inactivity time:\n\n```js\nawait Task()\n.fromKafka(...)\n.slidingWindowCount(exampleWindowStorage, 100, 25, 10000)\n.close()\n```\n\n*slidingWindowTime*, 1 minute length, 5 seconds slide, and 10 seconds max inactivity time:\n\n```js\nawait Task()\n.fromKafka(...)\n.slidingWindowTime(exampleWindowStorage, 60000, 5000, 10000)\n.close()\n```\n\n*sessionWindowTime*, max 15 seconds 
inactivity:\n\n```js\nawait Task()\n.fromKafka(...)\n.sessionWindowTime(exampleWindowStorage, 15000)\n.close()\n```\n\n## Kafka source and sink \u003ca name=\"kafka\"\u003e\u003c/a\u003e\n\nKafkaClient:\n\n```js\nimport { \n\tKafkaClient\n} from '@dev.smartpricing/alyxstream'\n\nconst kafkaClient = KafkaClient({\n\tclientId: 'my-client',\n\tbrokers: ['localhost:9092'], \n  \tssl: false,\n  \tsasl: ({ // for Confluent access\n  \t    mechanism: 'plain',\n  \t    username: 'username',\n  \t    password: 'password'\n  \t})\t\n})\n```\n\nKafkaSource:\n\n```js\nimport { \n\tKafkaSource\n} from '@dev.smartpricing/alyxstream'\n\nconst topic = {\n\ttopic: 'my-topic',\n\tfromBeginning: false,\n\tautoCommit: false,\n\tautoHeartbeat: 5000\n}\n\nconst kafkaSource = KafkaSource(kafkaClient, {\n\tgroupId: 'my-group-id',\n\ttopics: [topic]\t\n})\n```\n\n## Redis queues \u003ca name=\"redisqueue\"\u003e\u003c/a\u003e\n\nRedis queues use a Redis list with the BRPOP command to distribute jobs between workers:\n\n```js\nimport { Task, MakeStorage, StorageKind } from '@dev.smartpricing/alyxstream'\n\nconst queueStorage = MakeStorage(StorageKind.Redis, null, 'my-queue')\n\nasync function producer () {\n\tconst t = await Task()\n\t.fromReadableStream('data.csv.gz', true)\n\t.readline()\n\t.tumblingWindowCount(MakeStorage(StorageKind.Memory, null, 'my-queue-win'), 10)\n\t.enqueue(queueStorage)\n\t.fn(async (x) =\u003e {\n\t\t// Back-pressure: wait while the queue holds more than 100 items\n\t\twhile (true) {\n\t\t\tconst queueSize = await queueStorage.queueSize()\t\n\t\t\tif (queueSize \u003e 100) {\n\t\t\t\tawait new Promise(resolve =\u003e setTimeout(resolve, 1000))\n\t\t\t} else {\n\t\t\t\tbreak\n\t\t\t}\n\t\t}\n\t})\n\t.close()\n}\n\nasync function consumer () {\n\tconst t = await Task()\n\t.dequeue(queueStorage)\n\t.fn(async x =\u003e {\n\t\tconsole.log(x)\n\t\treturn x\n\t})\n\t.close()\n} \n\nasync function run () {\n\tif (process.env.RUN == 'producer') {\n\t\tawait producer()\t\n\t} else {\n\t\tawait consumer()\n\t}\n}\n\nrun()\n```\n\n## Storage 
\u003ca name=\"storage\"\u003e\u003c/a\u003e\n\nAlyxstream supports three kinds of storage out of the box: memory, Redis, and Cassandra.\nMemory and Redis storage are suitable for window state storage.\n\n```js\nimport { MakeStorage, StorageKind } from '@dev.smartpricing/alyxstream'\n\nconst memStorage = MakeStorage(StorageKind.Memory, null, 'storage-1')\n\nconst redisStorage = MakeStorage(StorageKind.Redis, null, 'storage-1')\n\nconst cassandraStorage = MakeStorage(StorageKind.Cassandra, null, 'storage-1')\n```\n\nAccessing the raw client:\n\n```js\nconst redisStorage = MakeStorage(StorageKind.Redis, null, 'storage-1')\nconst redisStorageClient = redisStorage.db() // io-redis\n\nconst cassandraStorage = MakeStorage(StorageKind.Cassandra, null, 'storage-1')\nconst cassandraStorageClient = cassandraStorage.db() // cassandra-driver\n```\n\n### Internal KV storage\n\n\n```js\nimport { Task } from '@dev.smartpricing/alyxstream'\n\nawait Task()\n.withLocalKVStorage()\n.setLocalKV('my-var', x =\u003e x * 2)\n.getLocalKV('my-var')\n.mergeLocalKV('my-var')\n```\n\n## Extend the library \u003ca name=\"extend\"\u003e\u003c/a\u003e\n\nYou can create your own custom functions and call them from a task:\n\n```js\nimport { Task, ExtendTask, ExtendTaskRaw } from '@dev.smartpricing/alyxstream'\n\nexport const multiplyBy = ExtendTask('multiplyBy', async function (x, multiplier) {\n    return x * multiplier\n})\n\nexport const multiplyByRaw = ExtendTaskRaw('multiplyByRaw', async function (x, multiplier) {\n    return x.payload * multiplier\n})\n\nawait Task()\n.multiplyBy(2)\n.multiplyByRaw(2)\n.inject(3) // output 12\n```\n\n## Kafka Exchange Mode \u003ca name=\"exchange\"\u003e\u003c/a\u003e\n\nAlyxstream includes a wrapper around itself that simplifies a special use case: Kafka communication between microservices. 
It's called Exchange and allows bidirectional communication between two (or more) services.\n\n```js\nimport { \n\tExchange,\n\tKafkaClient\n} from '@dev.smartpricing/alyxstream'\n\nconst client = KafkaClient({\n\tclientId: 'my-client-1',\n\tbrokers: ['localhost:9092'], \t\t\n})\n\nlet mex = {\n\tkind: 'TestObject',\n\tmetadata: {\n\t\tkey: '1'\n\t},\n\tspec: {\n\t\tvalue: 1\n\t}\n}\n\n/** Exchange(KafkaClient, topicName, groupId, sourceOptions as KafkaJS options) */\nconst ex1 = await Exchange(client, 'alyxstream-exchange-01', 'ae-01', {autoCommit: false})\nconst ex2 = await Exchange(client, 'alyxstream-exchange-02', 'ae-02', {autoCommit: false})\n\nawait ex1.emit(mex)\nawait ex1.on(async (message) =\u003e {\n\tconsole.log('sub', message)\n\tmessage.spec.value += 1\n\tawait ex2.emit(message)\n})\n```\n\nTo override the default key parser and message validator:\n\n```js\nconst ex1 = await Exchange(client, 'alyxstream-exchange-01', 'ae-01', {autoCommit: false})\nex1.setKeyParser(x =\u003e x.metadata.myKey)\nex1.setValidationFunction(x =\u003e {\n\tif (x.spec.myValue == undefined) {\n\t\treturn false\n\t}\n\treturn true\n})\n```\n\n## Multiprocess/Parallel Mode \u003ca name=\"parallel\"\u003e\u003c/a\u003e\n\nYou can process a stream of data using multiple Node.js processes:\n\n```js\nimport { Task } from '@dev.smartpricing/alyxstream'\n\nawait Task()\n.parallel(3)\n.dequeue(STORAGE)\n.close()\n```\n\n## NATS JetStream [ALPHA] \u003ca name=\"jetstream\"\u003e\u003c/a\u003e\n\nProducer:\n\n```js\nimport { Task, NatsClient } from '@dev.smartpricing/alyxstream';\n\nconst nc = await NatsClient()\n\nconst t = await Task()\n.toNats(nc, 'sp-test.a.a')\n\nfor (var i = 0; i \u003c 100; i += 1) {\n  await t.inject({key: i})\n}\n\n```\n\nConsumer:\n\n```js\nimport { Task, NatsClient, NatsJetstreamSource } from '@dev.smartpricing/alyxstream';\n\nconst nc = await NatsClient()\nconst source = await NatsJetstreamSource(nc, [{\n  stream: 'sp-test',\n  durable_name: 
'worker-4',\n  ack_policy: 'Explicit',\n  filter_subjects: ['sp-test.a.a', 'sp-test.a.b']\n}])\n\nawait Task()\n.fromNats(source)\n.print('\u003e')\n.close()\n```\n\n## Distributed locks [ALPHA] \u003ca name=\"locks\"\u003e\u003c/a\u003e\n\nUsing the Smartlocks library, you can acquire global locks and use atomic counters.\n\nLock/Release:\n\n```js\nimport { Task } from '@dev.smartpricing/alyxstream'\nimport { Mutex, StorageKind as MSKind } from 'smartlocks'\nconst lockStorage = Mutex(MSKind.Cassandra, null)\n\nawait Task()\n.parallel(5)\n.fromArray([{i: 1}, {i: 2}, {i: 3}])\n.fn(x =\u003e {\n  x.i = x.i * 2\n  return x\n})\n.lock(lockStorage, x =\u003e 'my-lock')\n.fn(x =\u003e {\n  console.log(x)\n})\n.release(lockStorage, x =\u003e 'my-lock')\n.close()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmartpricing%2Falyxstream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmartpricing%2Falyxstream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmartpricing%2Falyxstream/lists"}