{"id":25862752,"url":"https://github.com/nsstc/data-drift","last_synced_at":"2026-05-14T02:34:58.856Z","repository":{"id":115707601,"uuid":"91693608","full_name":"NSSTC/data-drift","owner":"NSSTC","description":"Data streaming and scheduling library for Node.JS","archived":false,"fork":false,"pushed_at":"2019-02-22T13:24:41.000Z","size":35,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"develop","last_synced_at":"2025-11-29T18:58:48.070Z","etag":null,"topics":["data-pipeline","data-transformation","node-js","node-module","nodejs","nodejs-modules","stream","stream-processing","streaming","streams"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NSSTC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-05-18T13:00:33.000Z","updated_at":"2023-09-05T22:42:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"4a80d387-870e-41c0-82f7-1a179756eea0","html_url":"https://github.com/NSSTC/data-drift","commit_stats":null,"previous_names":["minecrawler/data-drift"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/NSSTC/data-drift","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NSSTC%2Fdata-drift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NSSTC%2Fdata-drift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/NSSTC%2Fdata-drift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NSSTC%2Fdata-drift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NSSTC","download_url":"https://codeload.github.com/NSSTC/data-drift/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NSSTC%2Fdata-drift/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33008184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-pipeline","data-transformation","node-js","node-module","nodejs","nodejs-modules","stream","stream-processing","streaming","streams"],"created_at":"2025-03-01T23:56:34.757Z","updated_at":"2026-05-14T02:34:58.850Z","avatar_url":"https://github.com/NSSTC.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# data-drift\n**Extensively configurable and stateful data-transformation-stream builder**\n\nHave you ever built a data pipeline, just to find out that the order of\nyour transformers might change during runtime? But how to do something\nlike that? 
Thankfully, data-drift comes to the rescue!\n\nData-drift is a highly configurable pipeline builder, which allows you to\nadd different segments and re-organize them. On top of that, everything is\nbuilt on top of Object Streams, which means you can add a state to the\nstreams. In short, you can have one pipeline and send different data\nthrough it, which you can easily distinguish thanks to the state.\n\nAs an additional bonus, data-drift makes use of monadic\n[Results](https://www.npmjs.com/package/result-js) and\n[Options](https://www.npmjs.com/package/roption-js),\nwhich results in superior error-management and better performance\n(in error cases), as nothing has to unwind with try..catch.\n\n**You can find the complete API, as defined in code, below the examples!**\n\n\n## Installation\n\nData-drift requires Node.JS v6+.\nFor fast install-times, I recommend using npm v5+.\n\n```sh\n$ npm i data-drift\n```\n\n\n## Simple Example\n\n```js\n'use strict';\n\nconst Stream = require('stream');\n\nconst DD = require('data-drift');\n\n\nconst pipeline = new DD();\n\n// create one source, one drain and n transformers.\n// everything has to be in Object Mode, so we cannot simply use stdin and stdout.\nconst source = new Stream.Readable({ objectMode: true, });\nconst drain = new Stream.Writable({ objectMode: true, });\nconst trans = new Stream.Transform({ objectMode: true, });\nconst trans2 = new Stream.Transform({ objectMode: true, });\n\n// the source has to emit an object you want to use in your pipeline\nsource._read = function() {\n    const input = process.stdin.read();\n    if (input !== null) {\n        this.push({\n            state: {},\n            data: input,\n        });\n    }\n};\n\nprocess.stdin.on('data', data =\u003e {\n    source.push({\n        state: {\n            timestamp: new Date(),\n        },\n        data: data.toString(),\n    });\n});\n\n// don't forget to always pass the initial object\ntrans._transform = function(data, _, cb) {\n    
data.data = `You just inputted \"${data.data.replace(/\\r?\\n?$/, '')}\"!`;\n    cb(null, data);\n};\n\ntrans2._transform = function(data, _, cb) {\n    data.data = data.data.replace(/\\r?\\n?$/, '');\n    data.data += ' ~ ';\n    data.state.foo = 'FOO';\n    cb(null, data);\n};\n\n// the drain can consume the object in any way it wants,\n// for example write it to your HTTP server as response.\ndrain._write = function (data, _, cb) {\n    process.stdout.write(`Data: \"${data.data}\" State: ${JSON.stringify(data.state)}\\n`);\n    cb();\n};\n\ndrain.on('close', $d =\u003e { console.log('drain event ' + JSON.stringify($d)); });\n\n// when using data-drift, you have to register all pieces\n// you can register new workers any time you want\n// however, there can only be one source and one drain at a time!\npipeline.registerSegment(DD.SegmentType.SOURCE, source);\npipeline.registerSegment(DD.SegmentType.DRAIN, drain);\nconst transformer1 = pipeline.registerSegment(DD.SegmentType.WORKER, trans).unwrap();\nconst transformer2 = pipeline.registerSegment(DD.SegmentType.WORKER, trans2).unwrap();\n// add as many transformers as you like and hot-re-order them later on :)\n\n// then start the pipeline\npipeline.buildPipeline();\n\n// type something, wait 10s, type again to see the difference\nsetTimeout(() =\u003e {\n    console.log('Swap transformers...');\n\n    // the first position (after a source, if available) has the index 0\n    pipeline.setSegmentPosition(transformer2, 0);\n\n    // the next line is implicit, since all subsequent segments are pushed to the next position\n    //pipeline.setSegmentPosition(transformer1, 1);\n}, 10000);\n\n```\n\n\n## Usage\n\n### Create New Pipeline\n\n```js\n'use strict';\n\nconst Stream = require('stream');\nconst DD = require('data-drift');\n\n\nconst pipeline = new DD();\n\n// create one source, one drain and n transformers.\n// everything has to be in Object Mode, so we cannot simply use stdin and stdout.\nconst source = new 
Stream.Readable({ objectMode: true, });\nconst drain = new Stream.Writable({ objectMode: true, });\nconst trans = new Stream.Transform({ objectMode: true, });\nconst trans2 = new Stream.Transform({ objectMode: true, });\n\n// the source has to emit an object you want to use in your pipeline\nsource._read = function() {\n    const input = process.stdin.read();\n    if (input !== null) {\n        this.push({\n            state: {},\n            data: input,\n        });\n    }\n};\n\n// the source can be fed manually\nprocess.stdin.on('data', data =\u003e {\n    source.push({\n        state: {},\n        data: data.toString(),\n    });\n});\n\n// don't forget to always pass the initial object\ntrans._transform = function(data, _, cb) {\n    data.data = `You just inputted \"${data.data.replace(/\\r?\\n?$/, '')}\"!`;\n    cb(null, data);\n};\n\ntrans2._transform = function(data, _, cb) {\n    data.data += ' ~ ';\n    data.state.foo = 'FOO';\n    cb(null, data);\n};\n\n// the drain can consume the object in any way it wants,\n// for example write it to your HTTP server as response.\ndrain._write = function (data, _, cb) {\n    process.stdout.write(`Data: \"${data.data}\" State: ${JSON.stringify(data.state)}\\n`);\n    cb();\n};\n\n// ...\n\n```\n\n\n### Register Segments\n\n```js\n// ...\n\n// when using data-drift, you have to register all pieces\n// you can register new workers any time you want\n// however, there can only be one source and one drain at a time!\npipeline.registerSegment(DD.SegmentType.SOURCE, source);\npipeline.registerSegment(DD.SegmentType.DRAIN, drain);\nconst transformer1 = pipeline.registerSegment(DD.SegmentType.WORKER, trans).unwrap();\nconst transformer2 = pipeline.registerSegment(DD.SegmentType.WORKER, trans2).unwrap();\n// add as many transformers as you like and hot-re-order them later on :)\n\n//...\n\n```\n\n\n### Start Pipeline\n\n```js\n// ...\n\n// then start the pipeline\npipeline.buildPipeline();\n\n//...\n\n```\n\n\n### Re-Order 
Segment\n\n```js\n// ...\n\n// type something, wait 10s, type again to see the difference\nsetTimeout(() =\u003e {\n    console.log('Swap transformers...');\n\n    // the first position (after a source, if available) has the index 0\n    pipeline.setSegmentPosition(transformer2, 0);\n\n    // the next line is implicit, since all subsequent segments are pushed to the next position\n    //pipeline.setSegmentPosition(transformer1, 1);\n}, 10000);\n\n```\n\n\n## API\n\nThe method bodies in the interface below throw `Not implemented` errors.\nThese are placeholders only: the real methods are fully implemented and will not throw.\nThe placeholders keep the API overview clear and uncluttered.\n\n```js\nclass DataDrift {\n    static get SegmentType() {\n        return {\n            SOURCE: 0b001,\n            WORKER: 0b010,\n            DRAIN:  0b100,\n        };\n    };\n\n    constructor() { super({ objectMode: true, }); this._init(); };\n\n    /**\n     * Build pipeline and make it start working.\n     * You only have to call this method once to kick off the pipeline.\n     *\n     * @returns {boolean} true if the process was successful. 
If a source or drain is missing, false is returned.\n     */\n    buildPipeline() { throw new Error('Not implemented: DataDrift.buildPipeline'); };\n\n    /**\n     * Get position of a segment in the chain\n     *\n     * @param {object} segment\n     * @returns {Option\u003cnumber\u003e}\n     */\n    getSegmentPosition(segment) { throw new Error('Not implemented: DataDrift.getSegmentPosition'); };\n\n    /**\n     * Change the position of one of the worker segments\n     *\n     * @param {number} segment segment ID\n     * @param {number} position new position, where 0 is the first index (after a source, if available)\n     */\n    setSegmentPosition(segment, position) { throw new Error('Not implemented: DataDrift.setSegmentPosition'); };\n\n    /**\n     * Register a new segment.\n     * Can return the following errnos in an Err:\n     *   EPARAMETER: Either the type or the segment was not specified correctly.\n     *   ESOURCEALREADYREGISTERED: A source has already been registered before. There can only be one source.\n     *   EDRAINALREADYREGISTERED: A drain has already been registered before. 
There can only be one drain.\n     *   ESOURCEMUSTBEREADABLESTREAM: A source must be derived from a readable stream.\n     *   EDRAINMUSTBEWRITABLESTREAM: A drain must be derived from a writable stream.\n     *   ECREATEITEM\n     *\n     * @param {SegmentType} type\n     * @param {function} segment\n     * @returns {Result\u003cnumber,VError\u003e} segment ID as used by the Drift\n     */\n    registerSegment(type, segment) { throw new Error('Not implemented: DataDrift.registerSegment'); };\n\n    /**\n     * Unregister a segment from the pipeline\n     *\n     * @param {number} segment segment ID\n     */\n    unregisterSegment(segment) { throw new Error('Not implemented: DataDrift.unregisterSegment'); };\n\n    /**\n     * Check if the pipeline has a certain segment type\n     *\n     * @param {SegmentType} type\n     * @returns {boolean}\n     */\n    hasSegmentType(type) { throw new Error('Not implemented: DataDrift.hasSegmentType'); };\n};\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnsstc%2Fdata-drift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnsstc%2Fdata-drift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnsstc%2Fdata-drift/lists"}