{"id":22725639,"url":"https://github.com/livetocode/tabular-data-differ","last_synced_at":"2026-01-02T20:28:55.469Z","repository":{"id":65598052,"uuid":"584441621","full_name":"livetocode/tabular-data-differ","owner":"livetocode","description":"A very efficient library for diffing two sorted streams of tabular data, such as CSV files.","archived":false,"fork":false,"pushed_at":"2024-09-22T13:10:27.000Z","size":185,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-30T04:55:28.554Z","etag":null,"topics":["changes","comparison","csv","delta","diff","difference","table","tabular-data","tsv"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/livetocode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-01-02T15:22:22.000Z","updated_at":"2024-09-22T13:10:17.000Z","dependencies_parsed_at":"2023-11-29T00:09:06.931Z","dependency_job_id":"a5bfdf5f-37f1-4863-a76c-4df2737579de","html_url":"https://github.com/livetocode/tabular-data-differ","commit_stats":{"total_commits":39,"total_committers":1,"mean_commits":39.0,"dds":0.0,"last_synced_commit":"e34b60e21aaf2469791f083ac691fb890ba4fd5c"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Ftabular-data-differ","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Ftabular-data-differ/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/livetocode%2Ftabular-data-differ/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Ftabular-data-differ/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/livetocode","download_url":"https://codeload.github.com/livetocode/tabular-data-differ/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229089187,"owners_count":18018391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["changes","comparison","csv","delta","diff","difference","table","tabular-data","tsv"],"created_at":"2024-12-10T16:13:26.359Z","updated_at":"2026-01-02T20:28:55.408Z","avatar_url":"https://github.com/livetocode.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Summary\n\nA very efficient library for diffing two **sorted** streams of tabular data, such as CSV files.\n\n### Keywords\n- table\n- tabular data\n- CSV\n- TSV\n- diff\n- difference\n- delta\n- changes\n- comparison\n\n### Table of content\n\n- [**Why another lib?**](#why-another-lib)\n- [**Features**](#features)\n- [**Points of interest**](#points-of-interest)\n- [**Algorithm complexity**](#algorithm-complexity)\n- [**Usage**](#usage)\n- [**Documentation**](#documentation)\n- [**Development**](#development)\n- [**Roadmap**](#roadmap)\n\n# Why another lib?\n\nMost of the diffing libraries either load all the data in memory for comparison or would at least load the keys and store some hash on the data.\nThis is fine for a lot of scenarios but it doesn't scale with 
huge files and runs the risk that the data won't fit in memory.\nAlso, those strategies require a two-pass approach for diffing, which is more expensive.\n\nThis library requires that the submitted files are already sorted by some primary key to compare the two streams in a single pass,\nwhile loading at most two rows of data in memory.\n\nIf your data is not already sorted, you can use my other lib https://github.com/livetocode/huge-csv-sorter, which can sort a huge file very efficiently thanks to SQLite.\n\nThis single-pass approach allows us to diff two 600 MB files containing 2.6 million rows and 37 columns in 18 seconds on my MacBook Pro.\nOr two 250 MB files containing 4 million rows and 7 columns in 10 seconds.\n\n# Features\n\n- very fast\n- memory efficient\n- multiple input formats\n- multiple output formats\n- input files can have different column sets, in different order\n- input files can have different formats\n- compact JSON output format (field names are not repeated)\n- highly configurable and customizable\n- change stats\n- new and old values available\n\n# Points of interest\n\n- single pass algorithm offering O(n) performance\n- 100% code coverage\n- no external dependency\n- small composable objects\n- async streams\n- async iterator for enumerating the changes\n- generic input format using an async generator function\n\n# Algorithm complexity\n\nAssuming that n is the number of rows in the old source and m the number of rows in the new source:\n- Min complexity is O(max(n, m))\n- Max complexity is O(n+m) (when old source contains only deleted rows and new source only new rows)\n\nThe average complexity, assuming a low rate of additions or deletions, should be linear in the size of the input files.\n\n# Usage\n\n## Install the library\n\n`npm i tabular-data-differ`\n\n## Examples \n\n### Diff 2 CSV files on the console\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: 
'./tests/b.csv',\n    keys: ['id'],\n}).to('console');\nconsole.log(stats);\n```\n\n### Diff 2 CSV files on the console when the key column is in descending order\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: [{\n        name: 'id',\n        sortDirection: 'DESC',\n    }],\n}).to('console');\nconsole.log(stats);\n```\n\n### Diff 2 CSV files on the console with a multi-column primary key, including a number\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: [\n        'code',\n        {\n            name: 'version',\n            comparer: 'number',\n        }\n    ],\n}).to('console');\nconsole.log(stats);\n```\n\n### Diff 2 CSV files on the console with a single case insensitive primary key (using a custom comparer)\n\n```Typescript\nimport { diff, CellValue, cellComparer, stringComparer } from 'tabular-data-differ';\n\n// Compare strings without regard to case; fall back to the default comparer for other types.\nfunction caseInsensitiveCompare(a: CellValue, b: CellValue): number {\n    if (typeof a === 'string' \u0026\u0026 typeof b === 'string') {\n        return stringComparer(a.toLowerCase(), b.toLowerCase());\n    }\n    return cellComparer(a, b);\n}\n\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: [\n        {\n            name: 'id',\n            comparer: caseInsensitiveCompare,\n        }\n    ],\n}).to('console');\nconsole.log(stats);\n```\n\n### Diff 2 CSV files and only get the stats\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: ['id'],\n}).to('null');\nconsole.log(stats);\n```\n\n### Diff 2 CSV files and produce a CSV file\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: 
'./tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: ['id'],\n}).to('./temp/delta.csv');\nconsole.log(stats);\n```\n\n### Diff 2 CSV files and produce a CSV file using HTTP transport\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: new URL('https://some.server.org/tests/a.csv'),\n    newSource: new URL('https://some.server.org/tests/b.csv'),\n    keys: ['id'],\n}).to(new URL('https://some.server.org/temp/delta.csv'));\nconsole.log(stats);\n```\n\nNote that you can provide the username/password in the URL object if you need basic authentication.\n\n### Diff 2 CSV files and produce a JSON file\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: ['id'],\n}).to({\n    destination: {\n        format: 'json',\n        stream: './temp/delta.json',\n    },\n});\nconsole.log(stats);\n```\n\n### Diff 2 CSV files and produce a TSV file\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: ['id'],\n}).to({\n    destination: {\n        format: 'tsv',\n        stream: './temp/delta.tsv',\n    }\n});\nconsole.log(stats);\n```\n\n### Diff one CSV and one TSV and produce a JSON file\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: {\n        format: 'tsv',\n        stream: './tests/b.tsv',\n    },\n    keys: ['id'],\n}).to({\n    destination: {\n        format: 'json',\n        stream: './temp/delta.json',\n    },\n});\nconsole.log(stats);\n```\n\n### Diff two string arrays and enumerate the changes\n\n```Typescript\nimport { strict as assert } from 'node:assert';\nimport { diff, ArrayInputStream } from 'tabular-data-differ';\nconst ctx = await diff({\n    oldSource: {\n        format: 'csv',\n        stream: new ArrayInputStream([\n            
'id,name',\n            '1,john',\n            '2,mary',\n        ]),\n    },\n    newSource: {\n        format: 'csv',\n        stream: new ArrayInputStream([\n            'id,name',\n            '1,johnny',\n            '3,sarah',\n        ]),\n    },\n    keys: ['id'],\n}).start();\nconsole.log('columns:', ctx.columns);\nconst idIdx = ctx.columns.indexOf('id');\nassert(idIdx \u003e= 0, 'could not find id column');\nconst nameIdx = ctx.columns.indexOf('name');\nassert(nameIdx \u003e= 0, 'could not find name column');\nfor await (const rowDiff of ctx.diffs()) {\n    if (rowDiff.status === 'modified') {\n        const id = rowDiff.newRow[idIdx];\n        const oldName = rowDiff.oldRow[nameIdx];\n        const newName = rowDiff.newRow[nameIdx];\n        if (oldName !== newName) {\n            console.log('In record ', id, ', name changed from', oldName, 'to', newName);\n        }\n    }\n}\nconsole.log('stats:', ctx.stats);\n```\n\n### Diff 2 CSV files on the console and ignore deleted rows\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a.csv',\n    newSource: './tests/b.csv',\n    keys: ['id'],\n}).to({\n    destination: 'console',\n    filter: (rowDiff) =\u003e rowDiff.status !== 'deleted',\n});\nconsole.log(stats);\n```\n\n### Diff 2 CSV files on the console but select only some categories of rows\n\n```Typescript\nimport { strict as assert } from 'node:assert';\nimport { diff } from 'tabular-data-differ';\nconst ctx = await diff({\n    oldSource: './tests/c.csv',\n    newSource: './tests/d.csv',\n    keys: [\n        'code',\n        {\n            name: 'version',\n            comparer: 'number',\n        }\n    ],\n}).start();\nconst catIdx = ctx.columns.indexOf('CATEGORY');\nassert(catIdx \u003e= 0, 'could not find CATEGORY column');\nconst stats = await ctx.to({\n    destination: 'console',\n    filter: (rowDiff) =\u003e ['Fruit', 'Meat'].includes(rowDiff.newRow?.[catIdx]?.toString() ?? rowDiff.oldRow?.[catIdx]?.toString() ?? 
''),\n});\nconsole.log(stats);\n```\n\n### Duplicate key handling\n\nIf your data sources contain duplicate keys, then the diffing will fail by default, but you can configure this behavior using the duplicateKeyHandling option.\n\nYou can resolve the conflict by keeping the first or last row of the duplicates:\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a2.csv',\n    newSource: './tests/b2.csv',\n    keys: ['id'],\n    duplicateKeyHandling: 'keepFirstRow', // or 'keepLastRow'\n}).to('console');\nconsole.log(stats);\n```\n\nOr, if you need more control in the row selection, then you can provide your own handler:\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst stats = await diff({\n    oldSource: './tests/a2.csv',\n    newSource: './tests/b2.csv',\n    keys: ['id'],\n    duplicateKeyHandling: (rows) =\u003e rows[0], // same as 'keepFirstRow'\n    duplicateRowBufferSize: 2000,\n}).to('null');\nconsole.log(stats);\n```\n\nNote that you can specify the size of the buffer if you know that it cannot exceed this quantity, otherwise you can enable the **duplicateRowBufferOverflow** option,\nwhich will remove the first entries when it exceeds the allocated capacity, to avoid any failure.\n\nFinally, you can inspect the source stats to check the duplication metrics:\n```Typescript\nimport { diff } from 'tabular-data-differ';\nconst ctx = await diff({\n    oldSource: './tests/a2.csv',\n    newSource: './tests/b2.csv',\n    keys: ['id'],\n    duplicateKeyHandling: 'keepFirstRow', // or 'keepLastRow'\n}).start();\nconst stats = await ctx.to('null');\nconsole.log(stats);\nconsole.log(ctx.oldStats);\nconsole.log(ctx.newStats);\n\n```\n\n\n### Order 2 CSV files and diff them on the console\n\nDon't forget to install first my other lib: `npm i huge-csv-sorter`.\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 
'./tests/a.csv',\n    destination: './tests/a.sorted.csv',\n    orderBy: ['id'],\n});\n\nawait sort({\n    source: './tests/b.csv',\n    destination: './tests/b.sorted.csv',\n    orderBy: ['id'],\n});\n\nconst stats = await diff({\n    oldSource: './tests/a.sorted.csv',\n    newSource: './tests/b.sorted.csv',\n    keys: ['id'],\n}).to('console');\nconsole.log(stats);\n```\n\n### Auto-correct unordered CSV files and retry diff\n\nDon't forget to install first my other lib: `npm i huge-csv-sorter`.\n\n```Typescript\nimport { diff, UnorderedStreamsError } from 'tabular-data-differ';\nimport { sort } from 'huge-csv-sorter';\n\ntry {\n    // try diff\n    const stats = await diff({\n        oldSource: './tests/a.csv',\n        newSource: './tests/b.csv',\n        keys: ['id'],\n    }).to('./tests/diff.csv');\n    console.log(stats);\n} catch(err) {\n    // catch unordered exception\n    if (err instanceof UnorderedStreamsError) {\n        // sort files\n        await sort({\n            source: './tests/a.csv',\n            destination: './tests/a.sorted.csv',\n            orderBy: ['id'],\n        });\n\n        await sort({\n            source: './tests/b.csv',\n            destination: './tests/b.sorted.csv',\n            orderBy: ['id'],\n        });\n        // retry diff\n        const stats = await diff({\n            oldSource: './tests/a.sorted.csv',\n            newSource: './tests/b.sorted.csv',\n            keys: ['id'],\n        }).to('./tests/diff.csv');\n        console.log(stats);\n    } else {\n        throw err;\n    }\n} finally {\n    // delete sorted files...\n}\n```\n\n### async iterable source\n\nYou can easily plug in any kind of data source by leveraging Node.js async generator functions,\nwhich allow you to fetch the data from a database or from a REST API endpoint!\n\nHere's a simplistic example:\n\n```typescript\nimport { diff } from 'tabular-data-differ';\n\nconst stats = await diff({\n    oldSource: {\n        format: 'iterable',\n        provider: 
someAsyncSource,\n    },\n    newSource: {\n        format: 'iterable',\n        provider: () =\u003e someAsyncSource(2),\n    },\n    keys: ['id'],\n}).to('./output/files/output.csv');\nconsole.log(stats);\n\n\nasync function *someAsyncSource(limit?: number) {\n    let items = [\n        {\n            id: 1,\n            name: 'John',\n            age: 33,\n        },\n        {\n            id: 2,\n            name: 'Mary',\n            age: 22,\n        },\n        {\n            id: 3,\n            name: 'Cindy',\n            age: 44,\n        },\n    ];  \n    if (limit !== undefined){\n        items = items.slice(0, limit);\n    }\n    for (const item of items) {\n        yield item;\n    }\n}\n\n```\n\n\n# Documentation\n\n- [**API**](#api)\n- [**File formats**](#file-formats)\n\n## API\n\n### Source options\n\n#### CSV\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nformat   | yes    |             | You must specify 'csv' to select the CSV format\nstream   | yes    |             | either a string filename, a URL or an instance of an InputStream (like FileInputStream).\ndelimiter| no     | ,           | the char used to delimit fields within a row.\n\n#### TSV\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nformat   | yes    |             | You must specify 'tsv' to select the TSV format\nstream   | yes    |             | either a string filename, a URL or an instance of an InputStream (like FileInputStream).\ndelimiter| no     | \\t          | the char used to delimit fields within a row.\n\n#### JSON\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nformat   | yes    |             | You must specify 'json' to select the JSON format\nstream   | yes    |             | either a string filename, a URL or an instance of an InputStream (like FileInputStream).\n\n#### Iterable\n\nName     |Required|Default 
value|Description\n---------|--------|-------------|-----------\nformat   | yes    |             | You must specify 'iterable' to select the Iterable format\nprovider | yes    |             | a function that must return an instance of an async iterable object (see [Async generator functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/AsyncGenerator))\n\nSee [Example](#async-iterable-source)\n\n#### Custom\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nformat   | yes    |             | You must specify 'custom' to select the custom format\nreader   | yes    |             | an instance of a FormatReader\n\n### Destination options\n\n#### CSV\n\nName         |Required|Default value|Description\n-------------|--------|-------------|-----------\nformat       | yes    |             | You must specify 'csv' to select the CSV format\nstream       | yes    |             | either a string filename, a URL or an instance of an OutputStream (like FileOutputStream).\ndelimiter    | no     | ,           | the char used to delimit fields within a row.\nkeepOldValues| no     | false       | specifies if the destination should contain both the old and new values for each row.\n\n#### TSV\n\nName         |Required|Default value|Description\n-------------|--------|-------------|-----------\nformat       | yes    |             | You must specify 'tsv' to select the TSV format\nstream       | yes    |             | either a string filename, a URL or an instance of an OutputStream (like FileOutputStream).\ndelimiter    | no     | \\t          | the char used to delimit fields within a row.\nkeepOldValues| no     | false       | specifies if the destination should contain both the old and new values for each row.\n\n#### JSON\n\nName         |Required|Default value|Description\n-------------|--------|-------------|-----------\nformat       | yes    |             | You must specify 'json' to select the 
JSON format\nstream       | yes    |             | either a string filename, a URL or an instance of an OutputStream (like FileOutputStream).\nkeepOldValues| no     | false       | specifies if the destination should contain both the old and new values for each row.\n\n#### Custom\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nformat   | yes    |             | You must specify 'custom' to select the custom format\nwriter   | yes    |             | an instance of a FormatWriter\n\n### OutputOptions\n\nName            |Required|Default value|Description\n----------------|--------|-------------|-----------\ndestination     | yes    |             | either a standard output (console, null), a string filename, a URL or an instance of an OutputStream (like FileOutputStream).\nfilter          | no     |             | a filter to select which changes should be sent to the output stream.\nkeepSameRows    | no     | false       | specifies if the output should also contain the rows that haven't changed.\nchangeLimit     | no     |             | specifies the maximum number of differences that should be output.\nlabels          | no     |             | a dictionary of key/value pairs that can provide custom metadata to the generated file.\n\n### Key options (ColumnDefinition)\n\nName         |Required|Default value|Description\n-------------|--------|-------------|-----------\nname         | yes    |             | the name of the column.\ncomparer     | no     | string      | either a standard comparer ('string' or 'number') or a custom comparer.\nsortDirection| no     | ASC         | specifies if the column is sorted in ascending (ASC) or descending (DESC) order.\n\n### Differ options\n\nName                      |Required|Default value|Description\n--------------------------|--------|-------------|-----------\noldSource                 | yes    |             | either a string filename, a URL or a SourceOptions\nnewSource        
         | yes    |             | either a string filename, a URL or a SourceOptions\nkeys                      | yes    |             | the list of columns that form the primary key. This is required for comparing the rows. A key can be a string name or a {ColumnDefinition}\nincludedColumns           | no     |             | the list of columns to keep from the input sources. If not specified, all columns are selected.\nexcludedColumns           | no     |             | the list of columns to exclude from the input sources.\nrowComparer               | no     |             | specifies a custom row comparer.\nduplicateKeyHandling      |no      | fail        | specifies how to handle duplicate rows in a source. It will fail by default and throw a UniqueKeyViolationError exception. But you can ignore, keep the first or last row, or even provide your own function that will receive the duplicates and select the best candidate. \nduplicateRowBufferSize    |no      | 1000        | specifies the maximum size of the buffer used to accumulate duplicate rows.\nduplicateRowBufferOverflow|no      | false       | specifies if we can remove the first entries of the buffer to continue adding new duplicate entries when reaching maximum capacity, to avoid throwing an error and halting the process.\n\n### diff function\n\nCreates a Differ object from the submitted DifferOptions.\n\n### Differ methods\n\n#### start\n\nReturns a new DifferContext object with the input streams open and columns initialized.\n\nYou must call start to get an iterator (DifferContext.diffs) or if you need the columns prior to sending the diffs to the output with the \"to\" method.\n\n#### to\n\nInitiates the comparison between the old and new sources and sends the diffs to the specified output.\n\nThis returns the change stats once completed.\n\nThe options parameter can be either a standard output (console, null), a string filename, a URL or an OutputOptions.\n\nNote that it can throw the 
UnorderedStreamsError exception if it detects that the streams are not properly ordered by the specified keys.\nNote that it can throw the UniqueKeyViolationError exception if it detects that a stream has duplicate keys which violates the primary keys specified in the options.\n\n### DifferContext methods\n\n#### close\n\nCloses all open streams.\n\nNote that the methods \"to\" or \"diffs\" will automatically close the streams.\n\n#### columns\n\nReturns the current column names.\n\n#### stats\n\nReturns the current stats.\n\n#### oldSourceStats\n\nReturns the stats accumulated while parsing the old source.\n\n#### newSourceStats\n\nReturns the stats accumulated while parsing the new source.\n\n#### to\n\nInitiates the comparison between the old and new sources and sends the diffs to the specified output.\n\nThis returns the change stats once completed.\n\nThe options parameter can be either a standard output (console, null), a string filename, a URL or an OutputOptions.\n\nNote that it can throw the UnorderedStreamsError exception if it detects that the streams are not properly ordered by the specified keys.\nNote that it can throw the UniqueKeyViolationError exception if it detects that a stream has duplicate keys which violates the primary keys specified in the options.\n\n#### diffs\n\nEnumerates the differences between the old and new sources.\n\nNote that it can throw the UnorderedStreamsError exception if it detects that the streams are not properly ordered by the specified keys.\nNote that it can throw the UniqueKeyViolationError exception if it detects that a stream has duplicate keys which violates the primary keys specified in the options.\n\n### JSON input format\n\nThis library implements a simplistic JSON parser with a few assumptions:\n- each JSON object should be saved on a distinct line\n- the JSON file should only contain an array of objects\n- each object should be flat (no nested JSON objects)\n- all objects should share the same 
properties\n- the lines can be indented\n- each object can have either a preceding or a trailing comma\n- the array start ([) and end (]) can be inlined with the first/last object or their own separate line\n\n#### Examples\n\n```json\n[\n    {\"id\": \"01\",\"a\":\"a1\",\"b\":\"b1\",\"c\":\"c1\"},\n    {\"id\": \"02\",\"a\":\"a2\",\"b\":\"b2\",\"c\":\"c2\"},\n    {\"id\": \"03\",\"a\":\"a3\",\"b\":\"b3\",\"c\":\"c3\"}\n]\n```\n\n```json\n[{\"id\": \"01\",\"a\":\"a1\",\"b\":\"b1\",\"c\":\"c1\"},\n{\"id\": \"02\",\"a\":\"a2\",\"b\":\"b2\",\"c\":\"c2\"},\n{\"id\": \"03\",\"a\":\"a3\",\"b\":\"b3\",\"c\":\"c3\"}]\n```\n\n```json\n[\n    {\"id\": \"01\",\"a\":\"a1\",\"b\":\"b1\",\"c\":\"c1\"}\n    ,{\"id\": \"02\",\"a\":\"a2\",\"b\":\"b2\",\"c\":\"c2\"}\n    ,{\"id\": \"03\",\"a\":\"a3\",\"b\":\"b3\",\"c\":\"c3\"}\n]\n```\n\n## File formats\n\n### CSV output format\n\nThis is a standard CSV format, using the specified character for delimiting fields or the default one (comma).\n\nNote that there is an additional column named DIFF_STATUS that will tell if the row was added, deleted, modified.\n\n```csv\nDIFF_STATUS,id,a,b,c\ndeleted,01,a1,b1,c1\nmodified,04,aa4,bb4,cc4\ndeleted,05,a5,b5,c5\ndeleted,06,a6,b6,c6\nadded,10,a10,b10,c10\nadded,11,a11,b11,c11\n```\n\nNote that if you set the \"OutputOptions.keepOldValues\" property to true, you'll get additional columns prefixed by 'OLD_':\n```csv\nDIFF_STATUS,id,a,b,c,OLD_id,OLD_a,OLD_b,OLD_c\ndeleted,,,,,01,a1,b1,c1\nmodified,04,aa4,bb4,cc4,04,a4,b4,c4\ndeleted,,,,,05,a5,b5,c5\ndeleted,,,,,06,a6,b6,c6\nadded,10,a10,b10,c10,,,,\nadded,11,a11,b11,c11,,,,\n```\n\n### JSON output format\n\nThe schema is made of 3 parts:\n- the header\n- the items\n- the footer\n\n```json\n{\n    \"header\": {},\n    \"items\": [...],\n    \"footer\": {}\n}\n```\n\n#### Header\n\nThe header contains a mandatory list of columns and an optional dictionary of key/value pairs named labels.\n\n```json\n{\n    \"columns\": [\"col1\", \"col2\", 
\"col3\"]\n}\n```\n\nor\n\n```json\n{\n    \"columns\": [\"col1\", \"col2\", \"col3\"],\n    \"labels\": {\n        \"key1\": \"val1\",\n        \"key2\": \"val2\",\n    }\n}\n```\n\n#### Items\n\nA list of RowDiff objects, which can have two distinct layouts based on the \"OutputOptions.keepOldValues\" property.\n\n##### keepOldValues is false or undefined\n```json\n{\"status\":\"deleted\",\"data\":[\"01\",\"a1\",\"b1\",\"c1\"]},\n{\"status\":\"same\",\"data\":[\"02\",\"a2\",\"b2\",\"c2\"]},\n{\"status\":\"modified\",\"data\":[\"04\",\"aa4\",\"bb4\",\"cc4\"]},\n{\"status\":\"added\",\"data\":[\"10\",\"a10\",\"b10\",\"c10\"]},\n```\n##### keepOldValues is true\n\n```json\n{\"status\":\"deleted\",\"old\":[\"01\",\"a1\",\"b1\",\"c1\"]},\n{\"status\":\"same\",\"new\":[\"02\",\"a2\",\"b2\",\"c2\"],\"old\":[\"02\",\"a2\",\"b2\",\"c2\"]},\n{\"status\":\"modified\",\"new\":[\"04\",\"aa4\",\"bb4\",\"cc4\"],\"old\":[\"04\",\"a4\",\"b4\",\"c4\"]},\n{\"status\":\"added\",\"new\":[\"10\",\"a10\",\"b10\",\"c10\"]},\n```\n\n#### Footer\n\nThe footer will simply contain a stats section summarizing the types of changes in the file.\n\n```json\n{\n    \"stats\" : {\n        \"totalComparisons\": 11,\n        \"totalChanges\": 6,\n        \"changePercent\": 54.55,\n        \"added\": 2,\n        \"deleted\": 3,\n        \"modified\": 1,\n        \"same\": 5\n    }\n}\n```\n\n# Development\n\n## Install\n\n```shell\ngit clone git@github.com:livetocode/tabular-data-differ.git\ncd tabular-data-differ\nnpm i\n```\n\n## Tests\n\nTests are implemented with Jest and can be run with:\n`npm t`\n\nYou can also look at the coverage with:\n`npm run show-coverage`\n\n# Roadmap\n\nIf you manifest some interest in this project, we could add new streams:\n- S3, allowing you to use an external storage capacity such as AWS S3\n- HTTP, allowing you to provide custom headers for authentication\n\nAnd we could add more formats:\n- XML\n- protobuff\n- SQL, allowing you to diff two database tables between 
two separate databases\n\nBut with the [async iterable sources](#iterable) feature, you should be able to easily plug any kind of data source you need!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivetocode%2Ftabular-data-differ","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flivetocode%2Ftabular-data-differ","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivetocode%2Ftabular-data-differ/lists"}