{"id":20973497,"url":"https://github.com/livetocode/huge-csv-sorter","last_synced_at":"2026-01-16T01:02:25.461Z","repository":{"id":65696176,"uuid":"585761029","full_name":"livetocode/huge-csv-sorter","owner":"livetocode","description":"This library can sort huge CSV files efficiently","archived":false,"fork":false,"pushed_at":"2026-01-02T18:45:41.000Z","size":285,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-09T06:46:05.798Z","etag":null,"topics":["big","csv","fast","huge","large","order","sort","sqlite"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/livetocode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-01-06T02:10:25.000Z","updated_at":"2026-01-02T18:45:41.000Z","dependencies_parsed_at":"2023-02-18T19:45:39.857Z","dependency_job_id":null,"html_url":"https://github.com/livetocode/huge-csv-sorter","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/livetocode/huge-csv-sorter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fhuge-csv-sorter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fhuge-csv-sorter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fhuge-csv-sorter/releases","manifests_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fhuge-csv-sorter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/livetocode","download_url":"https://codeload.github.com/livetocode/huge-csv-sorter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fhuge-csv-sorter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28475142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T00:15:39.755Z","status":"ssl_error","status_checked_at":"2026-01-16T00:15:32.174Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big","csv","fast","huge","large","order","sort","sqlite"],"created_at":"2024-11-19T04:19:54.845Z","updated_at":"2026-01-16T01:02:25.447Z","avatar_url":"https://github.com/livetocode.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Summary\n\nThis library can sort huge CSV files efficiently.\n\nOnce your CSV files are properly sorted on a primary key, they can also be efficiently compared to produce a diff file,\nusing my other lib https://github.com/livetocode/tabular-data-differ\n\n### Keywords\n- csv\n- huge\n- large\n- big\n- sort\n- order\n- fast\n- sqlite\n\n### Table of contents\n\n- [**Why another 
lib?**](#why-another-lib)\n- [**Features**](#features)\n- [**Prerequisites**](#prerequisites)\n- [**Usage**](#usage)\n- [**Documentation**](#documentation)\n- [**Development**](#development)\n\n# Why another lib?\n\nMost CSV sorting libraries read the whole file into memory to sort and filter it, which is not possible when the files are huge!\n\nThis library acts as a thin wrapper around the SQLite library and delegates all the work to the DB, which is made for this exact scenario.\n\n# Features\n\n- consumes very little memory\n- can sort huge files that wouldn't fit in memory\n- very fast since it relies on SQLite, which is a highly optimized C library\n\n# Prerequisites\n\nThe \"sqlite3\" command must be installed on your system.\n\nFor a Mac:\n`brew install sqlite`\n\nDon't forget to install the proper package if you're running your app in a container.\nFor example, using the Node Alpine distro:\n`RUN apk add sqlite`\n\nNote that we couldn't use the sqlite npm package since it wouldn't let us execute meta commands such as \".import\" which we rely on for importing the CSV.\n(see https://sqlite.org/cli.html#csv_import)\n\n# Usage\n\n## Install\n\n`npm i huge-csv-sorter`\n\n## Examples\n\n### Sort a file with one primary column\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    orderBy: ['id'],\n});\n```\n\n### Sort a file with two primary columns\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    orderBy: ['code', 'version'],\n});\n```\n\n### Sort a file with two primary columns, with one of them in descending order\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    orderBy: [\n        'code', \n        {\n            name: 'version',\n            sortDirection: 'DESC',\n        }\n    
],\n});\n```\n\n### Sort a file with a subset of the original columns\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    select: [\n        'id',\n        'name',\n        'price'\n    ],\n    orderBy: ['id'],\n});\n```\n\n### Sort a file with typed columns and order by a number column\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    schema: [\n        { \n            name: 'id',\n            type: 'number',            \n        },\n        'name',\n        {\n            name: 'price',\n            type: 'number',\n        }\n    ],\n    select: ['id', 'name', 'price'],\n    orderBy: ['id'],\n});\n```\n\n### Sort a file with a custom delimiter such as tab for TSV files\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: {\n        filename: 'huge.tsv',\n        delimiter: '\\t',\n    },\n    destination: {\n        filename: 'huge.sorted.tsv',\n        delimiter: '\\t',\n    },\n    orderBy: ['id'],\n});\n```\n\n### Sort a file and filter the output rows on a text column\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    orderBy: ['id'],\n    where: `CATEGORY in ('Cat1', 'Cat2', 'Cat3')`,\n});\n```\n\n### Sort a file and filter the output rows on a number column\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    schema: [\n        { \n            name: 'id',\n            type: 'number',\n        },\n        'name',\n    ],\n    orderBy: ['id'],\n    where: `id \u003c 1000`,\n});\n```\n\n### Sort a file and filter the output rows on a column that must be quoted\n\nBe careful if the names of the columns you're filtering on contain special chars: in this case, you must 
double-quote them or SQLite will fail to identify the columns.\n\nNote that the where clause must be valid SQL: this library does not validate or convert it.\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    orderBy: ['The ID'],\n    where: `\"The ID\" \u003c 1000`,\n});\n```\n\n### Sort a file and paginate\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    orderBy: ['id'],\n    offset: 1000,\n    limit: 100,\n});\n```\n\n### Sort a file with custom sqlite settings\n\nIf you want to keep the SQLite database for further inspection after the import, you can override the sqlite options.\nYou can also change the filename of the SQLite database, which otherwise uses the destination filename with the csv extension replaced by sqlite.\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    sqlite: {\n        filename: '/tmp/huge.sqlite',\n        keepDB: true, // do not delete db after sort\n    },\n    orderBy: ['id'],\n});\n```\n\n### Log all commands\n\nIf you want to understand how the schema, the import and the query are implemented in SQLite, you can provide your own logger function:\n\n```Typescript\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: 'huge.csv',\n    destination: 'huge.sorted.csv',\n    orderBy: ['id'],\n    logger: console.log,\n});\n```\n\n### Order 2 CSV files and diff them on the console\n\nNote that you must also install the diff lib with `npm i tabular-data-differ`.\n\n```Typescript\nimport { diff } from 'tabular-data-differ';\nimport { sort } from 'huge-csv-sorter';\n\nawait sort({\n    source: './tests/a.csv',\n    destination: './tests/a.sorted.csv',\n    orderBy: ['id'],\n});\n\nawait sort({\n    source: './tests/b.csv',\n    destination: 
'./tests/b.sorted.csv',\n    orderBy: ['id'],\n});\n\nconst stats = await diff({\n    oldSource: './tests/a.sorted.csv',\n    newSource: './tests/b.sorted.csv',\n    keys: ['id'],\n}).to('console');\nconsole.log(stats);\n```\n\n# Documentation\n\n## FileOptions\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nfilename | yes    |             | a filename\ndelimiter| no     | ,           | the optional delimiter of the columns\n\n## SchemaColumn\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nname     | yes    |             | the name of the column.\ntype     | no     | string      | the type of the column: either a string or a number.\n\n## SortedColumn\n\nName         |Required|Default value|Description\n-------------|--------|-------------|-----------\nname         | yes    |             | the name of the column.\nsortDirection| no     | ASC         | the sort direction of the data.\n\n## SQLiteOptions\n\nName     |Required|Default value|Description\n---------|--------|-------------|-----------\nfilename | yes    |             | the filename of the SQLite temporary database.\nkeepDB   | no     | false       | specifies whether to keep the database after the operation or if it should be deleted.\ncli      | no     | sqlite3     | the SQLite command line tool.\n\n## SortOptions\n\nName        |Required|Default value|Description\n------------|--------|-------------|-----------\nsource      | yes    |             | either a filename or a FileOptions object\ndestination | yes    |             | either a filename or a FileOptions object\nschema      | no     |             | an optional list of columns annotated with their type (string or number). Note that if it is specified, it **must** match all columns of the source file, in the same order of appearance, otherwise the SQLite import will be aborted. 
\nselect      | no     |             | a selection of columns to keep from the source CSV. It will keep all columns when not specified.\norderBy     | yes    |             | a list of columns for ordering the records.\nwhere       | no     |             | the conditions for filtering the records.\noffset      | no     | 0           | the offset from which to start selecting the records\nlimit       | no     |             | the maximum number of records to select. It will keep all records when not specified.\nsqlite      | no     |             | options for customizing SQLite.\nlogger      | no     |             | a function for logging commands sent to SQLite\n\n## sort\n\nThe sort function will require a single parameter of type {SortOptions}.\n\nThere are only 3 required options:\n- source\n- destination\n- orderBy\n\n# Development\n\n## Install\n\n```shell\ngit clone git@github.com:livetocode/huge-csv-sorter.git\ncd huge-csv-sorter\nnpm i\n```\n\n## Tests\n\nTests are implemented with Jest and can be run with:\n`npm t`\n\nYou can also look at the coverage with:\n`npm run show-coverage`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivetocode%2Fhuge-csv-sorter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flivetocode%2Fhuge-csv-sorter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivetocode%2Fhuge-csv-sorter/lists"}