{"id":16572810,"url":"https://github.com/lemire/externalsortinginjava","last_synced_at":"2026-02-03T18:16:12.667Z","repository":{"id":15334740,"uuid":"18065193","full_name":"lemire/externalsortinginjava","owner":"lemire","description":"External-Memory Sorting in Java","archived":false,"fork":false,"pushed_at":"2023-04-04T02:03:43.000Z","size":142,"stargazers_count":254,"open_issues_count":4,"forks_count":102,"subscribers_count":21,"default_branch":"master","last_synced_at":"2024-10-12T21:28:36.826Z","etag":null,"topics":["external-memory","java","sort"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lemire.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2014-03-24T14:25:37.000Z","updated_at":"2024-10-12T04:00:50.000Z","dependencies_parsed_at":"2024-01-05T20:45:51.116Z","dependency_job_id":"eb15d93f-4757-446d-a337-f4d39c19ad0c","html_url":"https://github.com/lemire/externalsortinginjava","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemire%2Fexternalsortinginjava","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemire%2Fexternalsortinginjava/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemire%2Fexternalsortinginjava/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemire%2Fexternalsortinginjava/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lemire","download_ur
l":"https://codeload.github.com/lemire/externalsortinginjava/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247801175,"owners_count":20998339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["external-memory","java","sort"],"created_at":"2024-10-11T21:28:37.760Z","updated_at":"2026-02-03T18:16:12.631Z","avatar_url":"https://github.com/lemire.png","language":"Java","readme":"Externalsortinginjava\n==========================================================\n[![][maven img]][maven]\n[![][license img]][license]\n[![docs-badge][]][docs]\n[![Java CI](https://github.com/lemire/externalsortinginjava/actions/workflows/java8.yml/badge.svg)](https://github.com/lemire/externalsortinginjava/actions/workflows/java8.yml)\n\nExternal-Memory Sorting in Java: useful to sort very large files using multiple cores and an external-memory algorithm.\n\n\n\nThis code is used in [Apache Jackrabbit Oak](https://github.com/apache/jackrabbit-oak) as well as in [Apache Beam](https://github.com/apache/beam) and in [Spotify scio](https://github.com/spotify/scio).\n\nCode sample\n------------\n\n```java\nimport com.google.code.externalsorting.ExternalSort;\n\n//... inputfile: input file name\n//... 
outputfile: output file name\n// next command sorts the lines from inputfile to outputfile\nint numLinesWritten = ExternalSort.mergeSortedFiles(ExternalSort.sortInBatch(new File(inputfile)), new File(outputfile));\n// you can also provide a custom string comparator, see API\n```\n\n\nCode sample (CSV)\n------------\n\nFor sorting CSV files, it might be more convenient to use `CsvExternalSort`.\n\n```java\nimport java.io.File;\nimport java.nio.charset.Charset;\nimport java.util.ArrayList;\nimport java.util.Comparator;\nimport java.util.List;\n\nimport org.apache.commons.csv.CSVFormat;\nimport org.apache.commons.csv.CSVRecord;\n\nimport com.google.code.externalsorting.CsvExternalSort;\nimport com.google.code.externalsorting.CsvSortOptions;\n\n// provide a comparator (here: compare records by their first column)\nComparator\u003cCSVRecord\u003e comparator = (op1, op2) -\u003e op1.get(0).compareTo(op2.get(0));\n//... inputfile: input file name\n//... outputfile: output file name\n//... provide sort options\nCsvSortOptions sortOptions = new CsvSortOptions\n\t\t\t\t.Builder(comparator, CsvExternalSort.DEFAULTMAXTEMPFILES, CsvExternalSort.estimateAvailableMemory())\n\t\t\t\t.charset(Charset.defaultCharset())\n\t\t\t\t.distinct(false)\n\t\t\t\t.numHeader(1)\n\t\t\t\t.skipHeader(false)\n\t\t\t\t.format(CSVFormat.DEFAULT)\n\t\t\t\t.build();\n// container to store the header lines\nArrayList\u003cCSVRecord\u003e header = new ArrayList\u003cCSVRecord\u003e();\n\n// next two lines sort the lines from inputfile to outputfile\nList\u003cFile\u003e sortInBatch = CsvExternalSort.sortInBatch(new File(inputfile), null, sortOptions, header);\n// at this point you can access header if you'd like.\nint numWrittenLines = CsvExternalSort.mergeSortedFiles(sortInBatch, new File(outputfile), sortOptions, true, header);\n\n```\n\nThe `numHeader` parameter is the number of header lines in the CSV file (typically 1 or 0), and the `skipHeader` parameter indicates whether you would like to exclude these lines from parsing.\n\nAPI Documentation\n-----------------\n\nhttp://www.javadoc.io/doc/com.google.code.externalsortinginjava/externalsortinginjava/\n\n\n\n\nMaven dependency\n-----------------\n\n\nYou can download the jar files from the Maven Central 
repository:\nhttps://repo1.maven.org/maven2/com/google/code/externalsortinginjava/externalsortinginjava/\n\nYou can also specify the dependency in the Maven \"pom.xml\" file:\n\n```xml\n    \u003cdependencies\u003e\n         \u003cdependency\u003e\n\t     \u003cgroupId\u003ecom.google.code.externalsortinginjava\u003c/groupId\u003e\n\t     \u003cartifactId\u003eexternalsortinginjava\u003c/artifactId\u003e\n\t     \u003cversion\u003e[0.6.0,)\u003c/version\u003e\n         \u003c/dependency\u003e\n     \u003c/dependencies\u003e\n```\n\nHow to build\n-----------------\n\n- Get the Java JDK\n- Install Maven 2\n- `mvn install` - builds the jar (requires signing)\n- or `mvn package` - builds the jar (does not require signing)\n- `mvn test` - runs the tests\n\n\n\n[maven img]:https://maven-badges.herokuapp.com/maven-central/com.googlecode.javaewah/JavaEWAH/badge.svg\n[maven]:http://search.maven.org/#search%7Cga%7C1%7Cexternalsortinginjava\n\n[license]:LICENSE.txt\n[license img]:https://img.shields.io/badge/License-Apache%202-blue.svg\n\n\n[docs-badge]:https://img.shields.io/badge/API-docs-blue.svg?style=flat-square\n[docs]:http://www.javadoc.io/doc/com.google.code.externalsortinginjava/externalsortinginjava/\n","funding_links":[],"categories":["Memory and concurrency"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flemire%2Fexternalsortinginjava","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flemire%2Fexternalsortinginjava","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flemire%2Fexternalsortinginjava/lists"}