{"id":18575931,"url":"https://github.com/datafabricrus/textfile-utils","last_synced_at":"2025-04-10T08:30:58.965Z","repository":{"id":176865660,"uuid":"659662963","full_name":"DataFabricRus/textfile-utils","owner":"DataFabricRus","description":"A simple JVM library with utilitarian methods for working with text files of any size, including merge sorting and binary search. The library is based on the Java NIO and Kotlin coroutines.","archived":false,"fork":false,"pushed_at":"2024-12-20T08:14:43.000Z","size":531,"stargazers_count":3,"open_issues_count":5,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-24T18:52:32.169Z","etag":null,"topics":["binary-search","file-utility","kotlin","kotlin-library","mergesort","text-files"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataFabricRus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-28T09:46:39.000Z","updated_at":"2024-12-19T07:44:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"2d31c7fb-44a5-4cbe-a6ba-80f818865aa5","html_url":"https://github.com/DataFabricRus/textfile-utils","commit_stats":null,"previous_names":["datafabricrus/textfile-utils"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFabricRus%2Ftextfile-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFabricRus%2Ftextfile-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/DataFabricRus%2Ftextfile-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFabricRus%2Ftextfile-utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataFabricRus","download_url":"https://codeload.github.com/DataFabricRus/textfile-utils/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248185289,"owners_count":21061491,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-search","file-utility","kotlin","kotlin-library","mergesort","text-files"],"created_at":"2024-11-06T23:22:51.417Z","updated_at":"2025-04-10T08:30:58.943Z","avatar_url":"https://github.com/DataFabricRus.png","language":"Kotlin","readme":"## TextFile Utils\n\nA simple JVM library for working with text files of any size.\nThe library is based on Java NIO (i.e. 
`java.nio.channels.SeekableByteChannel`) and Kotlin Coroutines.\n\nThe library allows sorting an arbitrary file that can be divided into byte blocks by some delimiter.\nAfter sorting, these blocks can be found using a binary search algorithm.\nThe library is careful about memory consumption, performance and disk space, so it is suitable for environments with limited resources.\nLarge CSV files are one example where this library could be used.\nThe library is lightweight, so it can be used where heavy frameworks or databases are not an option.\n\nIt contains the following utilities:\n\n- insertion at an arbitrary position in the file\n- reading text lines from the end or start of the file\n- merging files\n- determining if a file is sorted\n- inverting file content\n- sorting large text files with O(1) memory and, optionally, no additional disk space\n- binary search in a sorted file\n\n#### MergeSort:\n```kotlin\nfun sort(\n    source: Path,                       // existing regular file\n    target: Path,                       // result file, must not exist\n    comparator: Comparator\u003cString\u003e,     // to compare lines, by default lexicographically\n    delimiter: String,                  // default: `\\n`\n    allocatedMemorySizeInBytes: Int,    // the approximate allowed memory consumption\n    controlDiskspace: Boolean,          // if `true`, the source file is truncated during processing\n    charset: Charset,                   // default: UTF-8\n    coroutineContext: CoroutineContext, // default: Dispatchers.IO\n)\n```\n#### BinarySearch:\n```kotlin\nfun binarySearch(\n    source: Path,                       // existing regular file\n    searchLine: String,                 // pattern to search\n    buffer: ByteBuffer,                 // to be used while reading data from file\n    charset: Charset,                   // default: UTF-8\n    delimiter: String,                  // default: `\\n`\n    comparator: Comparator\u003cString\u003e,     // to compare 
lines, by default lexicographically\n    maxOfLinesPerBlock: Int,            // maximum number of lines in a paragraph \n    maxLineLengthInBytes: Int,          // maximum length of line\n): Pair\u003cLong, List\u003cString\u003e\u003e\n```\n\n#### Available via maven-central:\n```kotlin\ndependencies {\n    implementation(\"io.github.datafabricrus:textfile-utils:{{latest_version}}\")\n}\n```\n\n### Apache License Version 2.0","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafabricrus%2Ftextfile-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatafabricrus%2Ftextfile-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafabricrus%2Ftextfile-utils/lists"}