{"id":21367943,"url":"https://github.com/zkxs/cuniq","last_synced_at":"2025-08-08T18:55:45.634Z","repository":{"id":252613794,"uuid":"840944387","full_name":"zkxs/cuniq","owner":"zkxs","description":"Command line tool that counts unique lines FAST.","archived":false,"fork":false,"pushed_at":"2025-07-01T11:04:32.000Z","size":18688,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-13T05:39:27.810Z","etag":null,"topics":["blazingly-fast","cardinality","cli","cli-app","command-line","command-line-tool","count","distinct","unique"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zkxs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-11T07:03:52.000Z","updated_at":"2025-06-30T23:57:37.000Z","dependencies_parsed_at":"2025-07-01T00:31:43.182Z","dependency_job_id":null,"html_url":"https://github.com/zkxs/cuniq","commit_stats":null,"previous_names":["zkxs/cuniq"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/zkxs/cuniq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkxs%2Fcuniq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkxs%2Fcuniq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkxs%2Fcuniq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkxs%2Fcuniq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/zkxs","download_url":"https://codeload.github.com/zkxs/cuniq/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkxs%2Fcuniq/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267612873,"owners_count":24115540,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-28T02:00:09.689Z","response_time":68,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blazingly-fast","cardinality","cli","cli-app","command-line","command-line-tool","count","distinct","unique"],"created_at":"2024-11-22T07:21:43.316Z","updated_at":"2025-07-29T01:05:31.837Z","avatar_url":"https://github.com/zkxs.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cuniq\n\n**The pitch**: cuniq is a dedicated command line tool for counting unique lines in text input. If you find yourself\nfrequently running commands like `sort -u | wc -l` or `sort | uniq -c` you will find improved performance by using cuniq\ninstead.\n\n**The anti-pitch**: For small inputs you're fine using sort and uniq, as we're talking millisecond-savings by switching\nto cuniq. 
However, if you've been using `sort | uniq | wc -l` you should switch to `sort -u | wc -l`, as it's a free\nperformance gain without having to go outside standard POSIX commands.\n\n## Performance\n\ncuniq has been benchmarked against various combinations of GNU coreutils (sort, uniq, and wc) as well as other\nhashing-based Rust utilities [runiq](https://crates.io/crates/runiq), [sortuniq](https://crates.io/crates/sortuniq),\nand [huniq](https://crates.io/crates/huniq).\nAs of this writing, you should not use runiq 2.0.0 or sortuniq 0.2.0 for counting unique lines: they underperform cuniq\nin all cases, and in many cases their performance is on par with or even worse than `sort -u | wc -l`.\n\nFor **counting** cuniq reliably outperforms GNU sort in all cases.\n\nFor **reporting line occurrence counts** cuniq reliably outperforms GNU uniq in all cases except one:\n\n\u003e [!NOTE]\n\u003e If your input has extremely few duplicates and you want a sorted report, then you're better off using `sort | uniq -c`.\n\u003e This is because with extremely few duplicates both approaches must sort nearly all of the input, but cuniq also wastes\n\u003e time building a hash table.\n\nMore data and technical details on the benchmarking and profile-guided optimization that went into creating cuniq are\navailable in [PERFORMANCE.md](PERFORMANCE.md).\n\n## Compatibility\n\ncuniq has compatible output with corresponding GNU coreutils commands:\n\n| GNU coreutils command   | cuniq equivalent | Effect                                | Notes                                                        |\n|-------------------------|------------------|---------------------------------------|--------------------------------------------------------------|\n| `sort \\| uniq \\| wc -l` | `cuniq`          | Count of unique lines                 |                                                              |\n| `sort -u \\| wc -l`      | `cuniq`          | Count of unique lines                 | this GNU 
coreutils command is more performant than the above |\n| `sort \\| uniq -c`       | `cuniq -c`       | Unsorted report of unique line counts | output order differs between the two commands                |\n| `sort \\| uniq -c`       | `cuniq -cs`      | Sorted report of unique line counts   |                                                              |\n\n## Install\n\n### Installing from Source\n\nFirst, [install Rust](https://www.rust-lang.org/tools/install).\n\nInstall from crates.io:\n`RUSTFLAGS=\"-C target-cpu=native\" cargo install cuniq`\n\nAlternatively, install from GitHub:\n`RUSTFLAGS=\"-C target-cpu=native\" cargo install --git=https://github.com/zkxs/cuniq`\n\n### Manual Installation\n\nDownload cuniq from the [latest release](https://github.com/zkxs/cuniq/releases/latest), and save it to a location of your choice.\n\n### Install from AUR (on Arch Linux)\n```\nyay -S cuniq\n```\n\n## Usage\n\ncuniq can accept lines from stdin or from a list of files.\n\n```\nUsage: cuniq [OPTIONS] [FILES]...\n\nArguments:\n  [FILES]...\n          Files to process\n\nOptions:\n  -c, --report\n          Instead of printing total unique lines, print a report showing occurrence count of each\n          line. This is only compatible with \"exact\" mode (the default)\n\n  -s, --sort\n          Sort report output alphabetically by line. Has no effect unless used with `--report`\n\n  -t, --trim\n          Remove leading and trailing whitespace from input\n\n  -l, --lower\n          Convert input to lowercase\n\n  -m, --mode \u003cMODE\u003e\n          Sets the algorithm used to count (or estimate) cardinality\n\n          [default: exact]\n\n          Possible values:\n          - exact:      Uses a hash table to exactly count cardinality. The size of the hash table\n            is proportional to the cardinality of the input. You may use the `--size` flag to set\n            the initial capacity of the internal hash table. 
For very large inputs `--size` may help\n            reduce expensive hash table reallocations. Avoid setting `--size` for small datasets\n          - near-exact: Uses a hash table to exactly count cardinality, but does not store the\n            original line. This mode is faster than \"exact\" mode, but hash collision will result in\n            under-counting the cardinality by one. However, hash collisions for a 64-bit hash are\n            exceedingly unlikely. The size of the hash table is proportional to the cardinality of\n            the input. You may use the `--size` flag to set the initial capacity of the internal\n            hash table. For very large inputs `--size` may help reduce expensive hash table\n            reallocations. Avoid setting `--size` for small datasets. This mode is not compatible\n            with `--report`\n          - estimate:   Uses the HyperLogLog algorithm to estimate cardinality with fixed memory.\n            Use the `--size` flag to specify the number of 1-byte registers to use. More registers\n            will increase estimate accuracy. By default, 65536 is used. This mode is not compatible\n            with `--report`\n\n  -n, --size \u003cSIZE\u003e\n          Set the size used by the selected counting mode. See the `--mode` documentation for how\n          this affects each counting mode\n\n      --threads \u003cTHREADS\u003e\n          Set the number of threads used to perform the count. By default, the number of logical\n          cores is used. Not all counting modes support parallelism: see `--mode` for details\n\n      --no-stdin\n          Disable checking stdin for input. May yield a small performance improvement when only\n          reading input from files\n\n      --memmap\n          Force reading files via memmap. This may yield improved performance for large files. 
If\n          the binary was built without memmap support, using this flag will result in an error\n\n      --no-memmap\n          Disable reading files via memmap, instead falling back to normal reads. By default, cuniq\n          will try to use memmap if it thinks it will be faster. Disabling memmap may yield improved\n          performance for small files\n\n  -h, --help\n          Print help (see a summary with '-h')\n\n  -V, --version\n          Print version\n```\n\n## License\n\ncuniq is free software: you can redistribute it and/or modify it under the terms of the\n[GNU General Public License](LICENSE) as published by the Free Software Foundation, either version 3 of the\nLicense, or (at your option) any later version.\n\ncuniq is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the [GNU General Public License](LICENSE) for more\ndetails.\n\nA full list of dependencies is available in [Cargo.toml](cuniq/Cargo.toml), or a breakdown of dependencies by license can be\ngenerated with `cargo deny list`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzkxs%2Fcuniq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzkxs%2Fcuniq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzkxs%2Fcuniq/lists"}