https://github.com/donatj/unic
Like UNIX `sort | uniq` except it's quicker and maintains order. Uses a Cuckoo Filter.
- Host: GitHub
- URL: https://github.com/donatj/unic
- Owner: donatj
- License: mit
- Created: 2017-11-14T21:05:24.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-12-07T14:19:17.000Z (over 2 years ago)
- Last Synced: 2025-04-23T09:41:15.518Z (about 1 year ago)
- Topics: command-line-tool, unique
- Language: Go
- Homepage:
- Size: 50.8 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# unic
[Go Report Card](https://goreportcard.com/report/github.com/donatj/unic)
[GoDoc](https://godoc.org/github.com/donatj/unic)
Works like UNIX `sort | uniq` to provide global uniques except you don't have to sort first.
Works by using Cuckoo Filters - See: https://github.com/seiflotfy/cuckoofilter
## Advantages over `sort | uniq`
### Quicker output, lower memory footprint
`sort` by definition needs to buffer the entire input before it can begin outputting **anything**. This can use a lot of memory and prevents anything from being output until the whole input has been read.
`unic` uses probabilistic filters (Cuckoo) to determine whether a line has been seen before, and can begin output after the first line of input.
### Original item order is kept
Given the list `3 1 2 1 2 3`, compare `sort | uniq`'s output
```bash
$ printf '3\n1\n2\n1\n2\n3\n' | sort | uniq
1
2
3
```
to `unic`
```bash
$ printf '3\n1\n2\n1\n2\n3\n' | unic
3
1
2
```
## Disadvantages
### Probabilistic Filtering
As `unic` works with Cuckoo Filters, there is a very small probability that a line will be wrongly marked as a duplicate and dropped from the output. Due to the nature of the filter, lines will **never** be incorrectly marked as unique.
In cases where a false positive cannot ever be tolerated, `unic` **should not** be used.
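To see where such a false positive comes from, here is a deliberately tiny toy filter. It is an illustration only, not a Cuckoo filter and not `unic`'s code: the names `tinyFilter`, `fingerprint` and `firstFalsePositive` are hypothetical, and the 8-bit fingerprint is chosen so that collisions are easy to trigger. Real filters use larger fingerprints and a bucketed layout, which makes collisions far rarer, but the principle is the same: only a short fingerprint of each line is stored, so two different lines can look identical to the filter.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// tinyFilter remembers only an 8-bit fingerprint per line.
// Hypothetical teaching aid: not a Cuckoo filter, not unic's implementation.
type tinyFilter struct {
	seen [256]bool
}

// fingerprint reduces a line to the low 8 bits of its FNV-1a hash.
func fingerprint(line string) uint8 {
	h := fnv.New32a()
	h.Write([]byte(line)) // Write on a hash.Hash never returns an error
	return uint8(h.Sum32())
}

// Add records the line's fingerprint.
func (f *tinyFilter) Add(line string) { f.seen[fingerprint(line)] = true }

// MaybeSeen is always true for an added line (no false negatives), but can
// also be true for a line that merely shares a fingerprint with one.
func (f *tinyFilter) MaybeSeen(line string) bool { return f.seen[fingerprint(line)] }

// firstFalsePositive feeds n distinct lines into a fresh filter and returns
// the first one that is reported as seen before it was ever added.
func firstFalsePositive(n int) (string, bool) {
	var f tinyFilter
	for i := 0; i < n; i++ {
		line := fmt.Sprintf("line-%d", i)
		if f.MaybeSeen(line) {
			return line, true
		}
		f.Add(line)
	}
	return "", false
}

func main() {
	// 257 distinct lines cannot all have different 8-bit fingerprints.
	if line, ok := firstFalsePositive(257); ok {
		fmt.Printf("%q was never added, yet the filter reports it as seen\n", line)
	}
}
```

A deduplicator built on this toy would silently drop the reported line even though it is unique, which is the failure mode described above.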
### Not compatible with all of `uniq`'s flags
`unic` by nature does not buffer; thus some of `uniq`'s flags cannot be implemented.
In these cases, you should use `uniq`.
## Installing
### Binaries
See: [releases](https://github.com/donatj/unic/releases)
### From Source
```bash
$ go install github.com/donatj/unic/cmd/unic@latest
```