https://github.com/wroberts/count
UNIX line counting utilities
- Host: GitHub
- URL: https://github.com/wroberts/count
- Owner: wroberts
- License: mit
- Created: 2014-11-27T14:27:01.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2018-08-04T12:04:32.000Z (over 6 years ago)
- Last Synced: 2023-03-11T05:49:39.466Z (almost 2 years ago)
- Topics: c-plus-plus, count, counting-utilities, line-by-line, sort, text-processing, unix
- Language: C++
- Homepage:
- Size: 36.1 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
count - UNIX line counting utilities
====================================

Copyright (c) 2014 Will Roberts \
Homepage: https://github.com/wroberts/count

This project is licensed under the terms of the MIT license (see
LICENSE.md).

Overview
--------

`count` works similarly to `sort fruit | uniq -c`. Its output is
tab-separated (a count, a tab, then the line) and in alphabetical order.

`addcount` sums two count files produced by `count`, assuming that the
files are sorted in alphabetical order.

`sortalph` takes count data as produced by `count` and sorts it
alphabetically; it can also be used to sum two (or more) count files
together (even if they're not in alphabetical order):

    cat COUNT1 COUNT2 | sortalph

`sortnum` is a script that calls `sort -nr`.

`threshcount` reads a count file as produced by `count` and outputs
only those lines whose counts are greater than the given threshold
argument.

`shuffle` is a short Python script which reads in a file and outputs
its lines in random order. `shuf` in the
[GNU Coreutils](https://www.gnu.org/software/coreutils/) is faster and
more flexible.

Install
-------

From tarball:

    tar xf count-1.0.tar.gz
    cd count-1.0/
    ./configure
    make install

From GitHub:

    autoreconf --install
    mkdir build
    cd build
    ../configure
    make install

Speed Test
----------

`count` is faster than `sort | uniq -c`, but can use much more memory.
On a test file of 1,653,677 lines (75,598,346 bytes), `count` took about
9.2 s of wall-clock time versus about 50.9 s for `sort | uniq -c`,
roughly 5.5 times faster:

    $ cat BIGFILE | wc
    1653677 21751482 75598346
    $ time (cat BIGFILE | sort | uniq -c > /dev/null)
    real 0m50.933s
    user 0m55.267s
    sys 0m0.347s
    $ time (cat BIGFILE | count > /dev/null)
    real 0m9.233s
    user 0m9.357s
    sys 0m0.453s

Awk Equivalents
---------------

Most of the `count` tools can be replicated with trivial `awk` scripts.
Usually, the compiled binaries are faster.

`count` is equivalent to, though faster than:

    awk '{c[$0]++} END {OFS="\t"; for (x in c) print c[x], x}' | sort -k2

`sortalph` is equivalent to, though faster than:

    awk 'BEGIN{FS=OFS="\t"} {v=$1; $1=""; c[substr($0,2)]+=v} END {for (x in c) print c[x], x}' | sort -k2

`threshcount 2` is equivalent to, but slower than:

    awk '{if (2 < $1) print $0}'
