Zet 1.0 is out (crates.io, github) with binary downloads available for Linux, Windows, and Mac (Intel and Apple Silicon).
zet union is comparable to uniq, but its input need not be sorted, its output preserves the order of the input lines, and it can take multiple input files. It's about 4.5 times faster than uniq. Similarly, zet single and zet multiple are comparable to uniq -u and uniq -d. Likewise, zet intersect and zet diff are comparable to comm -12 and comm -23, but they can take more than two (unsorted) input files, respect input order, and are about 10 times faster than comm. (Speed comparisons were done on a 2020 M1 MacBook Air.)
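To make the semantics concrete, here is a minimal Python sketch of what union, intersect, and diff compute on unsorted input. It is an illustration only, not zet's implementation (zet is a Rust program). The function names are hypothetical, each "file" is modeled as a list of lines, and the sketch assumes each output line is printed once, in order of first appearance.

```python
def union(*files):
    """Every distinct line, in order of first appearance."""
    seen = set()
    out = []
    for lines in files:
        for line in lines:
            if line not in seen:
                seen.add(line)
                out.append(line)
    return out


def intersect(first, *rest):
    """Distinct lines of the first file that also appear in every other file."""
    others = [set(lines) for lines in rest]
    return [line for line in union(first) if all(line in s for s in others)]


def diff(first, *rest):
    """Distinct lines of the first file that appear in none of the other files."""
    removed = set().union(*rest)
    return [line for line in union(first) if line not in removed]


a = ["b", "a", "b", "c"]   # unsorted, with a duplicate
b = ["c", "d", "a"]

print(union(a, b))      # ['b', 'a', 'c', 'd']
print(intersect(a, b))  # ['a', 'c']
print(diff(a, b))       # ['b']
```

Note that no sorting step is needed anywhere, which is the difference from uniq and comm that the comparison above is about.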
On the other hand, zet takes at least as much RAM as the size of its first input file, and in the worst case as much as the sum of the sizes of all input files. So it's faster, but it will stop working once the input is too big to fit in memory.
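One way to see where those two memory bounds could come from is the following Python sketch. It is an assumption about the general shape of such an algorithm, not a description of zet's actual code, and the function names are hypothetical. A diff only ever needs to remember lines from the first file, while a union may end up remembering every line of every file.

```python
def diff_streaming(first, *rest):
    # Retain the first file's distinct lines; a dict keeps insertion order.
    kept = dict.fromkeys(first)
    for lines in rest:
        for line in lines:
            # Later files are read once and never stored.
            kept.pop(line, None)
    return list(kept)


def union_streaming(*files):
    # Every distinct line from every file is retained, so if no line
    # repeats, memory use grows to the combined size of all inputs.
    kept = {}
    for lines in files:
        for line in lines:
            kept.setdefault(line, None)
    return list(kept)


# Iterators stand in for files that are read line by line.
print(diff_streaming(iter(["b", "a", "b", "c"]), iter(["c", "d", "a"])))   # ['b']
print(union_streaming(iter(["b", "a", "b", "c"]), iter(["c", "d", "a"])))  # ['b', 'a', 'c', 'd']
```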
The only previous version of zet that I posted about was version 0.2.0. Version 1.0 is 1.5 to 2.4 times faster than 0.2.0, depending on the subcommand. It adds the --count, --count-lines, and --count-files flags.
A breaking change: in previous versions, zet single output lines that occurred in just one file, even if that file had multiple copies of the line, and zet multiple output only lines that occurred in more than one file (and not lines that occurred multiple times in a single file). That behavior is available in version 1.0 as zet single --file and zet multiple --files. In version 1.0, plain zet single prints lines that occur exactly once in the entire input, and plain zet multiple prints lines that occur more than once, even if all copies are in a single input file.
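The difference between counting by line and counting by file is easiest to see on a small example. The Python sketch below is an illustration of the semantics described above, not zet's code. The function names are hypothetical, and the `by_file` parameter is a stand-in for the --file / --files flags.

```python
from collections import Counter


def _counts(files):
    line_count = Counter()  # total occurrences across the whole input
    file_count = Counter()  # number of files that contain the line
    order = {}              # distinct lines in order of first appearance
    for lines in files:
        line_count.update(lines)
        file_count.update(set(lines))
        for line in lines:
            order.setdefault(line, None)
    return list(order), line_count, file_count


def single(files, by_file=False):
    order, line_count, file_count = _counts(files)
    counts = file_count if by_file else line_count
    return [line for line in order if counts[line] == 1]


def multiple(files, by_file=False):
    order, line_count, file_count = _counts(files)
    counts = file_count if by_file else line_count
    return [line for line in order if counts[line] > 1]


files = [["x", "x", "y"], ["y", "z"]]

print(single(files))                  # ['z']       1.0 default: occurs once overall
print(single(files, by_file=True))    # ['x', 'z']  old behavior: occurs in one file
print(multiple(files))                # ['x', 'y']  1.0 default: occurs more than once
print(multiple(files, by_file=True))  # ['y']       old behavior: occurs in several files
```

The line "x" is the interesting case: it appears twice, but only in the first file, so it moves from the single result to the multiple result under the new defaults.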