Zet 1.0 is out (compare to uniq and comm)

Zet 1.0 is out (crates.io, GitHub), with binary downloads available for Linux, Windows, and Mac (Intel and Apple Silicon).

zet union is comparable to uniq, but its input need not be sorted, its output preserves the order of the input lines, and it can take multiple input files. It's about 4.5 times faster than uniq. Similarly, zet single and zet multiple are comparable to uniq -u and uniq -d. And zet intersect and zet diff are comparable to comm -12 and comm -23, but they can take more than two (unsorted) input files, they respect input order, and they are about 10 times faster than comm. (Speed comparisons were done on a 2020 MacBook Air M1.)
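
To make those semantics concrete, here is a minimal Rust sketch of order-preserving union, intersect, and diff over several unsorted inputs. It is an illustration only, not zet's actual implementation: the function names and the in-memory `Vec<&str>` inputs are assumptions made for the example, and it does nothing to limit memory use.

```rust
use std::collections::HashSet;

/// Every distinct line from all inputs, in order of first appearance.
fn union<'a>(inputs: &[Vec<&'a str>]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for line in inputs.iter().flatten() {
        // `insert` returns true only the first time a line is seen.
        if seen.insert(*line) {
            out.push(*line);
        }
    }
    out
}

/// Distinct lines of the first input that also occur in every other input,
/// in first-input order.
fn intersect<'a>(inputs: &[Vec<&'a str>]) -> Vec<&'a str> {
    let (first, rest) = match inputs.split_first() {
        Some(parts) => parts,
        None => return Vec::new(),
    };
    let rest_sets: Vec<HashSet<&str>> = rest
        .iter()
        .map(|file| file.iter().copied().collect())
        .collect();
    let mut seen = HashSet::new();
    first
        .iter()
        .copied()
        .filter(|line| seen.insert(*line) && rest_sets.iter().all(|s| s.contains(line)))
        .collect()
}

/// Distinct lines of the first input that occur in none of the other inputs,
/// in first-input order.
fn diff<'a>(inputs: &[Vec<&'a str>]) -> Vec<&'a str> {
    let (first, rest) = match inputs.split_first() {
        Some(parts) => parts,
        None => return Vec::new(),
    };
    let others: HashSet<&str> = rest.iter().flatten().copied().collect();
    let mut seen = HashSet::new();
    first
        .iter()
        .copied()
        .filter(|line| seen.insert(*line) && !others.contains(line))
        .collect()
}
```

For inputs `["b", "a", "b", "c"]` and `["c", "d", "a"]`, this sketch gives `["b", "a", "c", "d"]` for union, `["a", "c"]` for intersect, and `["b"]` for diff. Neither input is sorted, and the output follows input order.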

On the other hand, zet needs at least as much RAM as the size of its first input file, and in the worst case as much as the sum of the sizes of all input files. So it's faster, but it will stop working when files get very big.

The only previous version of zet that I posted about was version 0.2.0. Version 1.0 is 1.5 to 2.4 times faster than 0.2.0, depending on the exact subcommand. It adds the --count, --count-lines, and --count-files flags.

A breaking change: in previous versions, zet single output lines that occurred in just one file, even if that file had multiple copies of the line, and zet multiple output only lines that occurred in more than one file (not lines that occurred multiple times in a single file). That behavior is still available in version 1.0 as zet single --file and zet multiple --files. In version 1.0, plain zet single prints lines that occur exactly once in the entire input, and plain zet multiple prints lines that occur more than once, even if all copies occur in a single input file.
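
The difference between the two behaviors comes down to whether you count occurrences of a line or the number of files that contain it. The Rust sketch below tallies both and then applies each rule. It is only an illustration of the semantics described above, not zet's code: `tally` and `select` are hypothetical helper names, and the inputs are in-memory vectors rather than files.

```rust
use std::collections::HashMap;

/// For each distinct line, in order of first appearance:
/// (line, total occurrences, number of inputs containing it).
fn tally<'a>(inputs: &[Vec<&'a str>]) -> Vec<(&'a str, usize, usize)> {
    let mut index: HashMap<&'a str, usize> = HashMap::new();
    // last_file[i] is the most recent input in which entry i was seen.
    let mut last_file: Vec<usize> = Vec::new();
    let mut out: Vec<(&'a str, usize, usize)> = Vec::new();
    for (file_no, file) in inputs.iter().enumerate() {
        for &line in file {
            match index.get(line).copied() {
                Some(i) => {
                    out[i].1 += 1;
                    // Count each input at most once per line.
                    if last_file[i] != file_no {
                        last_file[i] = file_no;
                        out[i].2 += 1;
                    }
                }
                None => {
                    index.insert(line, out.len());
                    last_file.push(file_no);
                    out.push((line, 1, 1));
                }
            }
        }
    }
    out
}

/// Keep the lines whose (occurrences, files) counts satisfy `keep`.
fn select<'a>(
    tallied: &[(&'a str, usize, usize)],
    keep: impl Fn(usize, usize) -> bool,
) -> Vec<&'a str> {
    tallied
        .iter()
        .filter(|entry| keep(entry.1, entry.2))
        .map(|entry| entry.0)
        .collect()
}
```

With inputs `["x", "x", "y"]` and `["y", "z"]`, the rule `lines == 1` (single in 1.0) selects `["z"]` and `lines > 1` (multiple in 1.0) selects `["x", "y"]`, while `files == 1` (single --file) selects `["x", "z"]` and `files > 1` (multiple --files) selects `["y"]`.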

Neat! Have you considered submitting this to https://github.com/sts10/rust-command-line-utilities? (No affiliation.)

Thanks for the tip!
