Hi all,
Zet (crates.io, GitHub) is a command-line utility for doing set operations on files considered as sets of lines. For instance, zet union x y z outputs the lines that occur in any of x, y, or z, and zet intersect x y z outputs those that occur in all of them.
Here are the subcommands of zet and what they do:
- zet union x y z outputs the lines that occur in any of x, y, or z.
- zet intersect x y z outputs the lines that occur in all of x, y, and z.
- zet diff x y z outputs the lines that occur in x but not in y or z.
- zet single x y z outputs the lines that occur in exactly one of x, y, or z.
- zet multiple x y z outputs the lines that occur in two or more of x, y, and z.
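To make the five subcommands concrete, here is a minimal Rust sketch of the semantics described above. It is not Zet's actual code: the names Op, Seen, and set_op are made up for illustration, and it assumes each file has already been read and split into lines. It counts how many files each distinct line appears in and keeps lines in the order they are first encountered.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// One variant per subcommand (hypothetical names, for illustration only).
#[derive(Clone, Copy)]
enum Op {
    Union,
    Intersect,
    Diff,
    Single,
    Multiple,
}

/// Bookkeeping for one distinct line.
struct Seen<'a> {
    line: &'a str,
    first: usize, // index of the first file containing the line
    last: usize,  // index of the most recent file containing the line
    count: usize, // number of distinct files containing the line
}

/// Applies `op` to `files`, where each file is a list of lines.
/// Output keeps the order in which lines are first encountered.
fn set_op<'a>(op: Op, files: &[Vec<&'a str>]) -> Vec<&'a str> {
    let mut index: HashMap<&'a str, usize> = HashMap::new();
    let mut seen: Vec<Seen<'a>> = Vec::new();

    for (i, file) in files.iter().enumerate() {
        for &line in file {
            match index.entry(line) {
                Entry::Vacant(slot) => {
                    // First time this line shows up anywhere.
                    slot.insert(seen.len());
                    seen.push(Seen { line, first: i, last: i, count: 1 });
                }
                Entry::Occupied(slot) => {
                    // Count each file at most once per line.
                    let s = &mut seen[*slot.get()];
                    if s.last != i {
                        s.last = i;
                        s.count += 1;
                    }
                }
            }
        }
    }

    let n = files.len();
    seen.into_iter()
        .filter(|s| match op {
            Op::Union => true,
            Op::Intersect => s.count == n,
            Op::Diff => s.first == 0 && s.count == 1,
            Op::Single => s.count == 1,
            Op::Multiple => s.count >= 2,
        })
        .map(|s| s.line)
        .collect()
}
```

For example, with x containing a, b, a, c, y containing b, d, and z containing c, d, e, this sketch gives a for Op::Diff and b, c, d for Op::Multiple.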
Zet handles UTF-16 files, so it should work OK on Windows. You can install it with cargo install zet, or download a binary for Linux, Mac, or Windows from the GitHub release page.
Notes
- Each output line occurs only once, because we're treating the files as sets and the lines as their elements.
- We do take the file structure into account in one respect: the lines are output in the same order as they are encountered. So zet union x prints out the lines of x, in order, with duplicates removed.
- Zet translates UTF-16LE and UTF-16BE files to UTF-8, and ignores Byte Order Marks (BOMs) when comparing lines. It prepends a BOM to its output if and only if its first file argument begins with a BOM.
- Zet ignores all line endings (\r\n or \n) when comparing lines, so two input lines compare the same if their only difference is that one ends in \r\n and the other in \n. Zet ends each output line with \r\n if the first line of its first file argument ends in \r\n, and with \n otherwise (that is, if the first line ends in \n, or the first file has only one line and that line has no line terminator).
- Zet reads entire files into memory. Its memory usage is roughly proportional to the file size of its largest argument plus the size of the (eventual) output.
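The BOM and line-ending rules in the notes above can be sketched in Rust roughly as follows. This is an illustration under stated assumptions, not Zet's implementation: it assumes the input has already been decoded to UTF-8, and the names OutputStyle, strip_bom, comparison_key, and output_style are hypothetical.

```rust
/// The byte order mark as it appears in decoded text.
const BOM: char = '\u{FEFF}';

/// Output conventions taken from the first file argument.
struct OutputStyle {
    bom: bool,         // prepend a BOM to the output?
    eol: &'static str, // terminator written after each output line
}

/// Removes a leading BOM, if any, so it does not affect comparisons.
fn strip_bom(text: &str) -> &str {
    text.strip_prefix(BOM).unwrap_or(text)
}

/// Returns the part of a line used for comparison: the line without
/// its trailing "\r\n" or "\n".
fn comparison_key(line: &str) -> &str {
    line.strip_suffix("\r\n")
        .or_else(|| line.strip_suffix('\n'))
        .unwrap_or(line)
}

/// Chooses the output style from the (decoded) contents of the first file.
fn output_style(first_file: &str) -> OutputStyle {
    let bom = first_file.starts_with(BOM);
    // `split_inclusive` keeps each terminator attached to its line.
    let first_line = first_file.split_inclusive('\n').next().unwrap_or("");
    // "\r\n" only if the first line ends that way; "\n" otherwise,
    // including when the first line has no terminator at all.
    let eol = if first_line.ends_with("\r\n") { "\r\n" } else { "\n" };
    OutputStyle { bom, eol }
}
```

With these helpers, comparison_key("abc\r\n") and comparison_key("abc\n") both yield abc, so the two lines compare as equal, while the terminator actually written is decided once, from the first file.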