When is a line not a line? BufReader and the 'wc' utility


#1

My toy program to count lines in a file uses this expression

BufReader::new(source).lines().count()

which returns a number that can differ from the result of ‘wc -l’. For example:

echo -en "line1\nline2" > example.txt

My program counts 2 lines, wc counts 1. Is this expected?

Thanks


#2

man wc:

wc - print newline, word, and byte counts for each file

Here is your answer: wc counts newline characters, not lines.


#3

There’s also another viewpoint. POSIX defines a “line” as something that ends with a newline (see here, found via this StackOverflow question), and I assume that definition is related to, if not based on, the Unix convention of always ending text files with a newline. Under that definition, counting newlines is also counting lines.
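To make the two notions concrete, here is a minimal sketch that applies both to the bytes from the original example. The helper names count_lines_iter and count_newlines are made up for illustration; only BufReader, Cursor and lines() come from the standard library.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Counts lines the way `BufRead::lines()` sees them: a trailing
/// fragment with no newline after it still counts as a line.
/// (Hypothetical helper name, for illustration only.)
fn count_lines_iter(data: &[u8]) -> usize {
    BufReader::new(Cursor::new(data)).lines().count()
}

/// Counts newline bytes, i.e. lines in the POSIX sense, which is
/// what `wc -l` reports. (Hypothetical helper name.)
fn count_newlines(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == b'\n').count()
}

fn main() {
    // Same contents as example.txt: no newline after "line2".
    let data = b"line1\nline2";
    println!("lines():  {}", count_lines_iter(data)); // prints 2
    println!("newlines: {}", count_newlines(data)); // prints 1
}
```

The two counts agree whenever the input is empty or ends with a newline; they differ by one only when the last line is unterminated.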


#4

Thanks for replies. On one side we have ‘wc’ and POSIX, on the other BufReader.

Since posting my question I discovered the Rust coreutils project, whose wc conforms to the standard wc. That suggests BufReader’s different notion of a line is not regarded as a bug.
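For anyone who wants the wc -l number from their own program, one possible approach is to count newline bytes directly in the reader's buffer instead of going through lines(). This is only a sketch, not how coreutils does it: the function name count_newlines_streaming is invented here, and it uses the standard fill_buf/consume pair from BufRead.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts b'\n' bytes in any reader without building Strings,
/// so it also works on input that is not valid UTF-8.
/// (Hypothetical helper name, for illustration only.)
fn count_newlines_streaming<R: Read>(source: R) -> io::Result<usize> {
    let mut reader = BufReader::new(source);
    let mut count = 0;
    loop {
        // Borrow whatever is currently buffered; empty means end of input.
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            return Ok(count);
        }
        count += buf.iter().filter(|&&b| b == b'\n').count();
        // Mark the inspected bytes as consumed so the next call refills.
        let len = buf.len();
        reader.consume(len);
    }
}

fn main() -> io::Result<()> {
    // A byte slice implements Read, so it can stand in for a file here.
    let n = count_newlines_streaming(&b"line1\nline2"[..])?;
    println!("{}", n); // prints 1, matching wc -l on example.txt
    Ok(())
}
```

Passing a std::fs::File as source works the same way as the byte slice used in main.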


#5

Seems to depend on the system:

WC(1)                     BSD General Commands Manual                    WC(1)

NAME
     wc -- word, line, character, and byte count

EDIT: I don’t mean that the result depends on the system, merely the wording in the man page.


#6

wc --help

Usage: wc [OPTION]... [FILE]...
  or:  wc [OPTION]... --files0-from=F
Print newline, word, and byte counts for each FILE

There are many wc implementations though, so yours may be different.