Cargo and line endings on Windows

the whole "text mode vs binary mode" concept is really a C concept, not an OS thing. due to the prevalence of the C programming language it's everywhere now, which is unfortunate, and the worst part of it is that "text" mode is the default if "b" is not explicitly specified.

fortunately, Rust decided not to carry the baggage over: there's no "text" mode (see std::fs::OpenOptions). a file is an opaque sequence of bytes (not characters), that's it. how to interpret the bytes is up to the program, and the standard library (trait Display, format!(), println!(), etc.) uses the str type to represent text and \n (U+000A, encoded in UTF-8 as the single byte 0x0A) to represent EOL.
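to make the "opaque sequence of bytes" point concrete, here is a minimal sketch. `write_then_read` is a made-up helper name for this illustration, not anything from the standard library; only the OpenOptions, Read and Write calls are real std APIs. it writes some bytes with mixed line endings and reads them back through std::fs::OpenOptions:

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Write};
use std::path::Path;

/// Writes `bytes` to `path`, then reads the whole file back.
/// (hypothetical helper, named only for this example)
fn write_then_read(path: &Path, bytes: &[u8]) -> io::Result<Vec<u8>> {
    // OpenOptions exposes read/write/append/create/truncate flags,
    // but nothing that resembles a text/binary switch.
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.write_all(bytes)?;
    drop(file); // close the handle before reopening for reading

    let mut contents = Vec::new();
    OpenOptions::new()
        .read(true)
        .open(path)?
        .read_to_end(&mut contents)?;
    Ok(contents)
}
```

passing something like b"first\r\nsecond\nthird\r" should give back exactly the same bytes on any platform, since nothing in this path translates \r\n to \n or the other way around.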

and even for C/C++, "text mode" is problematic. the standard is intentionally vague about how exactly "text mode" should behave, and provides very weak guarantees. even if your program is processing text, you'd better use binary mode and parse the input yourself, e.g. if you want to handle non-ASCII. C doesn't even guarantee the signed
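"parse the input yourself" can be quite small in practice. a rough sketch in Rust, assuming the input is meant to be UTF-8 and that both \n and \r\n should count as line terminators (`split_lines` and `decode_lines` are hypothetical names made up for this example):

```rust
use std::str::{self, Utf8Error};

/// Splits raw bytes into lines, accepting both "\n" and "\r\n" as terminators.
fn split_lines(input: &[u8]) -> Vec<&[u8]> {
    let mut lines: Vec<&[u8]> = input
        .split(|&b| b == b'\n')
        // drop a single trailing '\r' so that "\r\n" is treated like "\n"
        .map(|line| line.strip_suffix(b"\r").unwrap_or(line))
        .collect();
    // `split` yields one extra empty piece for empty input or after a final '\n'
    if input.is_empty() || input.ends_with(b"\n") {
        lines.pop();
    }
    lines
}

/// Decodes each line as UTF-8, failing on the first invalid line.
fn decode_lines(input: &[u8]) -> Result<Vec<&str>, Utf8Error> {
    split_lines(input).into_iter().map(str::from_utf8).collect()
}
```

the program decides explicitly what counts as a line ending and what encoding it expects, instead of relying on whatever the C runtime does in text mode. a lone \r (old Mac style) is deliberately not treated as a terminator here; that is a choice of this sketch, not a rule.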

it is often said that Unix is written in C, yet Unix doesn't have a system call to open a file in "text" mode (see creat(2), open(2), etc). in fact, Windows doesn't have a "text" mode for opening a file either (see NtOpenFile, NtCreateFile), and even DOS doesn't have one (see int 21H 3CH, int 21H 3DH; DOS does have special character I/O syscalls, but they are specific to the console, not regular files).


some anecdotes:

nowadays "text mode" is mostly known for translating line endings between DOS/MacOS/*Nix operating systems, but it dates back to before DOS or MacOS! I might be wrong, but I believe the original motivation was that text files were stored very differently on different machines, and there might not be EOLs at all.

e.g. some machines stored text files as sequences of fixed-length lines, padded with spaces (maybe that's why the C standard says trailing spaces might not be preserved in text mode); some machines prefixed lines with line length or line number markers. certain programming languages (COBOL) even baked semantics into the column positions of the code. it might have something to do with punch cards, but I don't know for sure.

back then, line editing was the norm; it was impossible, or at least impractical, to load a file into memory as a whole. even the revolutionary "fullscreen editor" vi actually operates on a line basis. ancient text editors call opened files "buffers". people today might argue about whether the term is proper or confusing, but originally the meaning of buffers was different: they were really temporary buffers, not just copies of opened file contents.
