Cargo and line endings on Windows

I am looking into the behavior of line endings for several tools. I would like to eliminate the possibility of large diffs due to line ending changes. These can lead to merge conflicts, hide important changes, and reduce the effectiveness of git blame.

While working on Windows, I expected cargo init to create a Rust project with Windows style carriage return '\r' (CR) line-feed '\n' (LF) line endings. However, it seems like the generated files have Unix style LF line endings.
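For anyone who wants to verify this on their own machine, here is a minimal sketch of a check that classifies the line endings in a file's bytes. The names `LineEnding` and `detect_line_ending` are made up for this post; nothing here is part of Cargo.

```rust
/// Line-ending style found in a byte buffer.
#[derive(Debug, PartialEq, Eq)]
pub enum LineEnding {
    /// The buffer contains no line feed at all.
    None,
    /// Every line feed is a bare LF.
    Lf,
    /// Every line feed is preceded by a CR.
    Crlf,
    /// Both styles occur in the same buffer.
    Mixed,
}

/// Classify the line endings in `bytes` by counting bare LFs and CRLF pairs.
pub fn detect_line_ending(bytes: &[u8]) -> LineEnding {
    let mut lf = 0usize;
    let mut crlf = 0usize;
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'\n' {
            if i > 0 && bytes[i - 1] == b'\r' {
                crlf += 1;
            } else {
                lf += 1;
            }
        }
    }
    match (lf, crlf) {
        (0, 0) => LineEnding::None,
        (_, 0) => LineEnding::Lf,
        (0, _) => LineEnding::Crlf,
        _ => LineEnding::Mixed,
    }
}
```

Feeding it the result of `std::fs::read` on a freshly generated `src/main.rs` should be enough to confirm which style the file uses.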

Git for Windows by default sets core.autocrlf = true on installation. This setting makes git rewrite line endings to the native line ending style (CRLF on Windows) on checkout and back to LF on check-in. If you generate a new project and add it to the repository, you are presented with the following:

warning: in the working copy of '.gitignore', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'Cargo.lock', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'Cargo.toml', LF will be replaced by CRLF the next time Git touches it
warning: in the working copy of 'src/main.rs', LF will be replaced by CRLF the next time Git touches it

Git is telling us that when it checks out those files, they will have CRLF line endings instead of the current LF line endings. Fortunately, the Rust tools work irrespective of the line ending style, so this is not a functional issue. However, the warning message can confuse developers who have never had to look into this topic. Through .gitattributes, we can configure git in the repository not to rewrite line endings for these files, or to use a specific line ending style for certain files.

One could argue that a version control system (VCS) is responsible for versioning files, not for modifying them. However, I believe that VCSs are responsible for enabling collaboration. If that means rewriting text files for maximum tool compatibility under certain operating systems, then they should do that.

I am surprised that this is the current state of things. I do not think that it is wise to go against the operating system's native line ending style if we can avoid it. If we do so, we are invalidating the default configuration of most tools on that platform. A non-default configuration then needs to be added for the tools that a team is using in the form of .gitattributes, .vscode/settings.json, and so forth.

I found some issues in the cargo issue tracker that are relevant. However, I did not find anything on why the current implementation of cargo uses LF only.

Can anyone shed some light on this topic?

We have had this madness with line endings for decades and the solution is always the same: if magic doesn't work, stop using it; don't add more and more magic, which won't work either.

FTP started as a “helpful” protocol which mangled binaries and archives; then most clients started using binary mode by default, and HTTP, finally, doesn't try to mangle files.

If the Git for Windows developers decided that they want to use the crippling autocrlf = true, then we can't do anything about it, but that is still no reason to try to deal with that madness by adding more and more unstable magic.

If cargo new added the appropriate configuration to the git repo to disable the magic, that would be the best solution IMO.

But as someone who doesn't like to fight my tools for no good reason, I, of course, never enable autocrlf = true, so it's not a big deal for me.

Yup. And that's a good thing. Just as the refusal to support anything but UTF-8 eventually made even Microsoft realize they need to support it, the refusal to support anything but Unix line endings would lead to change in most tools on that platform. E.g. Notepad supports Unix line endings, and so should all other tools.

It doesn't work, sorry. The only way to have a tool that you can trust is to reduce the abuse of “unstable magic”. If something “magical” can be made to work 100% of the time or, at least, fails in a recoverable way, then such magic is helpful. If, like autocrlf = true, the magic is neither reliable nor [easily] recoverable, then the best way is not to play with it but to disable it.

Thank you for sharing your perspective on the matter. I am all for standardization and every tool using LF on every platform would be simpler. This would require commonly used tools to use LF line endings as the default on Windows. Perhaps what you're saying is that cargo should be the one to take initiative.

I like the suggestion for cargo init to create a .gitattributes file. It will require some discussion to establish what it should contain. We could force git to check out and in with LF line endings:

*.rs       text eol=lf
Cargo.toml text eol=lf
Cargo.lock text eol=lf
.gitignore text eol=lf

Or we could tell git not to consider those files as text files that need their line endings rewritten:

*.rs       -text
Cargo.toml -text
Cargo.lock -text
.gitignore -text

The second version of .gitattributes does require Windows users to configure their tools to create files with LF line endings and not to change LF line endings to CRLF when modifying them. I would say forcing the rewriting with git is still superior because it requires less configuration.
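For the sake of discussion, here is a rough sketch of how a project generator could render either variant. `EolPolicy` and `render_gitattributes` are hypothetical names for this post; as far as this thread establishes, nothing like this exists in Cargo today.

```rust
/// How git should treat the matched files (hypothetical type for this sketch).
pub enum EolPolicy {
    /// Emits `text eol=lf`: normalise and always check out with LF.
    ForceLf,
    /// Emits `-text`: never convert line endings for these files.
    NoConversion,
}

/// Render one .gitattributes line per pattern, each terminated by a bare LF.
pub fn render_gitattributes(patterns: &[&str], policy: &EolPolicy) -> String {
    let attrs = match policy {
        EolPolicy::ForceLf => "text eol=lf",
        EolPolicy::NoConversion => "-text",
    };
    let mut out = String::new();
    for pattern in patterns {
        out.push_str(pattern);
        out.push(' ');
        out.push_str(attrs);
        out.push('\n');
    }
    out
}
```

Which of the two policies should be the default is exactly the open question; the sketch only shows that supporting both is cheap.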

Your response does seem almost emotional and I am not sure what you mean by magic. What is magical about core.autocrlf = true? The heuristic to determine what files should be considered text files?

Yes. It just never works.

Files are files: they are sequences of bytes and there is no need to make them complicated. While the aspirations of the people who invented the distinction between binary and text files were noble, by now it's obvious that it causes much more harm than good.

Every single time someone tried to add this magical conversion to the mix, the attempt ended in a mess, sooner or later.

No. This only requires one to have the appropriate settings. And they are there.

Then you have to instruct people to use sane Git settings (like here, e.g.) and that's it. You set up your Visual Studio Code or CLion correctly and don't even think about any magic needed to support different types of files.

Yes, people who insist on using ancient tools would suffer, but I would rather have them think about how to make life easier for themselves than have everyone else deal with an issue that's entirely artificial.

You would need to discuss that with the people who insist on using the autocrlf=true option. I'm not one of them, so I don't know or care much about the binary vs text distinction.

As long as everything works correctly when that distinction is disabled I'm happy.

It is not perfect, it is a heuristic. The functionality was added to git to enable collaboration between people who used tools of which some required CRLF and others required LF. It is a suboptimal solution but sometimes we can't expect the whole world to immediately change. There is nothing "magical" about core.autocrlf.
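To illustrate what such a heuristic can look like and where it breaks down, here is a deliberately simplified sketch. It assumes the rule is "no NUL byte in the first 8000 bytes means text". The constant `SNIFF_LEN` and the function `looks_like_text` are assumptions of this sketch; Git's actual check differs in detail and this is not its implementation.

```rust
/// Number of leading bytes inspected (an assumption of this sketch).
const SNIFF_LEN: usize = 8000;

/// Guess whether `bytes` is text: true if no NUL byte occurs in the
/// inspected prefix. Anything after the prefix is never looked at.
pub fn looks_like_text(bytes: &[u8]) -> bool {
    let window = &bytes[..bytes.len().min(SNIFF_LEN)];
    !window.contains(&0)
}
```

A binary blob that happens to contain no NUL byte near its start is classified as text by this rule, which is the kind of misclassification a heuristic cannot rule out.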

I was referring to what needs to happen to arrive in a world where all platforms and tools use LF and CRLF is a thing of the past. Surely you agree that we will not get there by writing blog posts about how everyone should use a modified configuration; it's the defaults that need to change.

I have no control over the global configuration of the developers I work with, nor do I think I should. Some developers may need to work on multiple projects. If those projects were to require different global settings, that would not work.

Yup. That's the correct choice.

First thing I do is globally turn that off. That choice has served me very well for many years.

Neither operating system cares. It's the tools that live there that care. Windows tools, at least the plethora of ones I use, don't care. Even Notepad now (for five years) supports line feed endings.

What I am interested in is the motivation.

Then your editor also needs to not create files with CRLF line endings. To give you some context: I'm working with 500+ developers. Some of the tools they use will default to CRLF on Windows.

You are correct. I indeed did not phrase that as well as I could have. I was referring to the line ending style that tools on Windows default to. It is a bit long to type so I incorrectly shortened it to the "operating system's".

The motivation is simple: your “source of truth” tools, like your file transfer tools, shouldn't lose information without warning. This:

Is just not acceptable if the result of said heuristic's failure is not some harmless error message or something similar, but irreparably lost information.

Nope. People need to change how they do things and then the defaults will change.

Chromium is in the exact same position. And yet they always used autocrlf=false.

It's really the only way to change things: you tell people how they are supposed to work and refuse to support them if they ignore your instructions. If they want to use your tools then it's fine; it's a free world, after all. But making life miserable for people who are not playing with the autocrlf=true fire is a bad choice.

Support people who are not doing that, if you want, but that's optional.

That's how Windows got support for UTF-8, how Notepad got support for Unix line endings, and so on: people just refused to support anything else, and the recommendation was to just use Linux if they cannot use tools that produce Unix files. It's easy these days.

The pain I've personally endured by choosing poorly. The pain I've personally endured by my coworkers choosing poorly when I have to maintain their work. I'm highly motivated.

A problem solved with a Git hook.

You can additionally check an .editorconfig into your repo; it's respected by more than one editor, and for others there are plugins.

Additionally, that's not even a good argument? If someone had an editor that used the vertical tab character to encode line breaks then you'd also say the issue is with the editor, not with the code repo and tell them to change the configuration.

If it's a common enough problem you can add a CI hook to check for CRLFs.
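A minimal sketch of the core of such a check, assuming the CI job reads each tracked file as bytes and fails when the returned list is non-empty. The function name `crlf_lines` is made up for this example.

```rust
/// Return the 1-based numbers of all lines in `bytes` that end in CRLF.
pub fn crlf_lines(bytes: &[u8]) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut line = 1usize;
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'\n' {
            // A CR directly before the LF marks a CRLF-terminated line.
            if i > 0 && bytes[i - 1] == b'\r' {
                hits.push(line);
            }
            line += 1;
        }
    }
    hits
}
```

Reporting line numbers rather than a plain yes/no makes the failure message actionable for whoever pushed the offending file.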

What I find somewhat amusing is that now different people are suggesting that I have everyone configure their editor in some way, through git hooks, global configuration, .editorconfig, plugins, ...

This is exactly why I think letting git rewrite line endings and playing nice with the prevailing default line ending on a certain platform isn't actually that bad of an idea. Many tools will use native line endings by default, and then we only need to configure git to convert the line endings for certain files. Configuration then only needs to happen in git. The number of tools that developers within a large organisation might use is pretty much unbounded.

Indeed, an automated check is often also a necessary part of the solution. However, doing so is only one part of the equation. The other part is to make development as easy as possible. That is the part that I would like to focus on.

Choosing the correct editor setting is a one-time thing (per developer). When autocrlf = true causes a bug or unwanted behavior, that's where the actual recurring pain lies.

The tool responsible for handling syntactic differences in source text is the text editor/IDE, not the VCS. The VCS better stay away from touching my files in a way I didn't ask for. Simultaneously, my text editor should display lines correctly and add the appropriate line endings no matter if they are \n, \r\n, or \r.
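As a rough illustration of the editor side, here is a sketch of a line splitter that accepts all three terminators. `split_lines` is a made-up name; the standard library's `str::lines` already handles `\n` and `\r\n`, but not a lone `\r`.

```rust
/// Split `text` into lines, accepting "\n", "\r\n" and a lone "\r" as
/// terminators. The terminators themselves are not part of the result.
pub fn split_lines(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut lines = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                lines.push(&text[start..i]);
                i += 1;
                start = i;
            }
            b'\r' => {
                lines.push(&text[start..i]);
                // Treat CR LF as a single terminator.
                i += if bytes.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
                start = i;
            }
            _ => i += 1,
        }
    }
    // Keep a final line that has no terminator.
    if start < bytes.len() {
        lines.push(&text[start..]);
    }
    lines
}
```

Slicing at these byte offsets is safe because CR and LF are single-byte ASCII characters and never appear inside a multi-byte UTF-8 sequence.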

Could you help me understand how one would lose information without warnings? Is there even a way to lose information? What would the reproduction steps be? I am genuinely interested because if true, this would be a very good argument indeed.

That's not how main instead of master would ever become a thing (irrespective of whether that was a fruitful endeavor). I believe that the tools would have to drive this, resembling policy makers.

That is a fantastic argument from authority. Unfortunately I can't find the rationale on the page you linked to.

I am still looking for documentation of the rationale for this choice by the Cargo team. People's personal perspectives are of course welcome but not the main point of this thread.

Is it necessary to document something that works correctly on all three major operating systems? To me documenting such things seems a waste of time.

It's not a “bad” idea. It's an awful idea. It breaks the fundamental assumption that the state of the git repo and the state of what you have on disk after “git reset” have a 1-to-1 correspondence.

Now, suddenly, it's possible to have something checked out that actually works, but when someone else tries to use the same file, the content is different and nothing works.
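To make the lost 1-to-1 correspondence concrete, here is a minimal sketch of a naive check-in followed by a naive checkout on a CRLF platform. It is only a model of the idea: `round_trip` is a made-up function, and Git itself has additional safeguards (core.safecrlf, for example) that this sketch ignores.

```rust
/// Naive model: check-in strips the CR of every CRLF pair, checkout then
/// puts a CR in front of every LF. Returns what lands back on disk.
pub fn round_trip(working_copy: &[u8]) -> Vec<u8> {
    // Check-in: drop each CR that is immediately followed by an LF.
    let mut stored = Vec::with_capacity(working_copy.len());
    for (i, &b) in working_copy.iter().enumerate() {
        let cr_before_lf = b == b'\r' && working_copy.get(i + 1) == Some(&b'\n');
        if !cr_before_lf {
            stored.push(b);
        }
    }
    // Checkout: expand every LF to CRLF.
    let mut restored = Vec::with_capacity(stored.len());
    for &b in &stored {
        if b == b'\n' {
            restored.push(b'\r');
        }
        restored.push(b);
    }
    restored
}
```

Under this model a file with mixed endings such as `a\nb\r\n` and a pure CRLF file `a\r\nb\r\n` come back as the same bytes, so the mapping between disk and repository is no longer one-to-one.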

In the end we have a strange conundrum where autocrlf=true is either useless or worse:

  1. If your tools work just fine with both CRLF and LF line endings then autocrlf=true is not needed.
  2. If your tools don't work with LF line endings then it's actively dangerous because you don't know whether putting a file into the repository and then taking it back would give you something useful or not.

Maybe for some language like PHP, which applies heuristics on top of heuristics to fix other heuristics and ends up deciding that the strings "1e3" and "1000" are equal, autocrlf=true would be useful.

But Rust prides itself on the fact that it tries to solve problems, not shove them under the carpet. For Rust, specifically, autocrlf=true is not an option and if you want to play with it then you are on your own.

Wrong. I have already mentioned WSL. I have a friend who is doing an internship and uses WSL to edit files pulled from the Android repo in his WSL installation.

He uses VSC from the Windows side and doesn't even know that there is some kind of difference in line endings between Windows and Linux. VSC just detects a Unix file and does the right thing.

And that means that the havoc they may wreak is pretty much unbounded too.

Yes. And to make that possible you have to declare that you are not going to support the whole zoo.

People often praise ease of development for Rust. Do you know why? Because cargo doesn't support all the bazillion development tools that people built around C and C++. And documentation is generated by one tool. And so on.

Every time you make something “more flexible” you make it more complex, too.

You cry about these strange tools that can't handle Unix files, but can you show us even one example of such a tool which actually exists on Unix and can process LF line endings there, but not on Windows?

I'm not sure such tools don't exist, but they are so rare that I don't know a single example. In the extra-rare case where they are used, a simple wrapper script would usually save you.

If your tool only exists on Windows then autocrlf=true works for these tools fine, too: just store these files as is and everyone is happy!

Your problem is not these nebulous tools, nope. Lots of people are in a similar situation to yours and they deal with these limitations just fine.

Your problem is the decision to use autocrlf=true and the expectation that someone else will support it. Nope, it doesn't work like that and should not work like that.

If you want to change something concrete in some Rust tools to help you, then your CLs would be reviewed and may even be applied. But if your POV is “I have created a problem for myself and now someone else has to fix it”, then no, it doesn't work like that. Someone else who, like any sane person, doesn't use autocrlf=true doesn't have your issues. And if you want to persist in the use of that Russian roulette mode, then the onus is on you to propose a fix.

@khimru thank you for sharing and passionately motivating your personal opinion.

If anyone knows what the rationale of the cargo team was to generate files with LF line endings on Windows, let me know.

On all platforms, the newline is the LINE FEED character (\n/U+000A) alone (no additional CARRIAGE RETURN (\r/U+000D)).

That is both interesting and relevant.

cargo new uses multiline strings to specify the generated file contents. Multiline string literals actually convert CRLF to LF.

Some of Cargo's lock file implementation directly pushes '\n' onto a string for serialization.
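A small sketch of what that pattern looks like in practice. The manifest content and the name `render_manifest` are made up for illustration and are not taken from Cargo's source; the point is only that both `push('\n')` and `writeln!` produce a bare LF regardless of platform.

```rust
use std::fmt::Write;

/// Render a tiny, made-up manifest line by line, the way a generator might.
pub fn render_manifest(name: &str) -> String {
    let mut out = String::new();
    out.push_str("[package]");
    // Pushing '\n' directly yields a bare LF on every platform.
    out.push('\n');
    // `writeln!` also appends a bare LF, never CRLF.
    writeln!(out, "name = \"{}\"", name).expect("writing to a String cannot fail");
    out
}
```

Writing the resulting string to disk as bytes therefore produces LF-only files on Windows too, unless something converts the line endings afterwards.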

Rustfmt is less progressive as it applies the newline style of the first newline it finds. If any tool is responsible for addressing this it would be a formatting tool.
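As I understand the behaviour described here, it amounts to something like the following sketch. This is my own approximation under that assumption, not rustfmt's code, and `apply_first_newline_style` is a made-up name.

```rust
/// Rewrite all line endings in `text` to match the first one found.
/// Text without any line feed is returned unchanged.
pub fn apply_first_newline_style(text: &str) -> String {
    // Locate the first LF and check whether a CR precedes it.
    let use_crlf = match text.find('\n') {
        Some(pos) => pos > 0 && text.as_bytes()[pos - 1] == b'\r',
        None => return text.to_string(),
    };
    // Normalise everything to LF first, then expand if CRLF was detected.
    let lf_only = text.replace("\r\n", "\n");
    if use_crlf {
        lf_only.replace('\n', "\r\n")
    } else {
        lf_only
    }
}
```

The consequence is that a file which starts out with CRLF stays CRLF after formatting, so the formatter preserves whatever style the first tool or editor introduced.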

It's similar to formatting or style in a way: if a dev refuses to follow the conventions of the repo, you stop checking their code in; you don't try to hide that from other developers.

And autocrlf can cause issues the other way, too: Windows-only files that require CRLF line endings shouldn't have them changed just because they're being viewed on a *NIX platform. Leaving the files untouched lets you use the line endings (or encoding, etc.) that you need for specific files on a per-file basis, with no need to mess around with .gitattributes or whatever.
