I checked lots of Rust GitHub projects. Most of the projects write everything in a single file, say 2k ~ 5k LOC in a single file. Do we have a standard on LOC per file?
I'm certainly not an authority of any kind. But I personally find about 1000 loc (including a lot of verbose RustDoc comments) roughly represents the amount of complexity I can hold in my head at once.
So when the whole crate is under 1000 loc, it's easier to follow for me when it's in a single file. But I start to get stressed out when I need to use search to navigate the file, etc. and then it's time to split the functionality into smaller modules.
Bottom line: I don't have a rule about loc, but it just happens that I start feeling the need to refactor when I pass about 1000 lines. It's not the file length that triggers it, but the feeling that too many things are happening at once.
I'm very much in the range of [10, 10²]. Though this is, as far as I can tell, not usual in Rust. It's more of a me thing in every PL: reaching 200 lines turns on internal mental lints in my head.
I found some common patterns in these repositories. The files contain examples for documentation purposes plus the test cases. Is there a practice of putting tests in separate files? More complex code needs more test cases to validate its quality. However, I found that most repositories put tests in the same file.
Unit tests are in the same file in a submodule traditionally called `tests`. Integration tests are usually in a separate top-level directory called `tests`. See Test Organization - The Rust Programming Language for more.
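To make the unit-test convention concrete, here is a minimal sketch. The function names `is_too_long` and `line_count` are made up for this example; only the `#[cfg(test)] mod tests` layout is the convention being shown.

```rust
/// Public API of a hypothetical module: is this source longer than `limit` lines?
pub fn is_too_long(source: &str, limit: usize) -> bool {
    line_count(source) > limit
}

// Private helper. Code outside this module cannot call it,
// but the `tests` child module below can.
fn line_count(source: &str) -> usize {
    source.lines().count()
}

// Unit tests live next to the code, in a child module that is
// only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    // A child module can reach its parent's private items through `super`.
    use super::*;

    #[test]
    fn counts_lines() {
        assert_eq!(line_count("a\nb\nc"), 3);
    }

    #[test]
    fn flags_long_sources() {
        assert!(is_too_long("a\nb\nc", 2));
        assert!(!is_too_long("a\nb\nc", 3));
    }
}
```

An integration test for the same crate would instead be its own file under the top-level `tests` directory and could only use the public `is_too_long`, not the private helper.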
`rustc` has a tidy check for 3000 lines (example issue).
Because static analysis seems to be superlinear in terms of runtime (which translates into more waiting for me while tools catch up, especially Emacs), personally I prefer to keep files no larger than around 1 KSLOC.
When a file grows larger than that, I split it up into submodules (unless I had already done that earlier for architectural reasons).
This is an interesting statement.
Shouldn't analysis times be proportional to the total crate size and not the size of any one file? Or does caching play some part here, where the only parts of the crate that are invalidated while editing are the items in the current file?
I never said they were proportional to the size of any one file. Just that the growth seems to be superlinear.
However, note that I also include macro expansions in that static analysis group, since AFAIK macros are fully expanded by the time typeck and borrowck run.
And a lot of my crates are macro-heavy.
I don't know that either way. But at this point it wouldn't surprise me if it did use extensive caching at that level.
One actual architectural reason IDEs might prefer smaller source files is that any time you type in a file, that has the potential to change the entire parse tree after that point. (In an extreme case, think adding an unmatched `{` or `}`.) In an IDE, latency matters most of all, and the more expensive work you can skip (whether through incremental computation or otherwise) the better.
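As a toy illustration of the unmatched-brace case (not a model of how any real IDE or parser works), the sketch below tracks brace nesting depth per line. The function `depth_per_line` is hypothetical, and it deliberately ignores strings, comments, and char literals.

```rust
/// Returns the brace nesting depth at the start of each line.
/// Toy model: every `{` and `}` counts, even inside strings or comments.
fn depth_per_line(source: &str) -> Vec<i32> {
    let mut depth = 0;
    let mut depths = Vec::new();
    for line in source.lines() {
        // Record the depth before looking at this line's braces.
        depths.push(depth);
        for ch in line.chars() {
            match ch {
                '{' => depth += 1,
                '}' => depth -= 1,
                _ => {}
            }
        }
    }
    depths
}
```

With `"fn a() {\n}\nfn b() {\n}\n"` the line `fn b() {` starts at depth 0. Insert an unmatched `if x {` after the first line and `fn b() {` now starts at depth 1, as does everything after it, so the interpretation of the whole rest of the file has shifted. The longer the file, the more there is after the cursor to re-examine.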
When a library is single-file or even just a flat hierarchy of modules (i.e. no folders on disk), I personally have a higher tolerance for larger source files than when a library is large enough to benefit from more nesting.
It's also important ime to distinguish between documentation and SLOC (Source LOC); (up to date) documentation actually lowers the perceived density/complexity of a file, as opposed to more SLOC.
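A rough way to see the documentation/SLOC split for a given file is to classify its lines. This is only a sketch under simplifying assumptions: `LineStats` and `classify` are hypothetical names, block comments (`/* ... */`) are not handled, and any line starting with `///` or `//!` is treated as documentation.

```rust
/// Per-category line counts for one source file.
#[derive(Debug, Default, PartialEq)]
struct LineStats {
    doc: usize,     // `///` and `//!` lines
    comment: usize, // other `//` lines
    blank: usize,   // empty or whitespace-only lines
    code: usize,    // everything else, i.e. the SLOC
}

/// Classifies each line by its first non-whitespace characters.
fn classify(source: &str) -> LineStats {
    let mut stats = LineStats::default();
    for line in source.lines() {
        let trimmed = line.trim_start();
        if trimmed.is_empty() {
            stats.blank += 1;
        } else if trimmed.starts_with("///") || trimmed.starts_with("//!") {
            stats.doc += 1;
        } else if trimmed.starts_with("//") {
            stats.comment += 1;
        } else {
            stats.code += 1;
        }
    }
    stats
}
```

Comparing `stats.code` with the raw line count gives a better feel for how dense a file really is than the total alone.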
Generally, though, I try to primarily structure modules as logical abstraction boundaries rather than just to split code into multiple files.
One thing that nobody has mentioned, so far, and that is a primary motivator for me to keep files small is managing concurrent edits where there are multiple developers.
In that environment I find that a large number of small files results in far less time spent handling merge conflicts than a small number of large files.
Yes there is work spent in decomposing and breaking up files, but I'd rather do that than handle merges.
The rust repo has a tidy check that things aren't over 3000 lines, though some files still bypass that check because some things are unsplittable, and documentation comments can make files very long even if there's not that much code in them.
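For a project that wants a similar guard, a file-length check with an opt-out can be sketched in a few lines. This is not the rustc tidy implementation: `check_length`, `Verdict`, and the `// allow-long-file` marker are all made up for this example, and only the 3000-line figure comes from the thread.

```rust
/// Hypothetical opt-out marker; not claimed to match what rustc's tidy uses.
const OPT_OUT: &str = "// allow-long-file";

/// The limit mentioned above for the Rust repository.
const MAX_LINES: usize = 3000;

#[derive(Debug, PartialEq)]
enum Verdict {
    Ok,
    /// Over the limit, but the file carries the opt-out marker.
    Exempt,
    TooLong { lines: usize },
}

/// Checks one file's contents against `max_lines`.
fn check_length(source: &str, max_lines: usize) -> Verdict {
    let lines = source.lines().count();
    if lines <= max_lines {
        Verdict::Ok
    } else if source.lines().any(|l| l.trim() == OPT_OUT) {
        Verdict::Exempt
    } else {
        Verdict::TooLong { lines }
    }
}
```

Because this counts every line, a file full of long doc comments trips the limit just as easily as a file full of code, which is exactly the situation where an opt-out is useful.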
If you're used to something like Java where the convention is often just one class per file, Rust will definitely seem longer. With the file being a module, splitting things up highly granularly introduces a bunch of unhelpful privacy boundaries, so "bunch of files each ~100 lines" is a poor choice in Rust.
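To show what such a privacy boundary looks like in practice, here is a small sketch with two sibling modules written inline. The module and function names (`parser`, `printer`, `skip_whitespace`, and so on) are invented for illustration.

```rust
mod parser {
    // Private: visible inside `parser` and its child modules only.
    fn skip_whitespace(input: &str) -> &str {
        input.trim_start()
    }

    // `pub(crate)`: widened just enough for other modules in this crate.
    pub(crate) fn first_word(input: &str) -> &str {
        skip_whitespace(input)
            .split_whitespace()
            .next()
            .unwrap_or("")
    }
}

mod printer {
    use super::parser;

    pub(crate) fn shout_first_word(input: &str) -> String {
        // Uncommenting the next line fails to compile with error E0603,
        // because `skip_whitespace` is private to `parser`:
        // parser::skip_whitespace(input);
        parser::first_word(input).to_uppercase()
    }
}
```

Every extra module split means another decision like the `pub(crate)` above: either widen an item's visibility or keep the caller in the same module. Whether that is overhead or a useful guarantee is a matter of taste.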
I think there's also a big difference between single-file crates and multi-file crates in terms of acceptable file size. If it's all just one file, then 5k is fine, even if that'd be too big for an individual file if there were many files in the crate.
IDE friendliness is not the first thing to consider. From a team-project perspective, I am considering the following two items to enable more developers to start using Rust.
First, I agree with @roblally: there are lots of benefits to breaking code into smaller files. It works well for code merges.
Second, readability: a long file is very difficult to read.
Personally, I also prefer to separate the tests and example documentation into other files, e.g. abc.rs, abc.test.rs, abc.example.rs. My motivation: as an engineer working on the logic, I want to focus first on the most important and closely related code during implementation. But I have not seen anyone doing this in Rust.
On the contrary I find those privacy boundaries very helpful, to my tastes. I know exactly what those lines of code can access, only what they import and what's directly in their parent modules. It's one of the things I like the most in Rust, the fine control on the visibility of things.
I just bumped into a use case where I'm finding it advantageous to collapse a module that has a submodule (which currently houses the test suite @ ±1 KSLOC) back into 1 module.
The use case is that I'm writing a proc macro, and as part of the expansion I need to include the full source of a bunch of `.rs` files. I could do it another way, e.g. just copy-pasting the source code into the `quote! { ... }` sections, but I find keeping the files separate more maintainable.
The thing is, when you have submodules, this whole read-a-file-and-inject-it-into-quote approach kind of breaks down, unless I want to replicate module resolution functionality by hand (spoiler: I have no intention of doing that), which would include modifying `mod foo;` statements in parent modules in order to include the actual source code.
On the other hand, inlining all the modules means larger files (which I will deal with by not having to open them that often anymore), but also much-simplified inclusion code.
I’ve found that about 400-800 lines of non test code per file usually feels about optimal for my projects.
What makes that range optimal for you?
With smaller files, it often feels like there’s comparatively substantial overhead from navigating around to get the full picture; with larger modules, I find that navigation within the file becomes more cumbersome. I’ll admit that I’m still trying to level up in terms of my IDE usage for the latter, so maybe that’s just me.