Idioms regarding main.rs

Quite often, I hear this advice: "limit your main.rs to argument parsing and calling a function from another module/library".

Why? I see no reason to keep the file so small, and indeed, when followed strictly I can imagine harm being done (similar to that done when OOP programmers make too many trivial classes).

I usually handle my top level logic in my main fn, delegating to other modules or functions to make it more readable. But that's not what others seem to recommend.

Is there a reason behind this I simply don't know? Am I misunderstanding or exaggerating their point? Since it's fairly ubiquitous, I'm inclined to believe I'm missing something.

1 Like

People have all kinds of ideas and theories about how to lay out code. Often they sound really good when someone makes the argument, but in practice it doesn't really matter.

One of my favorite examples of this is placement of private vs public class members in C++. When I learned C++, and for decades after, private members were always placed at the top of the class. There wasn't really a reason given, as far as I could tell; everyone just did it that way. Many years later someone argued that public members should be at the top, because people are interested in the public members and not the private members, so the private members should be at the bottom because no one cares about them. So down they went. However, another thing had happened: automatic code documentation generation. People generated their (public) documentation and either created a webpage from it or used some IDE integration. And for me personally, I noticed that whenever I needed public documentation I used the generated documentation, and whenever I wanted to know something about the private implementation, I went to the source/header files. But this had gotten more annoying, because the things I was looking for were now at the bottom of the file. So I put them at the top again to make my life easier.

Personally I like to keep main somewhat small and limited, but I can honestly not say why. I think I want main to act as an "outline" of how the application works at a very high level -- someone looking at main shouldn't get bogged down with too many details.

Another aspect to your question is whether something is idiomatic within an ecosystem/community. Sometimes there isn't really a reason for something other than "it's how we do things", and I say that not as a bad thing. As someone who can get hung up on unnecessary details for way too long, for me it's nice to be able to simply fall back on "I don't need to figure this out, I can just do as everyone else does", unless I encounter a compelling reason to break with convention.

All this is to say: do what you think feels best, but if you notice that 99.9% of other projects do something a particular way, then perhaps there is a good reason for it, even if it's not immediately apparent why.

11 Likes

I appreciate the very thoughtful reply!

I was pretty unclear when making the post, sorry. I agree with this, but specifically, I find people who go to extremes and say it should be restricted even further, such that even the core logic is split into its own file.

It's to make your code reusable as a library by others. It doesn't help the executable maintainers directly, it helps the community.

An example in the wild: There is an awesome project called customasm that I have used for several reasons. Mostly doing weird DSL things with uncommon CPU architectures.

The CLI provided is kind of good for most uses, but I have had some very specific requirements that it just cannot reasonably address. The good news is that main.rs is [currently] only 26 lines. I can completely ignore that and use the library in any way I need for my bizarre use case. It's a real pleasure that I don't have to fork this and many other projects just to use them as dependencies.

Do it for the community.

8 Likes

Sure, but I don't think that putting your top level logic in main hurts that at all. The only use case it hurts is when somebody wants to run your program, but not as a program... which is possible, but also sounds quite easy to work around.

Depends on what you mean by top level logic, I suppose. If your application is a web server or GUI, it might not have any functionality that others want to depend on.

I've never heard this as general advice that I can recall, but I have often heard the suggestion that your main() function should deal only with argument parsing and top-level error reporting. This has the benefits that:

  • tests can be written against code that doesn't read the process arguments but takes ordinary function parameters
  • errors that were propagated with ? can be printed nicely, and the exit code set, in appropriate application-specific ways, with a match against the Result from the function main() calls (whereas returning a Result from main just prints the error's Debug representation, which usually isn't well formatted)

Then if you have such a function separation, you might end up deciding to put them in separate modules too. Or not.
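As a minimal sketch of that split: the summing task, the run function and the AppError type below are all made up for illustration, not taken from any real project. main() only collects the arguments and reports the outcome; everything else takes ordinary parameters.

```rust
use std::env;
use std::fmt;
use std::process::ExitCode;

/// Hypothetical application error, invented for this sketch.
#[derive(Debug, PartialEq)]
enum AppError {
    BadNumber(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::BadNumber(s) => write!(f, "not a valid integer: {s:?}"),
        }
    }
}

impl std::error::Error for AppError {}

/// The actual logic (hypothetical: sum the integer arguments).
/// It never touches the process arguments, so tests can call it directly.
fn run(args: &[String]) -> Result<i64, AppError> {
    let mut total = 0;
    for arg in args {
        // `?` propagates the error up to main(), which decides how to report it.
        let n: i64 = arg
            .parse()
            .map_err(|_| AppError::BadNumber(arg.clone()))?;
        total += n;
    }
    Ok(total)
}

fn main() -> ExitCode {
    // Skip the first argument, which is conventionally the program name.
    let args: Vec<String> = env::args().skip(1).collect();

    // Top-level error reporting: Display formatting and an exit code
    // chosen by the application (2 is an arbitrary choice here).
    match run(&args) {
        Ok(total) => {
            println!("{total}");
            ExitCode::SUCCESS
        }
        Err(err) => {
            eprintln!("error: {err}");
            ExitCode::from(2)
        }
    }
}
```

A test can then call run with a hand-built slice of strings and compare the Result, without spawning a process.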


But besides that, a couple of reasons you might want to keep your main.rs — or your lib.rs or mod.rs, equally so — free of nontrivial code are:

  • The files at the top of the module hierarchy often contain a lot of mod and #![] and other such stuff; it can be nice to keep all of that largely separate from the functions for readability, to focus on one aspect of the program at a time.

  • Sometimes you care about what items are in the namespace of a particular module, in which case you want to keep it free of the miscellaneous uses that usually accompany writing function code. In particular, consider a module structure like this (but written in separate files):

    mod foo {
        use some_other_lib::Thing;
        pub fn do_foo() {
            let thing = Thing::new();
        }
        mod bar {
            use crate::foo::*;
        }
    }
    

    Then because bar happens to be inside foo, crate::foo::* matches Thing, even though the fact that it's imported is intended as an implementation detail for do_foo(). (You could reasonably say “don't use * imports”, but it doesn't have to be that; the same thing affects, say, IDE completion of paths. One way or the other, you could accidentally import crate::foo::Thing.)

    Obviously trying to avoid this could be taken to absurd lengths, but I think it's reasonable to decide to follow a principle like “everything in the crate root is one of my own items, nothing imported from elsewhere”.
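One possible way to follow that principle, sketched under assumptions: some_other_lib is replaced by an inline stand-in module so the snippet compiles on its own, and the imp module and the name() method are hypothetical. The import lives in a private child module, so foo's own namespace contains only foo's items.

```rust
// Stand-in for an external crate; hypothetical, defined inline
// so that this sketch compiles by itself.
mod some_other_lib {
    pub struct Thing;

    impl Thing {
        pub fn new() -> Self {
            Thing
        }

        pub fn name(&self) -> &'static str {
            "thing"
        }
    }
}

mod foo {
    // No `use` of foreign items at this level.

    // Private implementation module (hypothetical name): the import of
    // Thing stays in here and never enters foo's namespace.
    mod imp {
        use crate::some_other_lib::Thing;

        pub fn do_foo() -> &'static str {
            Thing::new().name()
        }
    }

    // Re-export only foo's own item.
    pub use imp::do_foo;

    pub mod bar {
        // The glob now picks up do_foo (plus imp and bar), but not Thing.
        use crate::foo::*;

        pub fn call_do_foo() -> &'static str {
            do_foo()
        }
    }
}
```

Whether the extra module is worth it is a judgment call; for a small crate it may well be overkill.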

16 Likes

Ah, some excellent points. I don't do much testing, so I don't often think about making my code easier to test (I usually just focus on making my types as general as possible). Thank you for the detailed explanation!

Moving logic from main.rs to the 'lib' side allows us to attach doc comments and build documentation around the feature, making it easier to onboard new team members. Even the Cli struct that we use to process arguments with clap gets documentation, above and beyond passing the --help flag. That is probably a bit excessive, but overall the documentation experience is geared towards consuming libs, and we want to leverage that.

3 Likes

Note that you can get all of the docs (even for bins) with cargo doc --document-private-items --no-deps.

The only real advantage to moving code to a library is that you can treat it as a library. I.e., add it as a dependency to another project.

2 Likes

Public members should go first; all decent people agree on this :slight_smile:

It's been an age, but I have this vague recollection that for a while, C++ compilers couldn't handle a call to a function that hadn't been declared yet, even if it was declared later on in the same file. So if an inline function A called function B, B had to be declared first. Since public functions often called private functions, and rarely vice versa, I think that was the main reason for the "private-first" school of thought.

1 Like

I appreciate this feature, but documentation gated in this way is less discoverable. This also detracts from our goal of having a single accessible online source to direct new staff.

Keep in mind that our management was initially skeptical that online documentation would be an improvement at all. Their expectations were more along the lines of "bulleted instructions with screenshots" in a bunch of Word docs. Moving towards embracing best practices in the industry is an ongoing process.

The fact that they do not know how to clone a repo or run cargo doc makes them feel like the documentation is being hidden from them as effectively as if I had encrypted it.

2 Likes

C and therefore C++ do require forward declarations as part of the language specification.

I'm not sure it serves much of a purpose nowadays. It's entirely possible that emitting the required diagnostics creates more work for the implementation than it saves by not having to delay resolution. But at least in C++, that lookup can have different results depending on where the declaration is:

namespace foo {
  void f();

  namespace bar {
    void g(){ f(); } // foo::f, not foo::bar::f

    void f();
  }
}

So it's a backward-compatibility issue.

You may be interested in the "c++ syntax 2" proposal, which attempts to address many issues like this: GitHub - hsutter/cppfront: A personal experimental C++ Syntax 2 -> Syntax 1 compiler

1 Like

I agree, blindly following such "rules" is not necessary and can be unproductive.

Typically, though, programs are big enough that they have a number of distinct components. It's useful to give those components meaningful names, whether by creating structs for small, locally used things, modules for bigger things, or separate files for the biggest. At this point naming things becomes important: struct names, module names, file names. It helps organise and find things. It helps to describe the architecture of your solution to others. So for example one may have a program with parts that implement a model, a view, and a controller. It would likely help keep things orderly to put those parts into files called model.rs, view.rs and controller.rs.

Now with that in mind, what then belongs in a file called main.rs? "main" does not really mean anything, "main" says nothing about any useful part of your program. Its only use is in naming the starting point of the program.

So as you noted, main is where we can gather command line arguments, perhaps install signal handlers, and other such mechanics that are essential to get the program started. Apart from that the useful "logic" of your program deserves its own names and files.

But as you say, for small enough, simple enough programs forcing things out into separate files may be more work than it is worth.

It's your call.

1 Like

In addition to this, pub code isn't marked as "unused" so you can develop functionality in a library and not be overwhelmed by spurious warnings before integrating it into an application.
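A small sketch of what that looks like, assuming the snippet is the src/lib.rs of a library crate; both function names are made up. Neither function is called anywhere yet, but only the private one is reported by the dead_code lint.

```rust
// Hypothetical contents of src/lib.rs in a library crate.

/// Public: part of the crate's API, so the compiler assumes some other
/// crate may call it and does not report it as unused.
pub fn planned_feature(x: u32) -> u32 {
    x * 2
}

/// Private and not called from anywhere in the crate: the dead_code lint
/// warns that this function is never used.
fn scratch_helper(x: u32) -> u32 {
    x + 1
}
```

The same pub function placed directly in a binary crate would typically still be flagged, since nothing outside the binary can call it.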

2 Likes

Servo uses, I believe, about three main()s before you get to program logic.

1 Like