I've recently discovered Rust and have decided to start my merry learning journey by going through the Book, and I have a question about a specific design choice in the language.
While reading part 3.3 about functions, I learned that unlike variable bindings, function definitions are visible in the entirety of the block in which they are defined, and so can be used earlier in the source code than where they are defined. By consulting the Reference, I learned that this kind of scope applies not only to functions but to what the Reference calls "items", of which function definitions are one.
fn main() {
    // my_function can be called before its definition without issues
    my_function();
}

fn my_function() {
    // Body
}
My question is therefore, why are function definitions (and items more broadly) scoped like this?
Clearly there must be a reason, a benefit to reap, but it's not obvious to me what it is. I can't seem to find an explanation for it in the Book or the Reference, and I couldn't find any discussion about this (let alone an answer) on various forums, including this one. The only benefit that comes to my mind (at least in the context of defining functions) is making mutually recursive functions trivial to define, whereas in other languages it might be more difficult.
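For instance, here is a contrived pair of mutually recursive functions I wrote just to illustrate (the names are made up); it compiles no matter which function comes first:

fn main() {
    println!("{}", is_even(10)); // prints "true"
}

fn is_even(n: u32) -> bool {
    // Calls is_odd, which is only defined further down; no forward declaration needed
    if n == 0 { true } else { is_odd(n - 1) }
}

fn is_odd(n: u32) -> bool {
    if n == 0 { false } else { is_even(n - 1) }
}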
I suppose I'm asking this purely to satisfy my curiosity rather than to solve a real problem, but I just like to understand why things are the way they are.
Edit: Immediately after creating this topic, I found this topic in which the original poster asked essentially the same question, except it was prompted by const items rather than function definitions. Sadly, I did not find a satisfactory answer there, and the topic is already closed.
Let me pose the converse question: what is the value of forcing developers to order definitions by what other definitions they rely on?
As a compiler writer, that kind of constraint is useful: it lets me build up a database of symbols as I parse the program, and allows me to reject programs that mention symbols not in that database without having to parse the entire rest of the program to figure that out. That is in large part why C does it that way: the state of the art, and the constraints of the systems C was designed on, did not allow for whole-program symbol resolution with the knowledge and resources available at the time.
But most language users are not language implementors. Making the language easier for implementors to deal with can come at the cost of the experience of those users. Forcing users to declare symbols, such as functions, in a specific order for your own convenience may compromise their needs in the process. We now have the techniques - and the resources, such as memory and storage - to do symbol resolution on a whole-compilation-unit basis, rather than doing so in a single parsing pass.
When writing code for other people, for example, my first goal is clarity, which often means organizing code so that conceptually-related elements go together and higher-level elements go before lower-level details that the reader might be expected to put aside. That would be impossible if I had to fully define those lower-level pieces before using them. Rust - and a majority of other languages designed in the last 20 years - lets you organize your code along whatever scheme makes the most sense to you and your audience, rather than the scheme that makes the most sense to the compiler.
In a large program, functions are likely to call each other in potentially very complex patterns. Even when those patterns do not actually involve mutual recursion, it is an inconvenient constraint to require the code to be organized in a way such that no function is used “before” its declaration.
More broadly, we find it useful for programs to be defined in a “timeless” fashion — the meaning of a definition is not dependent on what order the compiler processes it in. Source code can be organized according to modularity principles, not initialization-order requirements.
I've already read the part of the Book you linked to, and my issue is that while it explains that it's possible to use a function before it is defined, it doesn't explain the reason behind this design choice. I know that it's a feature of Rust, but I'm asking why it is one.
Thank you for showing me this similar topic though; seems I had missed it! I think I understand the reason given in the answers to that topic, but then I'm immediately wondering why the same couldn't apply to variable bindings? Why are functions (or similarly structs, enums, or types) allowed to be used before their definition, but variables aren't?
Variable bindings store the results of specific computations that happen in a specific order, and may have side effects. Variables cannot use each other in a mutually recursive fashion.
More broadly: the bodies of functions contain statements that observably execute sequentially. Items, in contrast, do not execute.
Variables are frequently used for their value, which can change during the evaluation of a given program. Structs, functions, constants, and so on all resolve to things that are fixed during execution (and frequently fixed much earlier than that).
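To make that contrast concrete, here is a minimal sketch (the names DOUBLED and BASE are made up for illustration):

fn main() {
    // Items: fine to use before their definitions appear in the source.
    let a = DOUBLED;
    // Variable bindings: the next line would not compile if uncommented,
    // because b is used before its let statement has executed.
    // let c = b;
    let b = a + 1;
    println!("{a} {b}"); // prints "42 43"
}

const DOUBLED: i32 = 2 * BASE; // may even refer to BASE, defined below it
const BASE: i32 = 21;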
It would be entirely possible to write a language which allows
x = 5;
let x = 8;
in fact, JavaScript (with var; most modern JS style guides recommend let for this exact reason) allows this. However, experience suggests that it's hard to reason about the behaviour of such programs. For example, in the snippet above, which value (5, 8, or something else) would you expect x to have before the first line, or after the last line, and why?
The other thing that isn't necessarily as clear as it could be is that programs tend to be modified over time. Being able to freely reorganize functions and other symbols, without worrying about definition order, makes it much easier to make changes to an existing codebase.
Thank you all for your very helpful answers, they really cleared things up for me! They all had some useful insight, so much so that I can't reasonably single one out as "the solution". Instead of replying to everyone individually, let me recap here what I understood.
C was created at a time when technological limitations imposed constraints on the language's design choices, choices that the languages it subsequently inspired have inherited. But because we've overcome those limitations, we need not subject ourselves to these constraints any more, so why not break free of them? Especially if it can improve the clarity and flexibility of source code.
Design choices like this imply a trade-off between the comfort of the implementers of the language and the comfort of the users of the language, which must be taken into account.
Conceptually, variables are a "moving part" of the sequence of execution of a program, which takes place in a specific order, and so the place where they are defined in the source code is significant. On the other hand, items such as function definitions are "inert" in the sense that they do not execute anything. A function does indeed execute, but not its definition.
Multiple functions (or structs, types, etc.) might need to depend on each other in many different complex ways (mutual recursion being an example, but not the only one). Requiring functions to be declared before use imposes a rigidity that would make it harder to write such interdependent functions, with no clear benefit to the programmer.
The order in which things are written in source code is not a faithful reflection of the order in which things will happen during execution. Source code is a static, lexical, "timeless" thing which shouldn't always be read as a chronological sequence of instructions.
If you think that I misunderstood something, or have anything more to add, I'd be glad to hear it!
A very good summary! One thing that might be useful to expand on:
It's worth noting that in most “dynamic” languages, like JavaScript, Python, and Ruby, it is common that a function definition is a thing that executes, and its effect is “add this function to the namespace” — more broadly, that an entire source file is executed when it is loaded, with arbitrary side effects that are mostly “add this item to the namespace” but are not restricted to that.
The way these languages handle mutual recursion is that a function is allowed to be defined even when the definition’s body contains an unbound identifier — it's looked up when the function is actually called, not when it's defined.
In Rust (and in many other “static”/“compiled” languages), the contents of a source file (or a mod {} block) are not code to be executed — they are all items such as function declarations.
And to extend this further, the "a function body is only in memory while you're emitting it" restriction that C and Pascal and such were designed under is only useful if you can actually take advantage of it in the implementation anyway. (Recall that in early C you didn't even need to keep the signature in memory, because you could just call functions with whatever arguments, and the compiler didn't check.)
But modern languages are full of stuff that doesn't do that kind of "single pass one time" thing. We need to keep the bodies of generics around to monomorphize them when we find callers. We need to keep the bodies of const fns around to be able to use them to calculate consts, not just emit them to the target's machine code. We run coherence checks that need to look at all the trait implementations, rather than just at what we're translating right now.
So when, from an implementation perspective, the obvious approach is simply to load everything and then process everything -- because everyone running rustc has far more RAM than the size of a crate's source code, so it's not worth the implementation complexity of streaming everything anyway -- giving the user some extra flexibility is easily worth it. It even makes certain things easier, since things like recursive functions don't need a separate "this is just a declaration, not a definition" syntax that a helpful compiler would have to check against the real definition anyway.
And in particular, compiling an entire crate "at once" avoids things like C's "one definition rule", where the simplest of mistakes can make your whole program undefined behavior.
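To make the const fn point above concrete, here is a toy sketch (the names are invented): the body of squared has to stay available so the compiler can evaluate it wherever a constant is needed, regardless of where it appears in the file.

const SIZE: usize = squared(4); // uses a const fn defined further down

fn main() {
    let buf = [0u8; SIZE]; // the array length must be known at compile time
    println!("{}", buf.len()); // prints 16
}

const fn squared(n: usize) -> usize {
    n * n
}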
Thank you for the clarification! I don't have much experience with "dynamic" languages since I am more drawn to the more "static-y" flavour, so I'm learning something here. I don't think I'm knowledgeable enough in this kind of language to understand exactly what this means yet, but I'll keep it in the back of my mind!
(Actually, Python was my first language, but I was too much of a novice at the time to be aware of any of this before I moved on from it).
Sadly, I am still too inexperienced in both C and Rust to fully appreciate the point that you are making. Especially in C; I am naturally somewhat familiar with it, but have never actually written in it. Hopefully someday I'll be able to go back to your reply, armed with new knowledge, and better grasp it!
It's also interesting to think about global variables in languages like these (which include Lisps and even some statically typed languages like ML). Their global vars are initialized in the order they're declared. They're sort of like local vars in that their initializers can use other vars that have been defined before them. And they can also call functions defined before them. So their initialization is done at runtime.
Rust static (global) vars, by contrast, are very different. They are initialized at compile time and can only be initialized with constant expressions. Their initializers can refer to other constants and can call const functions. As with functions, the order in which statics are declared is irrelevant.
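A small sketch of what that looks like (the names are made up):

// Statics are initialized at compile time with constant expressions,
// so declaration order is irrelevant.
static MAXIMUM: u32 = LIMIT + 1; // LIMIT is declared below
const LIMIT: u32 = 99;

fn main() {
    println!("{}", MAXIMUM); // prints 100
}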
Within each module, each item must have a unique name (at least within its namespace, e.g. among functions), because it can be referred to from other places. Local variables, on the other hand, cannot be referred to from outside their block, so it is possible for local variables to share a name, even within the same block.
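For illustration, here is a small sketch of the local-variable side, where shadowing lets the same name be reused:

fn main() {
    let x = 5;
    let x = x + 1; // a new binding shadows the previous x in the same block
    {
        let x = "hello"; // shadowing works in nested blocks too
        println!("{x}"); // prints "hello"
    }
    println!("{x}"); // prints 6
}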
Requiring items to be defined before they are referenced is already quite restrictive when your project has only a single file, but it becomes even worse when multiple files are involved.
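For instance, here is a sketch that uses inline modules to stand in for separate files (the names are invented); the two modules refer to each other, so no single ordering of definitions could satisfy a define-before-use rule:

mod a {
    pub fn ping(n: u32) {
        if n > 0 {
            crate::b::pong(n - 1); // refers to the b module, defined below
        }
    }
}

mod b {
    pub fn pong(n: u32) {
        if n > 0 {
            crate::a::ping(n - 1); // refers back to the a module above
        }
    }
}

fn main() {
    a::ping(3);
}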
The syntax of modern languages is usually designed so that the code can be parsed even if it refers to things whose definition is not known. This implies some rigidity; for example, operator precedence and associativity cannot be changed by definitions. I think the endurance of the turbofish is also a result of this.
Which is useful for things like letting rustfmt format individual files without seeing the full crate tree.
And it differs from things like C, where for example foo*(bar); might be a variable declaration of pointer type (foo *bar; see https://c.godbolt.org/z/qKsWzGhY9) or might be a multiplication (foo * bar; see https://c.godbolt.org/z/o8nWEdhcb), depending on what's in scope at the time.
(C's over half a century old, they didn't know better at the time. But there's a reason new languages -- well, good ones at least -- don't use that local variable syntax any more.)
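As a concrete illustration of the turbofish point, here is a small sketch: the :: before the generic arguments keeps the parse unambiguous without the compiler needing to know what parse or collect resolve to.

fn main() {
    let nums = "1 2 3"
        .split_whitespace()
        .map(|s| s.parse::<i32>().unwrap()) // turbofish
        .collect::<Vec<_>>();
    println!("{nums:?}"); // prints [1, 2, 3]

    // Without the ::, something like parse<i32>(x) could also be read as the
    // chained comparisons parse < i32 > (x), which is the same kind of
    // definition-dependent ambiguity as the C example above.
}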
As someone who wrote copious amounts of C code very early in his career, every part of what @derspiny wrote is correct.
You want code to be readable top to bottom, and that means you start with the high-level functionality first, which then uses lower-level functions that follow further down.