On code readability

on code maintenance and readability, with C/C++ i place all variable declarations on top (of functions/procedures) but with Rust, i see "let " declarations sprinkled all over the place.

don't you find it hard to maintain?

The modern accepted convention (in a lot of languages, not just Rust) is that broadly speaking, variables should have small scopes, and be declared close to where they are used.

Declaring variables at the top of the function means that when reading the function from top to bottom, you encounter a lot of variable names with no context for what they mean (other than the names themselves), since they're all grouped together without any related code.

Also, (in many languages) it means that the variables exist for a while being declared but not initialized (or initialized to a placeholder value), which can lead to bugs if the initialization is accidentally reordered below its use. In Rust, this isn't a problem, because the compiler tracks whether variables are initialized and will reject let x; f(x); as using an uninitialized variable, but it's still not idiomatic.

25 Likes

For example this is the Google C++ style guide. I'm not saying it's the right rules just because Google, but it's good enough to maintain many Google-scale projects.

https://google.github.io/styleguide/cppguide.html#Local_Variables

1 Like

Personally, I find it much easier to maintain if every variable is initialized on declaration (or, to say another way round - declared on initialization), so that I don't risk trying to use it too early.

10 Likes

There's a number of patterns that benefit from line-of-first-use declarations.

One is shadowing. It's not uncommon to have a sequence of transformations of an object that differ in type. Often this is at the top...

fn foo<P: AsRef<Path>>(path: P) {
    let path = path.as_ref();
    // ...
}

But sometimes happens elsewhere.

    let mut inner = args_os();
    if let Some(command) = inner.next() { /* ... */ }

    // Tokenize the rest
    let inner = inner.map(|arg| arg.into_vec());
    let inner = TokenStream::new(inner, true);
    let inner = OptionConsumer { name, inner };
    let mut tokens: Vec<_> = inner.collect();

The different inner have different types, so you can't predeclare them all before initialization.


Another is the scoped ownership model. It generally makes sense to have block-local variables, as they'll drop at the end of the inner block. Not only is there no need for the function outside the block to be concerned with what's only required inside the block, it can sometimes reflect that a resource (e.g. Mutex guard) should drop before the rest of the function.

Another is the aversion to declaring variables mutable if they don't need to be. If you declare a variable assigned (/initialized) in a loop outside the loop, you'll need to make it mutable so you can re-assign (/reinitialize) the variable each time through the loop, even if it has no other reason to be mutable.

So taken together, I could rewrite this:

    for child in &node.children {
        let cdata = data.entry(child.clone()).or_default();

To this:

    let mut cdata; // `fn` top is a couple scopes above and far away
    // ...
    for child in &node.children {
        cdata = data.entry(child.clone()).or_default();

but now my declaration is far away from the use, isn't initiated for some amount of time, has to be mutable even though it doesn't need to be once bound/reassigned, might accidentally get shadowed by something else when the function is edited, won't actually be useful after the loop if it happens to get invalidated in the loop, might well cause borrow problems if still valid and you tried to use it after the loop, ...


Conditional bindings are also built into the language, making the above pattern more natural; one might say, encouraged.

    for node in nodes {
        if let Some(category) = categories.get_mut(&node.value) {
            // ...
        } else if let Some(item) = items.remove(&node.value) {
            let idata = data.remove(&node.value).unwrap_or_default();

Even though they don't contain the word let, you'll find yourself creating new bindings (variables) in match arms all the time.

match thing {
    E::Left(foo)  => { /* foo is a new binding for this arm */ }
    E::Right(bar) => { /* bar is a new binding for this arm */ }

Borrows also tend to be temporary, so if you do something like...

let elem = &v[idx];
do_stuff(elem);
v.push(other_things)

elem is invalid after the v.push anyway, so why make yourself think about it for the entirety of the function. You have to be somewhat used to the borrowing model to "see" that elem becomes invalid there, but this pattern also happens with owned items, usually in a more obvious way:

        let iter = candidates
            .iter()
            .map(|item| &items[item])
            .filter(|&item| item.filter.contains_key(&self.name));

        for item in iter {
        // This ^ consumes `iter`, so I created something and immediately
        // made the binding invalid by using it.  I don't have to think
        // of `iter` outside this immediate area of code.  Also I'm
        // borrowing so many things here I probably could not have
        // initialized at the top of the function and would have this:
        // ```
        // // no context w/o a comment describing s.t. far away, which
        // // I don't actually need to know except in that short area of
        // // code, which is by the self-descriptive initialization
        // let iter; 
        // // lots of code 
        // iter = actual_code_that_shows_what_iter_is_for();
        // ```

For the invalidated reference case, maybe in C you would reuse pointer variables instead. In Rust you would have to declare the binding as mutable and reassign it. You'll still be protected from invalid reference uses (in safe code), but it's more idiomatic to just declare a new non-mut binding if you need one.


Like that iter declaration shows, functional-language-like features can lead to doing a lot of work in your declarations. So you're not necessarily separating out the "code that does actual work" from the list of all your variables by having every let at the top anyway. The inner shadowing example earlier comes from a function that has

  • 8 let or let mut statements, some of which are method chains
  • 2 if let Some(...) blocks
    • 1 non-declaration LOC (reassigning a value) nested within
  • 2 other non-declaration LOCs at the very end

So almost everything the function does is part of the declarations, even though they're all pretty much at the top! There is no actual "read up on every variable used in here before reading the code itself" area at the top of the function; there's pretty much just the body of the function that happens to be mostly initializing declarations.

It's true I could split apart the declarations and initializations...

let mut inner_1;
let inner_2;
let inner_3;
let inner_4;
// etc

...and put all the work-doing initializations after that, but aside from doubling the size of the function (or more if I commented all the shadow-avoiding variables), it's unidiomatic and IMO obscures more than it illuminates.


When you create parameter-taking closures, you're also dealing with locally created bindings in a sense.

let lambda = |cannot_predeclare_me| { /* ... */ };
18 Likes

That's basically a cargo-cult style guideline that comes from C89, which required variables to be declared at the top of each block, because… who knows why, honestly. Perhaps it made ancient dumb C compilers easier to implement (it could allow the calculation of stack space upfront, without doing a separate pass over the code to collect variables). I have read some people recommended it for ensuring every variable is initialized.

Thus it is a technical detail, and it makes absolutely no sense to enforce it in modern days, especially in languages like Rust, for which the compiler performs data flow analysis for mechanically ensuring initialization, and a myriad of other multi-stage analyses and optimizations for transforming it into correct and efficient code.

It doesn't make code easier to read or maintain, quite on the contrary:

  • When variables have minimal scope, the code is easier to understand because the reader doesn't have to guess which one or two of the dozen or so declarations are currently relevant. Having to keep all the declarations in one's head would overwhelm the reader with useless information. (Most people's short-term memory is limited to around 5 to 7 items.)
  • When changes to the code are required (and they will eventually be required), minimizing scopes has the added benefit of not changing anything accidentally. When all variables are declared at the beginning of the function, you risk reusing them involuntarily. If you mutate a mutable binding, then every subsequent use will be affected, which quickly leads to convoluted code and bugs.
13 Likes

I don't miss it. Rust is more of an expression-oriented language, so it often avoids using variables in the first place (e.g. instead of a ternary operator, if can be used anywhere). In idiomatic Rust most variables are immutable, so they aren't that important for the logic of the program. They're mostly just a minor tool to hold temporary data or split long lines.

Rust has patterns, which is another way to create variables (bindings) in match or if let, and you wouldn't be able to move them to the top of the function.

4 Likes

Very early languages had no type declarations at all, for example, in some Basic dialects, everything was a float, unless the name ended with a percent, in which case it was an integer, and so on:

x = 1.0
y% = 1

Then came languages like Pascal, which had an extra block to declare variables:

VAR
  x: real;
  y: integer;
BEGIN
  x = 1.0;
  y = 1;
END

Then came C, and you didn't need an extra block for declarations:

{
  float x;
  int y;
  x = 1.0;
  y = 1;
}

Eventually, some one realized that you can do declaration and assignment in one step:

{
  float x = 1.0;
  int y = 1;
}

Note how languages like Scala and Rust borrowed, uhm, I mean cloned, Pascal's syntax for ascribing types, i.e. with a colon:

{
  let x: f32 = 1.0;
  let y: i32 = 1;
}

Also, in Scala and Rust, keeping declaration and initialization together is now mandatory, which means most often that we can't initialize until later in the code (edit: that's actually not true, as @Michael-F-Bryan points out).

3 Likes

Just to mess with you, this is valid Rust which declares x well before it is initialized :wink:

fn main() {
    let x: String;

    let y: &str = if condition() {
        x = "Hello, World".to_string();
        x.as_str()
    } else {
        ""
    };
}

(playground)

Normally I would reject code that doesn't initialize a variable immediately because it's a maintenance hazard, but there's always that 1 time where doing things in a funny way can save an allocation.

4 Likes

I think the most common place I run into this is when I might want stderr or stdout. [1] Before the locked handles became owned, you would actually need to keep Stdout or Stderr around as a temporary as well, so four variables instead of two.

More generally, sometimes you need references only initialized and borrowed within an inner scope to stay alive beyond that scope.


  1. Common as it's a typical CLI exit pattern to distinguish between "you asked for help" and "you asked for something I don't understand". ↩︎

2 Likes

Oh, I thought that was illegal. Thanks for pointing out!

How many lines are you functions ?

Generally for me, most functions either:
(1) fit vertically in a single screen or
(2) have a giant match statement where each arm fit in a single screen

Either way, it's possible to look at one screen and see everything important without scrolling.

1 Like

I remember moving from dialects of C that required declarations at the top to dialects that didn't, and it was a huge improvement. The code ends up much nicer and easier to read with the declarations sprinkled throughout.

1 Like