Baffled yet again

I recently posted an account of a battle I had with the Rust compiler, which got explained by some helpful folks who understand the language and the compiler better than I do. At the time, I said that I was going to take a hiatus from Rust, hoping for improvement in both the compiler diagnostics and the documentation.

Well, I had some time, unexpectedly, and couldn't resist trying to continue solving the problems I was having with the program I was working on. I made some progress until I ran into the following:

    // Prepare to build the account tree
    let marketable_asset_value_stmt = db.prepare(constants::MARKETABLE_ASSET_VALUE_SQL).unwrap();
    let non_marketable_asset_and_liability_value_stmt = db.prepare(constants::NON_MARKETABLE_ASSET_AND_LIABILITY_VALUE_SQL).unwrap();
    let income_and_expenses_value_stmt = db.prepare(constants::INCOME_AND_EXPENSES_VALUE_SQL).unwrap();
    let account_children_stmt = db.prepare(constants::ACCOUNT_CHILDREN_SQL).unwrap();
    {
        let mut account_value_statements = AccountValueStatements {
360               marketable_asset_value_stmt:&mut marketable_asset_value_stmt,
            non_marketable_asset_and_liability_value_stmt:&mut non_marketable_asset_and_liability_value_stmt,
            income_and_expenses_value_stmt:& mut income_and_expenses_value_stmt,
            account_children_stmt:& mut account_children_stmt
        };
        //build_account_tree(&mut root, &mut account_value_statements, julian_begin_date_time, julian_end_date_time);
    }
}

I've marked line 360 in the code so that you can match it to the compiler messages. Also note that the last close-brace terminates the main program.

This results in the following error (one of several) with version 1.20 on an Arch Linux system:

error[E0597]: `marketable_asset_value_stmt` does not live long enough
   --> src/main.rs:367:1
    |
360 |             marketable_asset_value_stmt:&mut marketable_asset_value_stmt,
    |                                              --------------------------- borrow occurs here
...
367 | }
    | ^ `marketable_asset_value_stmt` dropped here while still borrowed
    |
    = note: values in a scope are dropped in the opposite order they are created

The compiler is complaining that a reference (within the struct) to a statement outlives the statement. But the statement is bound in the outer scope and the struct is bound in the inner scope, so the struct, and with it the reference, should be deallocated before the statement. Either I've got this completely wrong or this is a compiler bug or .... Please explain. (Note that I've commented out the call to build_account_tree, which references the struct, in an effort to simplify things.)

Can you paste the definition of the AccountValueStatements struct? It probably uses the same lifetime parameter for all the mutable references, which wouldn't work.

1 Like

Yes, that's exactly right.

struct AccountValueStatements<'l> {
        marketable_asset_value_stmt:&'l mut Statement<'l>,
        non_marketable_asset_and_liability_value_stmt:&'l mut Statement<'l>,
        income_and_expenses_value_stmt:&'l mut Statement<'l>,
        account_children_stmt:&'l mut Statement<'l>
}

But if you omit the lifetimes on the Statements, it complains that it wants one. Use a different lifetime:

struct AccountValueStatements<'l, 'm> {
        marketable_asset_value_stmt:&'l mut Statement<'m>,
        non_marketable_asset_and_liability_value_stmt:&'l mut Statement<'m>,
        income_and_expenses_value_stmt:&'l mut Statement<'m>,
        account_children_stmt:&'l mut Statement<'m>
}

and you get

error[E0491]: in type `&'l mut sqlite::Statement<'m>`, reference has a longer lifetime than the data it references
  --> src/main.rs:56:9
   |
56 |         marketable_asset_value_stmt:&'l mut Statement<'m>,
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: the pointer is valid for the lifetime 'l as defined on the struct at 55:1
  --> src/main.rs:55:1
   |
55 | / struct AccountValueStatements<'l, 'm> {
56 | |         marketable_asset_value_stmt:&'l mut Statement<'m>,
57 | |         non_marketable_asset_and_liability_value_stmt:&'l mut Statement<'m>,
58 | |         income_and_expenses_value_stmt:&'l mut Statement<'m>,
59 | |         account_children_stmt:&'l mut Statement<'m>
60 | | }
   | |_^
note: but the referenced data is only valid for the lifetime 'm as defined on the struct at 55:1
  --> src/main.rs:55:1
   |
55 | / struct AccountValueStatements<'l, 'm> {
56 | |         marketable_asset_value_stmt:&'l mut Statement<'m>,
57 | |         non_marketable_asset_and_liability_value_stmt:&'l mut Statement<'m>,
58 | |         income_and_expenses_value_stmt:&'l mut Statement<'m>,
59 | |         account_children_stmt:&'l mut Statement<'m>
60 | | }
   | |_^

error: aborting due to previous error

error: Could not compile `newcash_report_generator`.

This all relates, at least for me, to the fact that neither Book 1 nor Book 2 comes close to adequately explaining the semantics of lifetimes re structs and references within them. This all has the effect of driving me back to C. But please, enlighten me as to how to fix this, because the references do NOT outlive what they refer to. But I'm damned if I can figure out how to say it in the code, and the documentation, at least what I've read (both Books), is of no help whatsoever.

1 Like

Try this:

struct AccountValueStatements<'l, 'm: 'l> { /*...*/ }
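To see what the `'m: 'l` bound expresses, here is a minimal sketch that stands in for the sqlite crate. `Connection`, `Statement`, and `prepare` below are hypothetical stand-ins, not the crate's real API, and the struct is trimmed to two fields. `'m` is the lifetime carried by each statement (its borrow of the connection), and `'l` is the shorter lifetime of the mutable borrows held by the struct. As far as I know, more recent compilers infer the `'m: 'l` bound on their own, so writing it out should be harmless but may no longer be required.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the sqlite crate's connection and statement types.
struct Connection;

struct Statement<'c> {
    runs: u32,
    _conn: PhantomData<&'c Connection>,
}

impl Connection {
    fn prepare(&self) -> Statement<'_> {
        Statement { runs: 0, _conn: PhantomData }
    }
}

// Like a real prepared statement, the stand-in has a destructor.
impl Drop for Statement<'_> {
    fn drop(&mut self) {}
}

// 'm: the statements' own lifetime; 'l: how long the struct borrows them.
// The bound 'm: 'l says the statements live at least as long as the borrows.
struct AccountValueStatements<'l, 'm: 'l> {
    marketable_asset_value_stmt: &'l mut Statement<'m>,
    account_children_stmt: &'l mut Statement<'m>,
}

// Placeholder for the real tree builder: it only counts statement uses.
fn build_account_tree(stmts: &mut AccountValueStatements<'_, '_>) -> u32 {
    stmts.marketable_asset_value_stmt.runs += 1;
    stmts.account_children_stmt.runs += 1;
    stmts.marketable_asset_value_stmt.runs + stmts.account_children_stmt.runs
}

fn demo() -> u32 {
    let db = Connection;
    let mut marketable = db.prepare();
    let mut children = db.prepare();
    let total = {
        // The struct lives only in this inner scope, as in the original program.
        let mut stmts = AccountValueStatements {
            marketable_asset_value_stmt: &mut marketable,
            account_children_stmt: &mut children,
        };
        build_account_tree(&mut stmts)
    };
    total
}

fn main() {
    println!("total statement runs: {}", demo());
}
```

The point of the two parameters is that the borrow (`'l`) can end at the inner closing brace while the statements (`'m`) keep living until the end of the function.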

Yes, these cases aren't all that straightforward. The wrench in this whole thing is the mutable references.

The thing the compiler is actually complaining about, AFAICT, are the Statement values themselves because they have a lifetime parameter of their own. Because mutable references make the lifetime invariant (i.e. you cannot substitute a longer lifetime for a shorter one), and since the Statement values in your example do have different lifetimes, the compiler rejects the code.

The way to fix your `AccountValueStatements` is to define it as such:

struct AccountValueStatements<'a, 'b, 'c, 'd> {
        marketable_asset_value_stmt:&'a mut Statement<'a>,
        non_marketable_asset_and_liability_value_stmt:&'b mut Statement<'b>,
        income_and_expenses_value_stmt:&'c mut Statement<'c>,
        account_children_stmt:&'d mut Statement<'d>
}

This gives each Statement (and the mutable reference to it) its own lifetime parameter, and thus allows them to vary with respect to each other.

Now, here's another aspect to consider. Suppose you have a struct with mutable references to types without their own lifetimes:

struct Foo<'a> {
   x: &'a mut i32,
   y: &'a mut i32
}

In this case, the compiler will allow you to pass references that have different lifetimes:

let mut x = 1;
let mut y = 2;
let f = Foo {x: &mut x, y: &mut y };

That's because the reference lifetimes themselves get shrunk down/"squeezed". Since the type itself (i.e. i32 here) has no lifetime parameters, there's no "squeezing" of its lifetimes.
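A runnable version of the `Foo` example, as a sketch: `x` and `y` are deliberately declared in different scopes so that their natural lifetimes differ, and the single `'a` is "squeezed" to a region both borrows can satisfy. The function name `squeezed` is made up for illustration.

```rust
struct Foo<'a> {
    x: &'a mut i32,
    y: &'a mut i32,
}

fn squeezed() -> (i32, i32) {
    let mut x = 1;
    let y_final;
    {
        // `y` lives in a shorter scope than `x`.
        let mut y = 2;
        // Both borrows are given the same 'a: the compiler picks a region
        // short enough for the borrow of `y`, and the borrow of `x` shrinks to fit.
        let f = Foo { x: &mut x, y: &mut y };
        *f.x += 10;
        *f.y += 20;
        // `f` is not used past this point, so `y` can be read again.
        y_final = y;
    }
    (x, y_final)
}

fn main() {
    println!("{:?}", squeezed());
}
```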

Let me know if this doesn't help.

2 Likes

First I want to thank both of you for trying to help. I really appreciate the helpfulness of the Rust community. And that includes Steve Klabnik, with whom I've had some disagreements. But whether I agree with his technical or documentation viewpoints, he has always tried to be helpful.

As for this situation, I'm now doubly baffled as to why anyone would choose to use Rust, given what it is requiring here. Why does it need a lifetime parameter on the Statements within the references in the struct? I've deliberately defined the bindings of the statements in the scope of the main function and created the instance of the struct in an inner scope. The lifetimes of the Statements are right in front of the compiler's nose and it is clear that the references, being part of the struct in the inner scope, created deliberately for this purpose, do not outlive them.

And the fixes that you are both proposing, both of which may work, go well beyond anything I have seen in any documentation of this language. If I'm wrong about this, please cite something that explains this, besides reading the compiler code. I do acknowledge that you both seem to understand this, but I'm at a loss as to how you acquired that knowledge through means that can reasonably be expected of an ordinary user.

This is my main point, going back to my assertion months ago that I can't see the cost-benefit proposition of using Rust as a general purpose programming language, because it's both difficult and under-documented. Perhaps it already makes sense in special situations where the lack of GC pauses is important. But for writing a financial report generator that will run on 3 GHz, 8 GB PCs running Linux, and having to put up with this pain when I can get the same type- and memory-safety from Haskell? I just can't see it.

I've tried repeatedly to prove to myself that I'm wrong about this, because Rust is clearly a major effort by some very smart people. But again and again, I keep having experiences with it that reaffirm my original conclusion.

2 Likes

Why does it need a lifetime parameter on the Statements within the references in the struct?

The compiler needs lifetime parameters where references are involved to make sure that you're not assigning things with a shorter lifetime to types with a longer lifetime. Just because your code doesn't need to care about these things doesn't mean there's no way it could go wrong. By using only one lifetime, you're actually restricting the amount of possible code that could be written using that struct. The easiest way to get going if you don't need things to be the same is to assign everything a different lifetime parameter and then add lifetime bounds as needed.

please cite something that explains this

https://doc.rust-lang.org/book/second-edition/ch19-02-advanced-lifetimes.html#lifetime-subtyping

Also the reference for the error message you got the second time around explains exactly what you need to do: rustc --explain E0491.

having to put up with this pain

Lots of references can be a pain, yes. Maybe you'd benefit from a more managed approach if you don't actually need references. From what I can tell, your code doesn't actually require mutable references to the individual Statements; you could easily pass the Statements themselves, or perhaps a mutable reference to an AccountValueStatements consisting of owned Statements instead. Alternatively, you could use Rc/RefCell to move the lifetime checking from the compiler to the runtime.
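A minimal sketch of the owned-Statements variant suggested above. `Connection`, `Statement`, and `prepare` are again hypothetical stand-ins rather than the sqlite crate's API, and the struct is cut down to two fields. With ownership, only one lifetime remains (the statements' borrow of the connection), and the tree builder takes a single `&mut` to the whole struct.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the sqlite crate's types.
struct Connection;

struct Statement<'c> {
    runs: u32,
    _conn: PhantomData<&'c Connection>,
}

impl Connection {
    fn prepare(&self) -> Statement<'_> {
        Statement { runs: 0, _conn: PhantomData }
    }
}

// The struct owns its statements; 'c is only the borrow of the connection.
struct AccountValueStatements<'c> {
    marketable_asset_value_stmt: Statement<'c>,
    account_children_stmt: Statement<'c>,
}

// Placeholder for the real tree builder: it only counts statement uses.
fn build_account_tree(stmts: &mut AccountValueStatements<'_>) -> u32 {
    stmts.marketable_asset_value_stmt.runs += 1;
    stmts.account_children_stmt.runs += 1;
    stmts.marketable_asset_value_stmt.runs + stmts.account_children_stmt.runs
}

fn demo() -> u32 {
    let db = Connection;
    let mut stmts = AccountValueStatements {
        marketable_asset_value_stmt: db.prepare(),
        account_children_stmt: db.prepare(),
    };
    // The same struct can be lent out mutably as often as needed.
    build_account_tree(&mut stmts);
    build_account_tree(&mut stmts)
}

fn main() {
    println!("total statement runs: {}", demo());
}
```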

4 Likes

Mutable references must prevent a longer lived reference from pointing at a shorter lived reference (to avoid dangling references). When you use immutable references, they have more flexibility because longer lived references can be substituted for shorter lived ones (there's no harm because you can't assign them in a way that would lead to dangling refs).

These mechanics are expressed through variance (subtyping) rules. Variance/subtyping in Rust is purely about lifetimes - can Statement<'long> be used in places where Statement<'short> is expected? That's what it tries to answer (and enforce).
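A small sketch of the two behaviours, using `&str` rather than `Statement` to keep it self-contained. The function names are made up for illustration; `overwrite` is modeled loosely on the nomicon's example of the same name.

```rust
// Shared references are covariant: a longer-lived `&str` (here a 'static
// literal) can stand in where a shorter-lived one is expected.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// Behind `&mut`, the inner lifetime is invariant: `slot` and `value`
// must agree on exactly the same 'a.
fn overwrite<'a>(slot: &mut &'a str, value: &'a str) {
    *slot = value;
}

fn covariance_demo() -> usize {
    let local = String::from("a local string");
    // "static" is &'static str; it is shortened to match the borrow of `local`.
    let chosen = longest("static", &local);
    chosen.len()
}

fn invariance_demo() -> String {
    let fallback = String::from("fallback");
    // The type of `current` is inferred with a lifetime that matches `fallback`.
    // Had it been annotated `&'static str`, the call below would be rejected,
    // because `&mut &'static str` cannot be treated as `&mut &'short str`.
    let mut current: &str = "initial";
    overwrite(&mut current, &fallback);
    current.to_string()
}

fn main() {
    println!("{} {}", covariance_demo(), invariance_demo());
}
```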

I agree that multiple mutable references to values that themselves have references is a bit hairy. It takes some internalizing and getting used to. But it helps in understanding what the compiler is trying to prevent, statically. To determine things statically, it has to look at type definitions. Your AccountValueStatements is defined a certain way, and that way doesn't match what you're actually doing when using the type. Then you can switch to the formulation I gave, which better describes the situation to the compiler.

The docs probably don't spell out this particular situation. They can't illuminate all possible cases, of course. That said, some of the hairier cases could be shown and explained. Room for improvement and all.

To the rest of your post, whether you feel the learning curve is worth climbing or not is a personal call. I will say that you happened to run into a situation where understanding variance/subtyping is required, and it spans multiple references, and is not intuitive in the beginning. I would, however, recommend trying to understand what's going on here and why a given solution works. Not memorize it, but understand it. I'd do that before getting distraught with the whole thing.

Also, I think the Rust team would welcome suggestions on how to improve docs. I know they're working on an "advanced" section, but as mentioned, there's room for improvement - I'm sure nobody will disagree.

Finally, forums like this, reddit, IRC channels, and so on can be used as "live"/"extra" documentation. As you say the community is pretty welcoming of questions and lots of people take a lot of time to provide thorough and insightful answers. We should see if those can be incorporated into the docs so they're not lost. But those resources are available for people trying to learn and understand the hairier parts.

P.S. Speaking for myself, I don't always know whether a bit of code will compile without trying and seeing what errors the compiler points out. Of course one has to know about the various topics to understand what the compiler is complaining about, but don't think that people sleepwalk through the more complicated scenarios (maybe some do, but likely not many).

5 Likes

Also, not all Rust code you'll write will be dealing with references, let alone mutable ones all packed into a single struct pointing at another type with its own lifetime parameter :). So it's unfortunate that you're running into this but I wouldn't expect this to be a constant problem or even a scenario that will make you pause.

2 Likes

Do you have to use references in the struct?

If you just want to place things in a struct by pointer rather than by value, then borrowing is the wrong way to do this. Usually structs are used with pointers like Box/Rc/RefCell, which make the borrow checker happier.

2 Likes

As I said before about this community, all very helpful responses. Since you made the effort to explain this issue to me, I will make the effort to understand your messages.

I actually did try letting the struct own the statements rather than borrowing them and it led to a different problem that I can't reproduce at the moment. That problem was likely my doing, because trying it now does work. But the borrowing approach should also work and has the advantage of reference by name rather than value, though that may not really matter in practice. And again, it seems to me that by using the scopes I did, I made the lifetime relationships clear between the statements and the references to them. I'm understanding now that my attempt to deal with the compiler's insistence on explicit lifetimes on the Statements within the references in the struct definition is what led to the problem, because what I said was not true (that all the Statements had the same lifetime, which is not the case).

One thing that I know caused trouble for me here was the name of the relevant section in Book 2 -- Lifetime Subtyping. This doesn't begin, at least for me, to describe the issue that that section is dealing with. Something like "Specifying Relationships Among Multiple Lifetimes" seems closer to the mark, based on my superficial understanding of what it says and my reading of all your messages. I did read part of that section, but didn't work hard enough for it to sink in (and it is hard work) because it wasn't clear to me that it applied to anything I was doing. That was wrong, of course, but a clearer title and perhaps some summary material up front (perhaps a version of the paragraph that occurs near the end of that section that begins with "Which gets us to the point of this section: ...") could help keep others from going astray as I did.

4 Likes

The nomicon has a good section on this: Subtyping and Variance

It talks about why &'a mut T is variant (that is, covariant) over 'a but invariant over T. That's the distinction I tried to demonstrate with the Foo<'a> struct a few posts up. It also should (hopefully) clarify why Statement<'a> becomes invariant when referenced mutably. If you look at that overwrite function example in the nomicon section and pretend you're calling the function with &mut Statement<'static> and &mut Statement<'a>, you should be able to see how that would be problematic for the same reason the str example is.

Now, instead of that overwrite function, go back to your AccountValueStatements struct in its original form with a single lifetime parameter applied to all 4 references. Given we know the T in &'a mut T is invariant, we cannot substitute a longer lifetime for a shorter one (i.e. the overwrite problem) for T. T is the Statement<'_> here. So we know Rust won't allow this. This in turn means all 4 Statement<'a> values in AccountValueStatements must have the same lifetime. But that's not true when we actually try to create this struct in that code and the compiler complains.

When you make all 4 references use a different lifetime parameter, it lifts this requirement, because you're explicitly telling the compiler that these 4 types are completely unrelated. This may seem like a completely arbitrary and mystifying fix if you don't know about this subtyping business. But hopefully now it makes (some) sense.

P.S. I do think this type of situation would be good to describe and analyze in a "putting it all together"-like section, whether the book or the nomicon. @steveklabnik thoughts? I think the book does a good job explaining the Parser/Context example and why you need a 'a, 'b: 'a setup rather than just Parser<'a>/Context<'a>. But an example using mutable references to types with lifetime parameters, like the AccountValueStatements one, would explain another important aspect of subtyping. One can draw conclusions from the nomicon, but an explicit and detailed explanation for beginners would be nice - they may not have the comfort level yet to draw conclusions from generalizations; spelling it out with examples would help.

2 Likes

I read the nomicon section and I understand the issue.

I think a problem here is that there's an awful lot of computer-science-y
nomenclature being used to describe a relatively simple problem, at least
in the Nomicon. I think this is avoided in Book 2, but again, I think the
section in Book 2 needs an explanation of what it's about, what problem it
helps you solve, and it needs this at the outset, starting with a more
descriptive name for the section and then an "executive summary" at the
beginning that tells you what you are about to learn. That would help
people who need to solve that problem to realize that they are in the right
place and will be encouraged to wade through all the subsequent complexity.
I still haven't read that section carefully, but I will. It may turn out
that it's explained as well as it can be, but perhaps I can offer some
suggestions that would make it less difficult.

I'm beginning to realize that there are some areas in Rust that need to be
avoided if an alternative exists, because they will lead to a painful
wrestling match with the compiler.

One is references in structs, which will shorten your lifetime :) Sorry.

The other is closures. In one routine in the program that started all this,
I used a couple of closures within that function to move the details of
what they do out of the main flow of the code, making the main part easier
to read. I used closures defined within the scope of the function because
that's the largest scope the name of the closures should have. And I used
closures, not functions, because they need to reference variables in the
outer scope and the use of closures avoids having to pass that stuff in as
arguments. Scheme programmers, of which I am one, do this sort of thing all
the time with no trouble. The problem in Rust occurs if two such closures
want to mutate the same item. The compiler complains about the second
closure, asserting that the first one requires exclusive access to the
object in question. It's as if it thinks the two closures could be called
simultaneously -- worrying about multiple simultaneous writers -- which I
suppose could happen in multi-threaded code, which mine is not. In my case,
the better approach was to give up on the readability and place the code
from the closures in-line. I think turning the closures into functions
would also have worked, at the expense of explicitly passing the free
variables. Or perhaps I could have used macros to hide the details but
still inline the code.
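One middle ground between capturing closures and free functions, sketched here with made-up names (`total`, `add`, `sub`): keep the helpers as local closures, but hand them the shared state as a parameter instead of letting them capture it. Neither closure then holds a borrow between calls, so the exclusive-access complaint does not arise.

```rust
fn demo() -> i32 {
    let mut total = 0;

    // These closures capture nothing; the state they mutate is passed in
    // on each call, so no long-lived exclusive borrow is created.
    let add = |t: &mut i32, n: i32| *t += n;
    let sub = |t: &mut i32, n: i32| *t -= n;

    // Calls can be interleaved freely.
    add(&mut total, 10);
    sub(&mut total, 3);
    add(&mut total, 1);

    total
}

fn main() {
    println!("{}", demo());
}
```

The cost is the one mentioned above: the free variable has to be passed explicitly at every call site.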

Ok. I will continue to slog through this program, having been educated by
all of you and by experience. I'm still dubious about the cost-benefit
proposition, but before making a final decision about throwing up my hands,
I think I should complete this exercise. It may be that there's a subset of
Rust that is useful to me and avoids getting tangled up in
memory-management arcana.

3 Likes

This might be due to lexical lifetimes, although hard to say for sure without seeing the exact code. That's a known issue in today's Rust, and something that's being worked on.

Note that it's technically a good thing the compiler is checking - this prevents things like invalidating iterators statically. Unfortunately it's lexical right now and causes issues in places that should work.

2 Likes

If you would like to know more about why Rust considers shared mutability harmful even in a single-threaded context, you may find the following piece from Manish interesting: The Problem With Single-threaded Shared Mutability - In Pursuit of Laziness .

To his points, I will add one concerning implementation. When writing high-performance code, few things are more frustrating than a compiler refusing to automatically optimize a piece of code, instead forcing you to go through the error-prone process of doing it manually, because said compiler had to account for obscure programming-language design edge cases.

As it turns out, mutable aliasing (i.e. at least two pointers to a piece of data, one of which allows writing) creates a lot of these edge cases, and by making it illegal, Rust removes a barrier to many useful compiler optimizations, such as autovectorizing code or keeping data in registers as long as possible. This means no more painful fiddling with human-checked contracts like C99's "restrict", which is awesome.

Or at least, it will mean that as soon as LLVM has fixed their broken noalias implementation :P

5 Likes

vitalyd https://users.rust-lang.org/u/vitalyd
September 29

donallen:

The compiler complains about the second closure, asserting that the first
one requires exclusive access to the object in question. It’s as if it
thinks the two closures could be called simultaneously – worrying about
multiple simultaneous writers – which I suppose could happen in
multi-threaded code, which mine is not.

This might be due to lexical lifetimes, although hard to say for sure
without seeing the exact code. That’s a known issue in today’s Rust, and
something that’s being worked on.

Note that it’s technically a good thing the compiler is checking - this
prevents things like invalidating iterators statically. Unfortunately
it’s lexical right now and causes issues in places that should work.

Yes, this seems to be a situation where the compiler is making a worst-case
assumption that precludes the use of closures in perfectly safe situations
because the compiler can't prove to itself that it is safe. And, as you
say, because the environment of the closure is its lexical environment, the
compiler is enforcing its memory-safety rules at closure-definition time,
not run-time.

I'm no MIR expert but I wouldn't expect NLL to resolve this. I'd think the issue here with two closures that both mutate thing is the same issue you would have if you tried to create two different structs that both contain &mut thing. When you define the first closure you're basically putting &mut thing inside an instance of that closure's type.
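The struct analogy can be made concrete with a sketch. `Increment` and `Decrement` below are hand-written, hypothetical stand-ins for what the compiler generates for closures like `|| thing += 1`; the real closure types are anonymous. The example assumes a compiler with non-lexical lifetimes: the first value's borrow ends at its last use, so sequential use is accepted, while interleaved use is still rejected.

```rust
// Each "closure" holds an exclusive borrow of the thing it mutates.
struct Increment<'a> {
    thing: &'a mut i32,
}

impl Increment<'_> {
    fn call(&mut self) {
        *self.thing += 1;
    }
}

struct Decrement<'a> {
    thing: &'a mut i32,
}

impl Decrement<'_> {
    fn call(&mut self) {
        *self.thing -= 1;
    }
}

fn demo() -> i32 {
    let mut thing = 0;

    let mut inc = Increment { thing: &mut thing };
    inc.call();
    inc.call();
    // `inc` is not used past this point, so its borrow can end here.

    let mut dec = Decrement { thing: &mut thing };
    dec.call();
    // Calling `inc.call()` again here would be rejected: that would require
    // two exclusive borrows of `thing` to be alive at the same time.

    thing
}

fn main() {
    println!("{}", demo());
}
```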

To address the OP: This is indeed a common pain point I have with Rust, but there isn't much the compiler can do without resorting to global analysis to see where the closure is used (and global analysis is something that Rust very specifically does not do). Even in single-threaded code there can still be issues with e.g. reentrancy (though I can't come up with an example right now where this could be a problem for closures).

Common workarounds are:

#1 Use RefCell to lift the aliasing checks to runtime:

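use std::cell::RefCell;

let thing = RefCell::new(0);

// Both closures capture `thing` by shared reference; exclusivity is checked at runtime.
let increment = || { *thing.borrow_mut() += 1; };
let decrement = || { *thing.borrow_mut() -= 1; };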

#2 If you're in control of the code that wants the callbacks, you can make it ask for a single object that provides multiple methods, possibly via a custom trait (although this can add a lot of boilerplate, defeating the convenience of closures):

trait Count {
    fn increment(&mut self);
    fn decrement(&mut self);
}
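Both workarounds, filled out into something runnable. This is a sketch: `Counter`, `drive`, `with_refcell`, and `with_trait` are names invented for the example, and `drive` stands in for whatever code would otherwise have asked for two callbacks.

```rust
use std::cell::RefCell;

// Workaround #1: RefCell moves the aliasing check to runtime.
fn with_refcell() -> i32 {
    let thing = RefCell::new(0);

    // Both closures capture `thing` by shared reference.
    let increment = || { *thing.borrow_mut() += 1; };
    let decrement = || { *thing.borrow_mut() -= 1; };

    // Interleaved calls are fine: each mutable borrow ends before the next begins.
    increment();
    decrement();
    increment();
    increment();

    let result = *thing.borrow();
    result
}

// Workaround #2: one object with several methods instead of several closures.
trait Count {
    fn increment(&mut self);
    fn decrement(&mut self);
}

struct Counter {
    value: i32,
}

impl Count for Counter {
    fn increment(&mut self) {
        self.value += 1;
    }
    fn decrement(&mut self) {
        self.value -= 1;
    }
}

// Stand-in for the code that wants the callbacks.
fn drive(counter: &mut dyn Count) {
    counter.increment();
    counter.increment();
    counter.decrement();
}

fn with_trait() -> i32 {
    let mut counter = Counter { value: 0 };
    drive(&mut counter);
    counter.value
}

fn main() {
    println!("{} {}", with_refcell(), with_trait());
}
```

With `RefCell`, overlapping mutable borrows would panic at runtime rather than fail to compile, which is the trade being made.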
2 Likes

As mentioned, it's hard to say whether @donallen's case is NLL or not without seeing the code. If the first closure is still active only lexically (e.g. assigned to a binding) then I'd expect NLL could help? There will be cases where NLL won't help regardless because control flow crosses analysis boundaries.

Came across this blog post entry today that may be interesting and (at least somewhat) relevant to the general theme here.

1 Like

Well, with the help of all of you, I have gotten my finance report generator working. I think the lesson I mentioned in an earlier post -- avoiding use of references in structs and avoiding closures unless absolutely necessary -- reduces the pain of dealing with Rust considerably.

One thing I noticed about closures in the course of learning my lesson: closures with the same type signatures are different types (I believe one or both of the Books says this explicitly). This means that while you can call any of multiple such closures interchangeably, you cannot choose one by returning it from an if or match expression. The compiler will complain that the legs return different types. This seems more than a bit odd to me, to put it mildly. I solved this problem by using functions instead, which makes the code a bit less readable, but it made the compiler happy. It seems that my entire life recently has been devoted to ensuring the happiness of the Rust compiler :)
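One way around the "different types" complaint, sketched under the assumption that callers only need a common call signature: give both legs a single target type. Closures that capture nothing can coerce to a plain `fn` pointer, and capturing closures can be boxed as trait objects. The names `choose` and `choose_boxed` are made up for illustration.

```rust
// Non-capturing closures: both legs coerce to the same fn pointer type.
fn choose(double: bool) -> fn(i32) -> i32 {
    if double {
        |x: i32| x * 2
    } else {
        |x: i32| x + 1
    }
}

// Capturing closures: box them behind a common trait object type.
fn choose_boxed(double: bool, offset: i32) -> Box<dyn Fn(i32) -> i32> {
    if double {
        Box::new(move |x: i32| x * 2 + offset)
    } else {
        Box::new(move |x: i32| x + offset)
    }
}

fn main() {
    let f = choose(true);
    let g = choose_boxed(false, 1);
    println!("{} {}", f(5), g(5));
}
```

The boxed form costs a heap allocation and a dynamic call per use; the `fn` pointer form does not allocate but cannot capture variables.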

As for performance, the Rust version runs slightly faster than my Haskell version (I did this as a non-trivial exercise in comparing Rust to Haskell), but the Haskell was much easier to write and get working. The latter is probably not a fair statement, though, since I've long since gone through the Haskell learning curve and I'm still on the curve with Rust. And as for performance, in this case, the hard work is done by SQLite, so I think it determines the run-time of the program, not the code firing queries at it. It wouldn't surprise me if writing this program in, say, Tcl resulted in similar runtimes.

Another thing that was valuable, aside from finding a usable subset of Rust, was a better understanding of where to find things in the documentation. Book 2 is generally better than Book 1, though it has some things that I consider problem areas and it is still incomplete. The combination of the two is needed. One thing that I think is imperative is improvement in finding things. Unless I've overlooked something, I don't see a way to search either book. At one point, I resorted to cloning the GitHub repository, so I could search with find and grep.

Again, thanks to all for your help. Without it, I would have abandoned this experiment in total frustration, as I have before, and might not have come back to it.

4 Likes