Blog Post: Lifetime Parameters in Rust


#7

Will do. I may just need to find a new theme altogether.


#8

I see white background with black text in FF on Linux. What do you see?


#9

Could you please explain what this means?

If the checker could verify I wasn’t mismatching my annotations, then it wouldn’t need me to make them at all.

If I have an opaque algorithm in the body of my function, the checker can’t know which input was copied to which output.


#10

If the checker could verify I wasn’t mismatching my annotations, then it wouldn’t need me to make them at all.

The annotations are used not just for checking the function body, but for checking other code that calls that function. The fact that they are not inferred is a design choice, for the same reasons other type information is not inferred across function boundaries. (This helps enable separate compilation, and prevents unintentional changes to function signatures.)

If I have an opaque algorithm in the body of my function, the checker can’t know which input was copied to which output.

That’s precisely what the borrow checker checks. Any function called within the body also has any relevant lifetimes as part of its signature, so it’s not opaque to Rust’s type system. The lifetime of a reference is part of its type, and a lifetime error is a type error.
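A minimal sketch of that point, using hypothetical names (`pick_first`, `caller`) that do not come from the thread. The body is checked against the signature, and callers are checked against the signature alone:

```rust
// Hypothetical helper: the signature promises the result borrows from `x` only.
fn pick_first<'a>(x: &'a str, _y: &str) -> &'a str {
    // Returning `_y` here would be rejected with a lifetime error,
    // because `_y` is not known to live as long as 'a.
    x
}

// The caller is checked using only the signature of `pick_first`.
fn caller() -> String {
    let long_lived = String::from("kept");
    let result;
    {
        let short_lived = String::from("temporary");
        // Allowed: per the signature, `result` does not borrow from `short_lived`.
        result = pick_first(&long_lived, &short_lived);
    } // `short_lived` is dropped here; `result` is still valid.
    result.to_string()
}
```

If the signature were changed to return a reference tied to the second argument, `caller` would stop compiling without anyone looking at the body of `pick_first`.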

Thus lifetime annotations are inherently unsafe (uncheckable) code. The program could crash accessing memory that has already been freed, which can’t happen with GC.

If you can compile a program with a use-after-free bug in safe Rust, this is a serious bug. It should not be possible.


#11


#12

Yeah I thought of that, but I was thinking of for example some complex loop over arrays mutably referencing arrays which mutably reference back to those first mentioned arrays, e.g. some memory-hard proof-of-work hash function. I guess what I should have realized is if I can’t reasonably write that in safe code, then I will need to shift to unsafe code.

I understand that global inference places some restrictions on what sort of type system you can have, so that might be another hindrance.

Normally I argue that type annotations on function signatures are better for readability, but in the case of 'a, I am of the opinion that they hurt readability. The programmer doesn’t want to reason about the lifetimes; he just wants them to be checked. He will only reason about them if there is a checking error he needs to fix.

IDEA: Crates has this nifty feature where you don’t need to tell it the Github changeset explicitly, because Crates saves it in a configuration file automatically. I am thinking perhaps Rust’s compiler could do something analogous for lifetimes and infer the lifetime annotations on function signatures. They wouldn’t be global, because we compile incrementally as we edit, so the previously inferred annotations would already have been saved by the compiler (which fixes the problem of separate compilation, etc). On error, the compiler/IDE could display the inferred annotations. Perhaps this would be much less noisy and might make Rust appear to be much more automatic and friendly. The point is I am trying to find ways to make the lifetimes feature get out of the way most of the time so we can focus on our code, as we always have. So it would have the light feel of GC, but with the benefits of needing less memory to attain the same performance and/or losing less performance to attain the same low pause latency.


Rust as a High Level Language
#13

I have found I can write a lot of code without using lifetime annotations at all; the compiler infers the right lifetime automatically. Normally, when I have to write lifetime annotations, it is either because I have made a mistake or because I am trying to do something genuinely complicated (or at least dangerous, like returning a reference to something that may go out of scope). The only place I have needed them in my translation of EoP so far is when I am returning a closure from a function, where that closure captures a closure from the outer function’s arguments. I think if there is support for returning unboxed closures, even this might not be needed.

If it needs a lifetime annotation, maybe it’s time to see if there is a simpler way to do it?


#14

That doesn’t seem to mesh with the strange 3 rules applied to elided annotations, which seem to be about position in the function signature and nothing to do with inference:

https://doc.rust-lang.org/book/lifetimes.html

Is the documentation in error?


#15

Strictly speaking, at a function boundary, the compiler is not inferring the right lifetime – it’s “defaulting”, or maybe you might just call it “guessing”. Basically, within a fn signature, if you omit a lifetime, the compiler uses a series of relatively simple elision rules to fill it in. These rules make local decisions that don’t consider the fn body at all:

  • in fn arguments, missing lifetimes are replaced with an anonymous, fresh parameter, so that fn foo(&u32) is equivalent to fn foo<'a>(&'a u32)
  • in the return type, missing lifetimes are filled in by looking at the arguments, which makes the assumption that the returned reference will be a reference into one of the arguments
    • if there is just one lifetime found in the argument types, use that
    • otherwise, if this is an &self or &mut self method, use the lifetime of the self reference
  • (one exception to the above rules is for trait objects, where the lifetime is chosen based on the innermost enclosing pointer. So Box<Trait> becomes Box<Trait+'static>, but &'a Trait becomes &'a (Trait+'a), essentially.)
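The elision rules above can be sketched as pairs of equivalent signatures. All names here (`read_elided`, `first_word`, `Config`, `name_or`) are hypothetical and chosen only for illustration:

```rust
// Argument rule: each elided lifetime becomes a fresh, anonymous parameter.
fn read_elided(x: &u32) -> u32 {
    *x
}
fn read_explicit<'a>(x: &'a u32) -> u32 {
    *x
}

// Return rule: exactly one lifetime appears in the arguments, so it is used
// for the returned reference.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

// Hypothetical type used to show the `&self` rule.
struct Config {
    name: String,
}

impl Config {
    // Two input lifetimes, but this is a `&self` method, so the returned
    // reference gets the lifetime of `self`, not of `_fallback`.
    fn name_or(&self, _fallback: &str) -> &str {
        &self.name
    }
}
```

None of these decisions look at the function bodies; swapping the body of `name_or` to return `_fallback` would be a compile error rather than a change in the elided signature.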

In contrast, within the fn body, we will use a much more complex inference procedure that considers the entire fn body.

The reason we draw this distinction is primarily, as @mbrubeck indicated, separate compilation. It’s also because inferring across fn boundaries is actually very, very hard to even do – there are fundamental limitations of how much you can do in that space. Moreover, it makes error reporting even harder, since without any lifetimes written down, it’s really hard to report conflicts (though we have to work more on that front in any case).


#16

Yep. Did you see my idea? I edited it.


#17

I don’t think that changes the substance of my post, which is that if I am having to add lifetime annotations by hand, it is because I am doing something ‘odd’, and I would probably be better off implementing it another way. It would be good if the ‘guessing’ rules corresponded to best practice.


#18

I didn’t mean to disagree with your post, just to point out a subtle distinction that is often overlooked (which can lead to confusion) between elision and inference. I agree that if you find you are using named lifetimes, it does indicate that you may be doing something a bit more complex, though I wouldn’t go so far as to call it odd or a code smell. For example, a signature like this:

fn foo<'b>(&self, foo: &'b Bar) -> &'b Baz

doesn’t set off any alarm bells for me, it just indicates that the return value is going to be some value borrowed from foo.
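A minimal sketch of a signature like that in context. `Bar` and `Baz` are given made-up fields here, and `Finder` and `lookup` are hypothetical names added for illustration:

```rust
// Hypothetical types standing in for `Bar` and `Baz` from the signature.
struct Baz {
    value: u32,
}
struct Bar {
    baz: Baz,
}
struct Finder;

impl Finder {
    // Without `'b`, elision would tie the result to `&self` instead of `foo`.
    fn foo<'b>(&self, foo: &'b Bar) -> &'b Baz {
        &foo.baz
    }
}

fn lookup(bar: &Bar) -> &Baz {
    let finder = Finder; // local value, dropped when `lookup` returns
    // Accepted because the result borrows from `bar`, not from `finder`.
    finder.foo(bar)
}
```

With the elided signature `fn foo(&self, foo: &Bar) -> &Baz`, `lookup` would be rejected, since the result would be assumed to borrow from the local `finder`.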


#19

That is indeed an interesting thought. We’ve also been thinking that rustw might provide a great, user-friendly way for the compiler to give feedback about the results of its inference.

In any case, the fact that an incorrect fn signature can lead to a lot of confusing errors is very much on my mind lately. In the work on new-style error messages, we’ve been looking at lots of ways to improve this. Some things are as simple as just reformatting how we display the same information, but in a lot of cases we can find better information to display – or else reorder the messages, to give priority to messages that are more likely to be the “root cause” of the problem. Still working on the details here but I think there’s lots of room for improvement even without making any major changes.


#20

If you use Rcs or the like, you don’t have to mess with these annotations at all. If you are passing around borrowed values/interior references, you should indicate where the references are coming from - I don’t like playing the “will this invalidate that” game at all.
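As a rough illustration of the trade-off, here is the same relationship written once with `Rc` and once with a borrowed reference. The `Texture` and `Sprite` types are hypothetical:

```rust
use std::rc::Rc;

// Hypothetical shared resource.
struct Texture {
    name: String,
}

// Shared ownership: no lifetime parameter anywhere.
struct Sprite {
    texture: Rc<Texture>,
}

fn make_sprites() -> (Sprite, Sprite) {
    let texture = Rc::new(Texture { name: String::from("grass") });
    let a = Sprite { texture: Rc::clone(&texture) }; // bumps the count
    let b = Sprite { texture };                      // moves the original handle
    (a, b)
}

// Borrowed version: the annotation says where the reference comes from.
struct SpriteRef<'t> {
    texture: &'t Texture,
}

fn borrow_sprite<'t>(texture: &'t Texture) -> SpriteRef<'t> {
    SpriteRef { texture }
}
```

The `Rc` version can be returned from the function that created the texture; the borrowed version cannot outlive the `Texture` it points to, and the `'t` parameter is what records that.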

Good practice in C in these cases is to write the “lifetime annotations” in a comment, so having to write them in the code is actually easier.


#21

My upthread off-the-top-of-my-head idea was to have the compiler infer them and write them to a shadow file. I posited we could still see them in an IDE (toggle) and/or in error messages. So I posited we would only have to write them when we need to overrule the inference or when inference is undecidable (and then they can be removed after compiling, because they have been written to the shadow file). I am not arguing that is a good idea. I am just trying to figure out whether what Rust is now is best, and for which use cases it is the best fit.

My thought is that most of the time I don’t want to look at or think about that detail. The inference could probably be good enough 80% of the time. I am prioritizing looking at the other semantics of my code. But maybe I am mistaken, because maybe I am thinking about how, when I code only my innermost performance routines in C, I am always passing a borrowed pointer in to a C function which doesn’t leak any references, so I don’t even need to think about it. And in other code (ever since using C++ with refcounting for the last time in 2002), I have been using a GC, so I didn’t have to think about it.

Perhaps this is related to my point that what probably or perhaps dominates the asymptotic memory performance of 80+% of my code is avoiding the type of “memory leaks” that can occur with both Rust’s checked lifetimes and GC. I just can’t imagine writing a mobile game (speed not a major factor, i.e. not 3D), word processing, or other productivity app with such coding precision that the entire body of code is compile-time checked manual memory mgmt (although I did ref counting for my last C++ app finished in 2002). Instead I would write most of the code employing GC and then the critical parts that need to be super fast I would write in C as stated above.

Thus, for me, I envision a language that combines high-level and low-level coding, which would allow me to write my “C” code in the same language but still allow me to use GC for most of my code. What aggravates me about managed languages such as Java and JS is that I have to switch to an FFI and another language to write the performance bits of code. I would prefer it was all unified, if that makes sense.

Yet I am also paying attention to what others say here and trying to relate it to my experience. So I am still trying to form a clear analysis of what I want and for which use cases. I realize there are other use cases where perhaps different priorities and patterns apply.

Afair most of my code is not passing Copy types. Seems boxes are the norm, except where there is some requirement for highly performant (lower-level) homogeneous (or heterogeneous via enum) arrays and vectors.

I’m eager to read different perspectives.

Edit: if asymptotic “memory leaks” are the main culprit to attack (which Rust’s lifetimes wouldn’t fix), perhaps, for example, the browser should isolate memory partitions for each tab open in my browser and kill tabs that are overconsuming when GC starts thrashing, instead of locking up my entire Linux desktop as Firefox does now. And it is probably all that adware bloat scripting that Google et al dump on every web page with advertising.


#22

IMO the advantage of the current system is that the lifetimes are hidden for the common/intuitive cases, and they are explicit where they don’t conform to the obvious rules. This draws attention to the fact that something is different.

I think it would probably be more confusing if they were omitted in all cases. Because you would always expect the default case at first and only notice the difference after compilation fails.

Also, I’m always a bit sceptical about inference. Yes, there exist powerful algorithms, but they can only infer what is possible to infer. If I make a mistake, inference cannot point me to it because I didn’t write up my expectations.

Like const-correctness in C++ for example. I often hear that it is unnecessary because it can be inferred.
Sometimes I see a parameter that should “obviously” be const. In a different language I’d just think “well, inference will do it correctly”, but in C++ I can actually mark it as const. Then most of the time it doesn’t just compile as is: there are missing annotations here and there, and often there’s a little detail that actually prevents the whole thing from being const without some refactoring. It’s tedious, but I can refactor until it compiles. And often this means improving encapsulation and correcting design flaws.


#23

That’s where my Rust code is very different from my C++ code. In C++ I use new much more often than box in Rust. This is mainly because of the “move by default” policy. A combination of pass by value (moving) and some clone() calls here and there is usually sufficient.
In Rust, even the new() functions usually return by value.
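A small sketch of that style, with a hypothetical `Point` type: `new()` returns by value, functions take ownership by moving, and `clone()` appears only where the caller wants to keep the original.

```rust
// Hypothetical value type; nothing here is heap-allocated.
#[derive(Clone, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    // Returns by value rather than handing back a pointer to the heap.
    fn new(x: f64, y: f64) -> Point {
        Point { x, y }
    }
}

// Takes ownership of `p` (a move) and returns the updated value.
fn shifted(mut p: Point, dx: f64) -> Point {
    p.x += dx;
    p
}

fn demo() -> (Point, Point) {
    let origin = Point::new(0.0, 0.0);
    // Explicit clone, because `origin` is still needed afterwards.
    let moved = shifted(origin.clone(), 1.0);
    // `moved` is moved into the call; using it after this line would not compile.
    let moved_again = shifted(moved, 1.0);
    (origin, moved_again)
}
```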

In C++ projects that make extensive use of reference counting, ownership is often not clear anymore. That means you have to RC everything, because maybe it is needed somewhere, and allocating on the stack could lead to nasty stack corruption. In Rust, ownership and borrowing are always explicit, and that problem simply doesn’t exist anymore.


#24

This makes me think of IDE friendly features like being able to directly give the compiler an abstract-syntax-tree in some kind of easy to use format like JSON, and have the compiler give back the syntax tree with type annotations. That way the types can appear as tool-tips when you hover over code.

Regarding Copy types, I have so far preferred to use explicit Clones rather than requiring Copy in my code, and the only place I have needed a Box so far is returning a closure from a function. I am hoping that need for a Box will go away soon, and I can return a closure as a trait, allowing better inlining possibilities.
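For illustration, here is the boxed form next to the unboxed form. Note the assumption: this uses `dyn` and `impl Trait` syntax from Rust releases later than this thread, where returning a closure without a `Box` did become possible. The function names are hypothetical.

```rust
// Boxed: the closure lives on the heap and is called through dynamic dispatch.
fn make_adder_boxed(n: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |x| x + n)
}

// Unboxed: the concrete closure type is returned by value, so calls can be inlined.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// A returned closure that captures a closure passed in as an argument.
fn twice<F>(f: F) -> impl Fn(i32) -> i32
where
    F: Fn(i32) -> i32,
{
    move |x| f(f(x))
}
```

Both adders behave the same from the caller’s side; the difference is the allocation and the indirect call in the boxed version.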

I really don’t understand the need for Boxes everywhere, they should only be needed where you have runtime polymorphism, which in most programs is a small subset of the total code (where you would need virtual functions in C++). Certainly in Haskell you only want to use existential types when strictly necessary? How much use do you see of existential types (the equivalent of boxes) in the average Haskell program? Maybe people need to look at using more functional design patterns?


#25

Afaics, we’ve established that refcounting is worse in every way:


#26

I don’t think that “we” have established anything.

You’re citing a blog post that is over 10 years old. All the downsides of RC that are mentioned in that article are IMO either solved in Rust or not valid:

  • It is very OO-centric and assumes that everything can be cast to System.Object
  • It outrightly dismisses value types with very questionable reasoning
  • It ignores moving, which is the default in Rust and also possible in C++ (move constructors).
  • It assumes that if you use RC, you have to use it everywhere. Which is not true at all as Rust shows.
  • It assumes pervasive multithreading and thus atomic RC, which is not necessary most of the time, and Rust lets you skip it safely
  • It mentions memory consumption as a downside of RC because every object is 4 bytes larger. WTF
  • It mentions deterministic finalization as one of the key points of RC, but in the end they decided against it because of language interop with other GC-languages.

Circular references are really the only problem from the entire list that is a valid point.
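For reference, a common way to deal with that one remaining problem in Rust is to make the back edge a `Weak` pointer. This is a minimal sketch with a hypothetical `Node` type, not something taken from the cited post:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node: children are owned, the parent link is weak.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn build_tree() -> (Rc<Node>, Rc<Node>) {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        // Weak back edge: does not keep the parent alive, so no cycle of
        // strong references is formed.
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));
    (parent, child)
}
```

If the parent link were an `Rc` instead, the two nodes would keep each other alive and never be freed.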

OTOH, they gloss over the problems of GC, for example:

  • Increased memory consumption (now really).
  • Even when used sparingly and without producing much garbage, GC still has a high overhead, while RC does not.
  • Interop issues with non-GC languages. Finalizer running in a different thread is a PITA.
  • Deterministic destruction, which is mentioned but not considered important enough (with which I disagree).

EDIT, another point:

  • Dependencies on the order of finalization are not possible with GC.
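To make the contrast concrete, here is a small sketch of ordered, deterministic destruction in Rust. The `Noisy` type and the names "connection" and "transaction" are hypothetical; the log only exists to record the order in which `drop` runs.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical resource that records its own destruction in a shared log.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _connection = Noisy { name: "connection", log: Rc::clone(&log) };
        let _transaction = Noisy { name: "transaction", log: Rc::clone(&log) };
        // At the end of this scope, locals are dropped in reverse declaration
        // order: the transaction first, then the connection it depends on.
    }
    let recorded = log.borrow().clone();
    recorded
}
```

Because the order is fixed by scope, a destructor can rely on the things declared before it still being alive.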

EDIT2, yet another point:

  • In interop, GC doesn’t know about allocations in other languages. With C#, I’ve even experienced OOM in native code because of the GC being such a memory hog.