Blog Post: Lifetime Parameters in Rust

You meant to say: move semantics.

A type with copy semantics leaves the original memory location intact when used, whereas a type with move semantics invalidates the original memory location on use, ensuring the value doesn't get cloned by accident.


The difference between move and copy types is not boxed vs. non-boxed but trivially copyable vs. non-trivially copyable.
For example, Vec cannot be trivially copied, and thus cannot be Copy. Copy types are always POD (plain old data), i.e. they cannot have destructors (Drop).
Every non-Copy type uses move semantics. If a Vec is passed by value, it is moved (like move constructors in C++). That means the internal buffer of the Vec is reused by the new object; the move itself is just a simple memcpy. But the original object is left in an indeterminate state and must not be used anymore, and this is statically checked.
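
A minimal sketch of that static check (the commented-out line is exactly what the compiler rejects):

fn main() {
    let v = vec![1, 2, 3];
    let w = v; // the move memcpy's only the Vec's (pointer, length, capacity) header;
               // the heap buffer is not duplicated and is now owned by `w`
    // println!("{:?}", v); // error[E0382]: use of moved value: `v`
    println!("{:?}", w);
}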

Box is also a move type; it guarantees unique ownership of the object it points to. It's like a Vec with exactly one element.
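
A minimal sketch of that unique ownership (consume is just an illustrative name):

fn consume(b: Box<i32>) -> i32 {
    *b // the heap allocation is freed when `b` goes out of scope here
}

fn main() {
    let b = Box::new(42);
    let n = consume(b); // `b` is moved into consume, which becomes the sole owner
    // println!("{}", b); // error[E0382]: use of moved value: `b`
    println!("{}", n);
}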

EDIT:
The example that you cited is indeed confusing. Both Vecs are moved into the function and then returned (moved again). But the function doesn't have to return the same Vecs that it received; it can return any two Vecs that it likes.
It's just a function taking two Vecs by value and returning two Vecs by value.
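
Something like this minimal sketch (swap_vecs is just an illustrative name), where both arguments are moved in and two Vecs are moved back out, not necessarily the ones that were passed:

fn swap_vecs(a: Vec<i32>, b: Vec<i32>) -> (Vec<i32>, Vec<i32>) {
    (b, a) // the function may return any two Vecs; swapping them is just an example
}

fn main() {
    let (x, y) = swap_vecs(vec![1, 2], vec![3, 4]);
    println!("{:?} {:?}", x, y); // [3, 4] [1, 2]
}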

That phrase could be misinterpreted to mean that Box does not always have move semantics. I understand you to mean: "It is not required that a type T be wrapped in Box<T> in order for the type to have move semantics. Any non-Copy type (as Vec is) has move semantics."

I remembered that I had looked up the source code for Vec in the past, while reading "The Book" in the Rust documentation.

The confusion for me was that I was thinking that if any portion of a data type (i.e. a struct) could be stored on the stack with a fixed size (Sized), then it was either assumed to be, or could explicitly opt into being, a Copy type. I was thinking that only by putting a Box<T> around T would T become a non-Copy type.

Now I understand that what determines a Copy type is whether all of its data can be stored inline on the stack and there are no complications (such as owned heap resources) with copying the data bit-for-bit.

So now it makes sense to me that the only way to get copy-like semantics with a non-Copy type is to first clone it, then move the clone.
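
For example, a minimal sketch of the clone-then-move pattern (takes_ownership is just an illustrative name):

fn takes_ownership(v: Vec<i32>) {
    println!("{:?}", v);
} // the clone is dropped here

fn main() {
    let v = vec![1, 2, 3];
    takes_ownership(v.clone()); // only the clone is moved
    println!("{:?}", v);        // the original is still usable
}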

I still don't know how the compiler determines which types are Copy, though. I don't know the exact rules, nor how it is inferred or declared.

Unless I simply missed the correct section of "The Book", perhaps this is an area that needs improved explanation in the documentation. Or perhaps it is just due to the fact that my brain is often somewhere between semi-consciousness and a waking, dream-like brain fog (ugh, you don't want to know!), as I'm battling an illness that has neurological impacts...

I had always tested very, very high on reading comprehension, but I now have a problem with the boundary between sleep and wakefulness being fuzzy, so sometimes it all gets jumbled.


P.S. I hate to make excuses, but there is a fact at play: I am still trying to work my way out of a strange, undiagnosed chronic auto-immune (gut dysbiosis?) illness which manifests with symptoms similar to M.S. and often leaves me with CFS (chronic fatigue syndrome) symptoms, best understood as "forehead on the keyboard after 36 hours without sleep", except in my case I can have that symptom even upon waking. Sometimes I will have my very sharp and crisp energy, but other times I will be slogging through mud mentally, and sometimes it hits me right in the middle of some complex discussion, so I will drop the ball temporarily. After 4 years of acute decline following 6 years of insidious decline, it finally appears I am getting some improvement, but there will still be periods where I am pushing too hard when I should really just be sleeping all day, the next day, and the next. Sometimes I will go to the computer for a few hours fighting my sleep because I already slept 8 hours, when on those occasions I should really stay in bed for an additional 12 hours. Anyway, sorry to mention this, but I just want you to know it isn't intentional, nor do I expect you to treat me any differently because of it. Hopefully this illness will be entirely cured soon; I am on a new treatment regimen which appears to be helping.

Copy is a trait, so it is implemented for a type by one of two means: either an explicit impl declaration or a derive attribute (i.e. impl Copy for Foo { } or #[derive(Copy)]). Copy has Clone as a supertrait, so Clone must also be implemented for the type. In my experience, the most common way to implement Copy for a type is to give it the attribute #[derive(Copy, Clone)].
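
For example, a minimal sketch of both ways (Point and Pixel are just illustrative types):

#[derive(Copy, Clone)]
struct Point {
    x: i32,
    y: i32,
}

// Or spelled out by hand; Clone is also required because it is a supertrait of Copy.
struct Pixel(u8, u8, u8);

impl Clone for Pixel {
    fn clone(&self) -> Pixel { *self }
}

impl Copy for Pixel {}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p;                   // copied, not moved
    println!("{} {}", p.x, q.y); // both bindings are still usable
}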

Note that some built-in types are Copy but not Clone, specifically arrays with more than 32 elements; this is a minor implementation detail that will hopefully be resolved eventually (e.g. by const parameterization of array lengths).

Note: in generic code, nothing is assumed or inferred about trait bounds, so if Copy is not an explicit bound on the function, no copying will happen inside it.

fn test<A>(x: A) -> A {
    f(x) // assume some fn f<A>(x: A) -> A is defined elsewhere
}

In the above, 'x' will always be moved into 'f'.

fn test<A>(x: A) -> A where A: Copy {
    f(x)
}

Here 'x' will always be copied (as the manual says Copy types have copy semantics).

For my generic code I prefer to use the Clone bound, as it is less restrictive on the types that can be passed, so the function is more general: any Copy type is also Clone, and Clone can be implemented for any type. It has the added advantage that cloning is explicit, so you can more easily count the cost of cloning in a generic algorithm.

fn test<A>(x: A) -> A where A: Clone {
    f(x.clone())
}

To clarify further, inside that function body, values of the type A are treated as having move semantics, but values will be passed to that function with copy semantics if they are Copy types. That is, the first of these is valid, but the second is not:

// Valid: i32 is Copy, so `x` is copied into each call.
fn test<A>(_: A) { }

let x = 0;
test(x);
test(x);

// Invalid: inside the generic body, `x: A` has move semantics,
// so the second call uses a moved value.
fn test<A>(x: A) {
    test2(x);
    test2(x); // error[E0382]: use of moved value: `x`
}

fn test2<A>(_: A) { }

Thanks to both of you for adding that information. The documentation will hopefully continue to improve (and I am not implying it is not already reasonably good, given that 1.0 was reached just last year).

And add your clarifications on the implicit semantics to the (so far only small) set of reasons why I am thinking that borrowing could possibly have been the better non-annotated default (though I don't yet have a comprehensive understanding of all the factors, and I have no intention of starting an impassioned debate, just mentioning the thought). The default inside a function would then be more consistent, in that the function body would not own the obligation to deallocate the heap resource whether or not the argument is a Copy type. I guess a reasonable guiding principle of PL design might be that, if all other considerations are equal, it is better to have consistency where there are implicit semantics.

I lean more toward this stance now that I know borrowing really has nothing to do with whether some of a function argument's data structure (e.g. a Vec) is stored on the stack, i.e. borrowing is orthogonal to the performance issues of pass-by-value vs. pass-by-reference. For that reason, I also find the use of the symbol & for borrowing confusing, given its historical meaning in C/C++; it is ingrained in my mind as "take the address of and create a reference/pointer". It is akin to having played American football since the age of 5 and then instinctively wanting to grab the soccer ball with my hands. If I had been around when the decision was made, I might have suggested a symbol new to Rust for moving, e.g. =>. Anyway, I guess that is already decided.

Aren't we nearly always borrowing, except for factory functions (functions that return a constructed object)?

Back on the topic of lifetime parameters, others have mentioned that an optional GC might still be under development for Rust, and we also have the Rc / Arc alternatives. It just occurred to me that if we want to mix GC / Rc / Arc instances in the same function, the problem is that the functions we call which take these instances as inputs may already be annotated with lifetime parameters. In other words, I am thinking the lifetime-parameter annotations compel the annotated functions to be used only with compile-time managed lifetimes. Unless I am missing something in my understanding, it seems annotation kills the ability to intermix, and it infects the code everywhere.

I am thinking that for much of my code I am likely to end up using a well designed generational GC{1} most of the time, and use borrowed function arguments{2} only infrequently, in cases where it is very non-convoluted for the compiler to manage the resource lifetime, especially where temporary objects can be optimized. I am guesstimating (but I might be wrong) that with good design control over the creation of temporary objects and a very well designed generational GC, perhaps most of the claimed memory-consumption-versus-performance tradeoff disappears{3}, and thus I can't justify adding complexity to the code for very little gain{4} (because afaics it is a loss in the overall scheme of the costs and goals of open source). Thus I am beginning to wonder whether tracking lifetimes from inputs to outputs might be an unnecessary pita in the overall appraisal. I am not arguing to change anything in Rust, as others (and this community) may have other priorities/preferences/use cases. I am just wondering if anyone can present a cogent rebuttal which might cause me to realize I am very wrong to think down this line of pondering.

So I am thinking: borrowing everywhere (actually, lifetime checking off) by default, with no moves and no annotations, no lifetime parameters, and compile-time checked resource lifetimes used only in the simplest of scenarios. Radical thought.

Edit: there would still need to be an annotation on an input argument that should be a compile-time enforced borrow (or alternatively, on input arguments which should not have an enforced borrow), and on a let that requires a compile-time enforced lifetime.

Curious if anyone can show me that this is short-sighted. I am trying to find some reasonable simplification, because the thought of ever more complex annotations on function types, and of needing to create superficial scopes just to hold two mutable references to portions of the same object, feels intuitively to me like a step backwards or sideways in the evolution of programming languages. But of course, I am probably wrong. The devil is in the details, not just intuition.

And note, apparently I could use Rust and just stick with my desired usage pattern, as one of my options. Perhaps I could even write a parser that outputs Rust and enforces my desired defaults (although this might get unwieldy). I am just considering all my options, and first I need to rationally figure out what I really want and why.

Edit #2: impacting my thought process is my assumption that Rust's compile-time lifetime checking isn't going to work in a majority of cases, e.g. multiple mutable references to elements in collections. Thus I am thinking we end up with lifetime parameters on 80% of our functions for a feature we can't employ 80% of the time. Maybe I am wrong about the relative percentages. And the upthread discussion didn't seem to declare Rc / Arc a clear winner over GC for the other cases.
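
For what it's worth, here is a minimal sketch of that collection case; the commented-out lines are what the borrow checker rejects, and split_at_mut is one safe escape hatch the standard library already provides:

fn main() {
    let mut data = vec![1, 2, 3, 4];

    // Rejected: two simultaneous mutable borrows of the same Vec.
    // let a = &mut data[0];
    // let b = &mut data[1]; // error[E0499]: cannot borrow `data` as mutable more than once
    // *a += *b;

    // Accepted: split_at_mut hands out two non-overlapping mutable slices.
    let (left, right) = data.split_at_mut(1);
    left[0] += right[0];
    println!("{:?}", data); // [3, 2, 3, 4]
}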

{1} [quote="shelby3, post:2, topic:5737"]
V8 is already doing this apparently:

A tour of V8: Garbage Collection — jayconrod.com
Write barriers: the secret ingredient
[/quote]

{2} [quote="shelby3, post:2, topic:5737"]
Correction: the lifetimes of the iterators are not encapsulated if they can leak into the outputs of the pure function. I assume such could in theory be prevented by a compiler by annotating the constraint on the inputs that their references can only be borrowed. Well so we can model this with Rust's lifetimes and as stated GC wouldn't be needed. So seems I've found a use case for Rust's lifetimes, but note it is a special case and not substituting for GC in general. And so far this has not identified a need for moves.
[/quote]

{3} http://benchmarksgame.alioth.debian.org/u64q/which-programs-are-fastest.html

{4} Why Haskell matters - HaskellWiki
https://www.quora.com/Is-Haskell-as-fast-as-C++-If-not-why-not/answer/Jon-Watte
When Haskell is Faster than C | Hacker News
https://www.quora.com/Is-Haskell-as-fast-as-C++-If-not-why-not/answer/Jon-Harrop-2

Actually, we get people asking why borrowing adds extra syntax over moves, instead of the other way around, all the time.

Nobody asks for "actually lifetime checking off," because that would make Rust a memory-unsafe language. Is that a typo?


Thanks for the reply.

I was arguing the same point as @Maledictus for Rust's current "lifetimes everywhere" design:

What does @steveklabnik1 mean by "silently putting things behind a pointer" below? I thought a borrow was orthogonal to representation on the stack or heap, as we discussed upthread, and only signified lifetime-semantics checking.

@steveklabnik1's point reaffirms to me that "lifetimes everywhere" is complex. Taking borrows in the same code block feels like minutiae (and then I need an artificial braces scope to handle some cases).

I was referring not to Rust's current "lifetimes everywhere" design philosophy, but to a potential alternative design philosophy I am contemplating, which would be "GC everywhere except where we can get compile-time lifetimes at very low complexity", especially to improve upon GC's overhead in freeing temporary objects.

I've thought a little bit more about my idea and the following is where I am with it thus far:

  • let can only declare owned (including Copy) instances or GC instances. No borrows. The non-annotated default is a GC instance. I'm thinking := instead of = for owned instances.
  • Functions take compile-time enforced borrows on those arguments which are not annotated. No compile-time checked moves are allowed.
  • The optional annotation on function arguments (perhaps :: instead of :, since it means the function is keeping a reference to the instance indefinitely) declares the argument to be a GC instance. Use this to handle all the cases that compile-time checked borrowing won't. Note that GC instances can be input to either kind of function argument, but lifetime-checked instances can only be input to compile-time enforced borrowed arguments.
  • No lifetime parameters. Input lifetimes can't be transferred to outputs.
  • Function outputs are allocated as lifetime-checked or GC instances according to the function body, and a lifetime-checked instance can be converted to a GC instance (but not vice versa, because a GC instance wouldn't have been forced to be borrowed within the function body) based on the caller's assignment of the function output.

So essentially, afaics, I am encouraging the programmer to do compile-time borrowing as much as possible, so that compile-time ownership can sometimes be used. For all other cases, GC is employed.

This seems much simpler. Afaics, there is nothing unsafe in terms of "use after freed" memory deallocation.

I am thinking mutability should not be compile-time checked at such a low level, in terms of, for example, disallowing two mut references to the same instance. Instead, use higher-level abstractions which enforce mutability constraints when desired. I am contemplating allowing mut only on let and function arguments (and btw I would prefer val and var instead of let and let mut, and would make function arguments val by default, with an optional var prefix to make them mutable).

Radical brainstorming.

From a borrow-checker point of view, taking a reference is just borrowing; that is, it's only relevant for lifetime semantics. But references are implemented using pointers, and since Rust is a low-level language with predictable performance characteristics, that implementation detail and its associated performance overhead matter.

Essentially I'm positing a different philosophical point-of-view, which is best captured by that article:

It makes the point that for most programs, a higher-level language will be just as fast, if not faster, than writing in a low-level language. Getting the maximum performance out of a low-level language requires a lot more work; if you just slap some code together, the high-level language will often be faster, or not much slower, than the same effort in a low-level language, especially when factoring in ongoing maintenance and other "newly-acquainted" contributors banging on the open source.

Whereas the other perspective is the "masochist" or perfectionist philosophy, which says we can justify initially expending, and moreover maintaining, all that extra effort and complexity to get 2-3X better performance and/or memory consumption.

Simpler things scale exponentially faster than more complex ones. There is a compounding of network effects.

For me, I only want to write low-level code for perhaps 5-20% of the code in my projects. And when I write that low-level code, I want to use something closer to C, so the compiler will get out of my way. Any improvements over C should not require me to annotate with unsafe.

I wildly conjecture that ultimately a well designed high-level semantics will be more optimizable than most low-level code, because the compiler can better refactor the algorithm. High-level code is more alive and can keep up with project changes; low-level code can become brittle and stagnant, thus losing performance, relatively speaking, against the changing semantics over time.

Perhaps programmers like the idea of making more work for themselves, since they view this as their employer's problem. But I am thinking about what scales faster, because the more a language scales, the more resources get applied to optimizing it.

I am sure you are aware that Rust used to have GC, and that there was a garbage-collected pointer type. My understanding is it was removed from the language because people found they wanted to use move/borrowing more than GC, and for the small remaining cases ref-counting was enough, so it was adding a lot of complexity to the language implementation (and requiring a runtime) for little gain. My point here is that this is a community opinion (if I have it right) formed by many people using the language over many different projects, and formed when the language did have GC, so it's not like people don't know what they are missing. They had it and decided they don't need it.

Edit: I think there is clearly a place for GC, and languages like Go use GC and have lightweight N:M concurrency with goroutines. As GC and N:M concurrency both require a runtime, you can see why these features go together.
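
For the small remaining cases where ref-counting was judged to be enough, a minimal sketch of shared ownership with Rc:

use std::rc::Rc;

fn main() {
    let shared = Rc::new(vec![1, 2, 3]);
    let a = Rc::clone(&shared); // bumps the reference count; no deep copy of the Vec
    let b = Rc::clone(&shared);
    println!("{} owners of {:?}", Rc::strong_count(&shared), shared); // 3 owners
    drop(a);
    drop(b);
    // the Vec itself is freed when the last Rc goes away
}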

I agree it might be a difference of philosophy and every community should be respected.

It might reflect actual wisdom of the crowd, or it might reflect lack of exposure to other design patterns to accomplish the same goals with better high-level semantic encapsulation. I am not sure. That is the exploration I have been trying to learn more about. It is a complex analysis.

I am cautious because of, for example, @troplin's upthread experience with horrible GC performance in C#, which forced him to abandon it entirely. And now he prefers Rc, but I am not sure that is really the best choice. I am thinking no one is able to rely exclusively on Rust's compile-time lifetimes and never use unsafe to break out of mutability gridlock; they must also use Rc / Arc or GC sometimes. The question is what the percentage split is, and how it varies for different use cases.

I think it's priorities. Rust has a clear goal to be usable for OS and embedded development, so that it can be used anywhere C is currently used. On that basis it needs to have no runtime and to operate in as little memory as C. This has clearly directed the development. Rust is ideal for desktop applications, operating systems, writing garbage collectors for other languages, and algorithms which do not parallelise well.

Go, on the other hand, is designed to scale over thousands (or even millions) of in-flight requests without taking a thread per connection, with its associated memory cost. Go is great for web services and for parallel performance where the algorithm is "ridiculously parallel".

Personally I am thinking of moving more development from C/C++/Ada/Haskell to Rust, and of moving Java, JavaScript, Python, Ruby development to Go.

That all makes very good sense, except I don't understand why Rust is good for desktop applications. I am thinking Rust is too much low-level tsuris for apps, excepting those which need maximum performance, such as image editing and 3D games (and even some of those might not need Rust, e.g. if they are using OpenGL). Although the typeclasses, modules, and crates are definitely a major plus IMO.

I am not convinced Go is the best choice yet for that other genre. Can we make a thread to discuss that?

Edit: C++ has declined from 15% to 5% over the past decade. Ostensibly Java only got a bounce because of Android. C hasn't lost much because it is the essential low-level tool one turns to for absolute maximum portable performance without resorting to non-portable assembly:

http://www.tiobe.com/tiobe_index

PHP and Python took most of that, which is indicative of changing priorities in the types of projects people do more of these days. Notice that the easy languages can quickly gain popularity if they are well matched to the use case. Python and PHP are trending down now, because we need better performance server-side. But as you say, at that point parallelism, concurrency, and asynchronous programming patterns dominate.

It's not the goal to use borrowing everywhere; every tool has its use case.
Ownership is the key point. Once you get this right, you automatically know what kind of mechanism to use. Rust just forces you to think about it.

unsafe is not a tool to blindly circumvent the restrictions. It can be used to enable cases that the programmer knows to be safe but that cannot be proven safe by the compiler. But then it is crucial that the unsafety doesn't leak into safe code: you always have to build an abstraction around the unsafe code that provides a safe interface.
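
A minimal sketch of that pattern (get_clamped is just an illustrative name); the bounds are validated before the unsafe call, so the unsafety cannot leak through the safe signature:

// Returns the element at `idx`, clamped to the last valid index.
fn get_clamped<T>(slice: &[T], idx: usize) -> Option<&T> {
    if slice.is_empty() {
        return None;
    }
    let i = idx.min(slice.len() - 1);
    // SAFETY: the slice is non-empty and `i` is clamped to len - 1, so it is in bounds.
    Some(unsafe { slice.get_unchecked(i) })
}

fn main() {
    let v = [10, 20, 30];
    assert_eq!(get_clamped(&v, 99), Some(&30)); // an out-of-range index is clamped
    assert_eq!(get_clamped::<i32>(&[], 0), None);
}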

Personally, I've never used unsafe directly and I don't think I have to in the near future. The most useful abstractions over unsafe code are already provided by the standard library.

Well stated. I do understand all of that by now, and it informs my current decision, or at least the summary of my analysis. Thanks for all the discussion; it has been very valuable to me. My gratitude.

Thank you. Understanding the purpose of lifetimes with counter examples was exactly what made the concept click for me.

This topic was automatically closed after 27 hours. We invite you to open a new topic if you have further questions or comments.