Generics are consumed? What is T!? not a variable [Solved and more!]

Hi! I was writing some code and found something curious. It is more a matter of how the compiler interprets things; I don't expect it to be a big deal, but it is interesting to look at. I wanted to ask other things about this code too, but in this thread I'll focus on one point:

fn foo<'a, I, T>(items: I)
where
    I: IntoIterator<Item = &'a T>,
    T: AsRef<str> + 'a,
{
    let refs: Vec<&'a str> = items.into_iter().map(|item| item.as_ref()).collect();
}

Do you know why we need that? Let's go one step back in the logic: a way to iterate and retrieve a reference from each element.

fn foo<I>(items: I)
where
    I: IntoIterator,
    <I as IntoIterator>::Item: AsRef<str>,
{
    let refs: Vec<&str> = items.into_iter().map(|item| item.as_ref()).collect();
}

This one fails because:

6 |     let refs: Vec<&str> = items.into_iter().map(|item| item.as_ref()).collect();
  |                                                        ----^^^^^^^^^
  |                                                        |
  |                                                        returns a value referencing data owned by the current function
  |                                                        `item` is borrowed here

Here is the key point: the item is being borrowed, but with generics the code should be generated from the call, so if we called it like this:

let vector = todo!();
foo(&vector)

then the description does not match: the foo function does not own the value, so we should be able to run this. Obviously we could also pass the value itself and consume it, and then it would fail.

The question is, how should we interpret T? With generics we can't know whether we own a variable or not, and there could be other special cases where the normal concept of a variable is not that clear. We could also think of the most restricted case, here an owned value, and there can still be more edge cases!

What have you seen in generics that breaks the normal interpretation of a variable? Is there a clear way to think of T?

You wrote:

<I as IntoIterator>::Item: AsRef<str>,

This means that the items the iterator produces should be of a type that can be borrowed in order to obtain a &str. That follows from the signature of AsRef::as_ref:

fn as_ref(&self) -> &T;

When we substitute in the specific types from your code, and un-elide lifetimes, we get the signature:

fn as_ref(self: &'x <I as IntoIterator>::Item) -> &'x str;

Using this trait bound, you can have an &str that is borrowed from the iterator item, which is only valid as long as the iterator item is not dropped. But your code in foo drops each iterator item (in |item| item.as_ref()) as soon as it is created. In the working version,

    I: IntoIterator<Item = &'a T>,
    T: AsRef<str> + 'a,

we aren't borrowing the iterator items, we're passing them unchanged to as_ref(). Therefore, the reference produced by T::as_ref() can carry through the 'a lifetime that you started with.

The key difference is that in the first case, you are passing an existing reference to as_ref(), and in the second case, the method call is implicitly creating a reference &item for you — which doesn’t live long enough.
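To make the contrast concrete, here is a minimal sketch of both shapes. The function names `collect_strs` and `total_len` are made up for illustration and do not come from the thread. The first returns `&'a str` values because the items are already references valid for `'a`; the second only returns data that does not borrow from the item, since each item is dropped when the closure returns.

```rust
// Items are already `&'a T`, so `as_ref` is called on a reference that is
// valid for `'a`, and the resulting `&'a str` may leave the closure.
fn collect_strs<'a, I, T>(items: I) -> Vec<&'a str>
where
    I: IntoIterator<Item = &'a T>,
    T: AsRef<str> + 'a,
{
    items.into_iter().map(|item| item.as_ref()).collect()
}

// Items are only known to be `AsRef<str>`. The `&str` obtained from
// `item.as_ref()` borrows `item`, which is dropped when the closure ends,
// so the closure returns a `usize` instead of the borrowed string.
fn total_len<I>(items: I) -> usize
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    items.into_iter().map(|item| item.as_ref().len()).sum()
}
```

Both functions accept `&Vec<String>`; only `total_len` also accepts an owned `Vec<String>`, because it never lets a borrow outlive the item.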


are you coming from a C++ background? that is not true for rust. (and I heard C++ now has "concepts" in the standard, so I think it would not be entirely true for C++ either).

we don't usually say we own a variable, but guessing from the context, I think you are probably conflating "ownership" and "binding mode" (by-value vs by-ref)

the short answer is we differentiate binding mode using different pattern syntax for the binder:

// bind to type `T`, by value
let x: T = todo!();
// bind to type `T` by reference
let ref x: T = todo!();

the long answer is slightly more complicated, due to how the type system works and stuff like match ergonomics.

for example, when the type T is itself a reference type, say e.g. &str, then a by-value variable can be seen as equivalent to a by-ref variable:

// these are equivalent:
let s: &'static str = "hello";
let ref s: str = *"hello";

I don't know what your interpretation of a "normal" variable is, but there's really no difference between a variable of a generic type vs one of a concrete type.

what operations you can do on a variable is completely determined by the type of the variable. e.g. if the type implements a method as_ref(&self) (either inherently, or through a trait), then you can call it.

it's just a type. specifically, it must be type-checked at definition time, it's completely different from a (pre "concepts" or "concepts-lite") C++ template argument, which is only checked at instantiation (not "definition") time.

if you are looking for something like a C++ template argument, which is essentially just an AST node, then it's closer to rust macro_rules than rust generics.


According to the standard it never was entirely true; C++ has a confusing "two phase" templating system where some things are checked at declaration time, and other parts are checked at instantiation time. As I understand them, Concepts are supposed to "just" move the specified checks earlier.

This was confused somewhat by some compilers (specifically and notably MSVC) not supporting two phase checking for a long time.


Incidentally, you want:

 fn foo<'a, I, T>(items: I)
 where
     I: IntoIterator<Item = &'a T>,
-    T: AsRef<str> + 'a,
+    T: AsRef<str> + 'a + ?Sized,

Type parameters like T have an implicit Sized bound (which ?Sized removes). str is not Sized, so without the adjustment, you can't take an iterator that hands out &strs.

If T: AsRef<U>, then &T: AsRef<U> too, by it forwarding on the implementation. So without the adjustment, you could still take iterators that hand out &&strs.
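As a small sketch of what the relaxed bound allows (the name `collect_unsized` is made up here, and a return value is added so the result can be inspected), the same function body then accepts iterators over `&str`, `&String`, and `&&str`:

```rust
fn collect_unsized<'a, I, T>(items: I) -> Vec<&'a str>
where
    I: IntoIterator<Item = &'a T>,
    // `?Sized` lifts the implicit `Sized` bound, so `T = str` is allowed.
    T: AsRef<str> + 'a + ?Sized,
{
    items.into_iter().map(|item| item.as_ref()).collect()
}
```

With `Vec<&str>` passed by value the items are `&str`, so `T` is `str`; with `&Vec<String>` the items are `&String`, so `T` is `String`; with `&Vec<&str>` the items are `&&str`, so `T` is `&str`.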


I'm guessing at your mental model, but references are their own concrete types in Rust, like borrow-checked pointers. And if you take a reference to a reference:

fn nest_attempt(s: &str) -> &&str {
    &s
}

The outer reference can't be valid once the inner reference goes out of scope.

(References have special properties that let you reborrow through a reference without borrowing the reference itself -- that's how Vec::get works, say -- but it's a special property of references, not something expressible with a bound on a generic. And not what happens when the generic is monomorphized with a reference type.)
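A rough sketch of that difference, with function names invented for illustration: `through` copies the inner reference out of the outer one and keeps the long lifetime, while `via_bound` can only promise a borrow of its argument, whatever `T` later turns out to be.

```rust
// Concrete reference types: the inner `&'long str` is copied out, so the
// result does not depend on the short outer borrow.
fn through<'short, 'long>(outer: &'short &'long str) -> &'long str {
    *outer
}

// Generic bound: `AsRef::as_ref` takes `&self`, so the result is tied to
// the borrow of `item` (`'short`), even if `T` happens to be a reference.
fn via_bound<'short, T: AsRef<str>>(item: &'short T) -> &'short str {
    item.as_ref()
}

fn longest_lived() -> &'static str {
    let inner: &'static str = "hello"; // a local holding a reference by value
    through(&inner) // fine: the result does not borrow the local `inner`
}
```

Replacing `through(&inner)` with `via_bound(&inner)` in `longest_lived` would be rejected, because the returned `&str` would then borrow the local variable `inner`.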

It can sometimes be useful to think of "owned or borrowed" in terms of "is not a reference or is a reference". But it can also be useful to think of "owned or borrowed" as something more like "I have control by value or not". It's possible to have a reference by value.

From the function body perspective: A type that meets the bounds on the function,[1] but nothing else may be assumed.

(From the caller's perspective: If I can meet the bounds, I can call the function.)


  1. implicit and explicit

Hi, thanks all for the answers! I have been using Rust for 1.5 years, but from time to time I find some curious messages that make me ask myself whether I'm understanding this correctly.

I follow most of what was written, including the parts about the iterators' Item.

What is also curious is that a trait like IntoIterator works very differently depending on where it is implemented: when it is implemented for Foo, the trait consumes the variable, while if it is implemented for and called on &Foo, the trait does not consume it and instead just returns references to the items.

These scenarios make T confusing: since T is not exactly a variable in memory or anything like that, it doesn't make much sense to think of it in terms of owned/borrowed/references, because it can be anything!

That opens the question of how we should think of T when the basic Rust concepts do not apply directly.

This is a choice made by each of the individual implementations of IntoIterator, not the trait itself or the language. It's possible, just unconventional, to implement IntoIterator for a reference such that the iterator produces clones, not references. (Or does something else entirely.)
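As an illustration of how much freedom an implementation has, here is a sketch with a hypothetical `Names` wrapper (not from the thread) whose `IntoIterator` impl for `&Names` hands out cloned `String`s rather than references:

```rust
// Hypothetical wrapper type, used only for this example.
struct Names(Vec<String>);

// Unconventional on purpose: iterating `&Names` yields owned clones.
impl<'a> IntoIterator for &'a Names {
    type Item = String;
    type IntoIter = std::iter::Cloned<std::slice::Iter<'a, String>>;

    fn into_iter(self) -> Self::IntoIter {
        self.0.iter().cloned()
    }
}

// `into_iter` consumes the `&Names` reference; `Names` itself is untouched.
fn first_two(names: &Names) -> Vec<String> {
    names.into_iter().take(2).collect()
}
```

The standard collections do not do this; their `&Collection` impls yield references, which is the convention the earlier posts rely on.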

Yes, the question there is: given how free T is, how can we interpret it in the code? Because it can be anything, it doesn't make much sense to even say T is being consumed.

When you call into_iter on a reference, you’re still consuming the reference. Of course that doesn’t amount to much because shared references are Copy and exclusive references have the magic ability to be reborrowed rather than moved.
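A minimal sketch of that point, using a made-up generic helper `count`: passing a shared reference twice works because the reference is `Copy`, while passing the `Vec` itself moves it.

```rust
// Consumes whatever `I` is: a collection, or a reference to one.
fn count<I: IntoIterator>(items: I) -> usize {
    items.into_iter().count()
}

fn demo_count() -> (usize, usize, usize) {
    let v = vec![1, 2, 3];
    let r = &v;
    let a = count(r); // consumes a copy of the reference `r`
    let b = count(r); // `r` is still usable because `&Vec<i32>` is `Copy`
    let c = count(v); // moves the Vec; `v` cannot be used after this line
    (a, b, c)
}
```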


That seems to be a key thing. From what I knew, references, including mutable ones, are not consumed, just "passed by", like just sending the pointer there.

But from what you say, references are consumed, but they implement the Copy trait, so they are copied, and that is why they do not disappear. @jdahlstrom, is that what you are saying?

If that is the case, the pointer is the same even after the copy, right?

They are “consumed”. Just like i32 are “consumed”, too.

Magic happens not inside of generic, but outside of it.

  1. Shared references are Copy and act just like i32 (that is: original is still valid)
  2. Unique references are, well… unique: there's reborrowing mechanism that's used precisely when functions are called (and in a few other places, too): since rule is not “there can only be, ever, one unique mutable reference”, but “there can only be, ever, one unique mutable active reference” (that's needed because otherwise confusion between “original owner” and “unique mutable reference” would mean “unique mutable reference” couldn't exist at all!) Rust uses it to create a temporary active copy that's sent into function (doesn't matter generic or not) to be consumed… while original becomes active (and usable) after return from function.

I guess that's where confusion comes from: there are rules… and there are special exceptions from the rules for types like i32, shared references, mutable references, etc. For concrete, not-generic, types.

And that's how @kpreid/@quinedot and @latot may look on the same code and perceive generics so differently:

  • If one thinks about generics as “these things that strictly follow the rules… and with no exceptions and “cut corners”, nope” then generics are “easy”, because they always follow the rules and there are no exceptions… it's non-generic code where funny exceptions happen.
  • If one discovers the rules by experimentation… and, specifically, by experimentation with non-generic code… then the fact that generics strictly follow rules and don't apply these bazillion special corner-cases that you have discovered while experimenting with non-generic code… makes them very confusing.
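One concrete instance of such a corner case, sketched with invented function names: when a parameter type is written as `&mut Vec<i32>`, the compiler reborrows the argument implicitly; when the parameter is a bare generic `T`, the mutable reference is simply moved, like any other non-`Copy` value.

```rust
fn push_one(v: &mut Vec<i32>) {
    v.push(1);
}

fn consume<T>(_value: T) {}

fn demo_reborrow() -> Vec<i32> {
    let mut data = Vec::new();
    let r = &mut data;
    push_one(r); // parameter is spelled `&mut Vec<i32>`: implicit reborrow
    push_one(r); // so `r` is still usable here
    consume(&mut *r); // explicit reborrow: only the temporary is consumed
    r.push(2);
    consume(r); // `T` is inferred as `&mut Vec<i32>`; `r` itself is moved
    // r.push(3); // would be rejected: `r` was moved on the previous line
    data
}
```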

What do you mean by that?

Finally everything is falling into place.

Well, most of the time when I was learning Rust it was presented the way I put it above, "we borrow a variable"; I never knew that references are consumed. Rust has a lot of parts, usually simple ones, but there are a lot of them! For example, to get this, and why and how it works, we need to understand at least traits, Clone and Copy. I try to mix learning and working at the same time, so we can't always wait for a deep understanding of them before using Rust; maybe this could be improved in the future :)

Are references Copy, like literally? Or is that a way of saying references implement the Copy trait?

About the pointer question: a reference holds a memory address. When it is consumed, does it keep the same address value, or is a new one made with a different memory address?

Thx!

X is Copy is the same thing as saying X implements the Copy trait.

When something is moved, its value is copied to the new owner variable and the old variable is no longer usable (which is enforced by the compiler). References don't change their value when they are moved, and neither do any other types.
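A quick way to check this for references, as a sketch with made-up names: copy a reference, pass it through a generic function that consumes and returns it, and compare the addresses with `std::ptr::eq`.

```rust
// Consumes its argument and hands it straight back.
fn pass_through<T>(value: T) -> T {
    value
}

fn same_address() -> bool {
    let number = 42_i32;
    let a: &i32 = &number;
    let b = a; // copies the reference (the address), not the i32
    let c = pass_through(a); // consumes a copy of `a` and returns it
    std::ptr::eq(a, b) && std::ptr::eq(a, c)
}
```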

So a move can be more expensive than using a reference if the value is big enough? Like, does a vector need to copy all its data to a new address to be moved? There is also the option of making a new variable that keeps the heap address, so all that isn't copied.

Move and Copy both copy the value. But in both cases it is a shallow copy. Moving a Vec does not copy the elements because they're allocated on the heap, so only the pointer, length and capacity are copied. Copying very large structs can be expensive, and in such cases you may choose to Box them so that only the pointer is copied.
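The shallow-copy behaviour can be observed directly. In this sketch (function name invented), the heap buffer address reported by `Vec::as_ptr` is the same before and after the move:

```rust
fn move_keeps_heap_buffer() -> bool {
    let v = vec![1u8, 2, 3];
    let before = v.as_ptr(); // address of the heap buffer
    let moved = v; // moves only the pointer, capacity and length
    let after = moved.as_ptr();
    before == after && moved == [1, 2, 3]
}
```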

But yes -- passing by reference is normally used for passing struct parameters, especially large structs, when the value is logically being borrowed.

Nice! Everything is in place now! This just makes me think Rust needs an easy way to know how much memory is copied when we move something.

When moving/copying a value of type T, the number of bytes copied is std::mem::size_of::<T>().
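For example, a sketch comparing a large array, the same array behind a `Box`, and a `Vec` (the function name `sizes` is made up; the expected values assume the default allocator):

```rust
use std::mem::size_of;

fn sizes() -> (usize, usize, usize) {
    (
        size_of::<[u8; 4096]>(),      // the whole array: 4096 bytes per move
        size_of::<Box<[u8; 4096]>>(), // a single pointer
        size_of::<Vec<u8>>(),         // pointer, capacity and length
    )
}
```

On a typical 64-bit target this returns `(4096, 8, 24)`, which is why boxing a very large struct makes moving it cheap.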

Nice! The name is not intuitive, but it is nice to be able to know that!

Why do the docs say:
In general, the size of a type is not stable across compilations.

What can change between compilations so that it doesn't match?

Don't know, but I wouldn't worry about it since any differences would be small. They could be referring to the fact that the Rust layout for structs, enums, etc. is not guaranteed. This is done to grant the compiler freedom to optimize, and for optimizations to change without breaking any guarantees.

Thanks all for all of this! Everything has been very interesting!

@quinedot Which other constraints does T have besides Sized? Is there a place that lists the implicit ones?