I'm trying to wrap my head around why you might want to make a trait take &self vs &mut self vs self, and I'm having trouble understanding how these choices interact with the ability to implement a trait for a reference. Let's take Into for example:
pub trait Into<T> {
    fn into(self) -> T;
}
Why does Into consume self? Some objects may own memory on the heap (such as a Vec<T>), or more generally may be non-Clone/Copy types (such as &mut T), so consuming self means we can transfer ownership of that memory to the new object and a clone of the inner Vec<T> (another allocation) is not necessary. This also provides some flexibility -- we can implement Into<T> for &'a CopyableStruct if there's nothing 'unique' contained in the struct.
Question 1: for a struct whose members are all Copy types, do we lose anything by implementing Into for a reference to this struct but not for the struct directly? I tried looking at a few scenarios, and it looks like the answer is no...but I'm sure I've missed something.
struct Test { // our struct w/ non-expensive Copys
    inner: i32,
}

impl<'a> Into<i32> for &'a Test { // impl on the reference to the struct
    fn into(self) -> i32 { self.inner } // implicit copy of the int
}

fn uses_into<T: Into<i32>>(arg: T) -> i32 {
    arg.into()
}

struct StoresInto<T: Into<i32>> {
    into: T,
}

fn main() {
    let mut test = Test { inner: 5 };
    {
        // if we have the struct:
        // any function that takes T: Into as an argument can be used by passing a reference to our struct
        uses_into(&test);
        // any struct that stores T: Into can be used by passing a reference to our struct
        let stores_into = StoresInto { into: &test };
    }
    {
        // if we only have a &mut reference to this struct...
        let mut_ref: &mut Test = &mut test;
        {
            // any function that takes T: Into as an argument can be used via the deref/ref pattern
            uses_into(&*mut_ref);
            // any struct that stores T: Into can be used via the deref/ref pattern
            let stores_into = StoresInto { into: &*mut_ref };
        }
        // I find it weird that mutable references aren't automatically coerced/reborrowed into regular references...
        mut_ref.inner = 6; // the mutable reference is still usable as long as the other borrows are dropped
    }
}
Question 2: When writing a new trait, why would you ever have a function signature that takes &self as an argument? Wouldn't it be better to always write traits to take self and let the implementor decide what 'level' of ownership was required to use the trait for that struct? (E.g. do you need the memory owned by the struct? Do you need a unique reference to the struct? Or is a regular reference good enough?) The above example appears to demonstrate that the implementor does have this level of control....
I guess any discussion/links to discussion around these topics with a focus on best practices is what I'm looking for here...bonus points for using AsRef and friends to smooth out the edges of this stuff.
I think you should think of it based on the semantics of the operation and the trait itself. Into is a conversion function - it's intended to take a T and return a U, but as a conversion. The conversion bit implies that you no longer care about T once you have the U. That's as far as the trait Into is concerned, and you should think of the T as having been consumed.
Now comes the part where you decide to implement Into for some type. Since into takes self (i.e. owns it now), you have more flexibility in some cases (i.e. the ability to move fields/values out of T and into U). In the &'a Test example you have above this isn't material because you're moving an i32, but that's a Copy type and so you can "convert" from &'a Test into i32 without needing ownership of Test. But, if Test had, say, a String field that you wanted to move as part of the conversion, that wouldn't work with &'a Test.
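To make that concrete, a minimal sketch (this String-holding variant of Test is hypothetical, not the one from the example above):

struct Test {
    name: String,
}

// Consuming impl: we own the Test, so the String can be moved out for free.
impl Into<String> for Test {
    fn into(self) -> String {
        self.name
    }
}

// A reference impl can't move the field out; it would have to clone:
impl<'a> Into<String> for &'a Test {
    fn into(self) -> String {
        // self.name        // error[E0507]: cannot move out of `self.name`
        self.name.clone()   // an extra allocation instead
    }
}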
But the important bit is to think of the trait's semantics without considering all possible types that may decide to implement it (which is impossible since that set is unbounded and not known to you if this is part of a library that you're writing).
Another way to think of it is via ordinary functions. Say you have:
fn foo<T>(x: T) { /* ... */ }
This function says it consumes x. Of course, if you end up passing some Copy type to it, the source isn't actually consumed. But again, that's the caller's choice - you express intent in the signature.
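A tiny sketch of that point (foo and the values here are just illustrative):

fn foo<T>(x: T) {
    // takes ownership of x (or a copy, if T happens to be Copy)
    drop(x);
}

fn main() {
    let n = 5_i32; // i32 is Copy
    foo(n);        // a copy is handed over
    println!("{}", n); // n is still usable afterwards

    let s = String::from("hi"); // String is not Copy
    foo(s);                     // s is moved
    // println!("{}", s);       // error[E0382]: borrow of moved value: `s`
}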
Often I want to define a trait with multiple methods, so taking &self allows one method to neither consume nor modify the object, while another method does modify the object.
However, in addition, I would always use &self for a method that is not intended to modify or consume an object. Otherwise it is just confusing to use, particularly in a generic context. E.g.
trait HasColor {
    fn is_green(&self) -> bool;
    fn is_blue(&self) -> bool;
}

fn is_watery<T: HasColor>(t: T) -> bool {
    t.is_green() || t.is_blue()
}
If we accepted self, we couldn't write this function without the restriction that &T has color. This would inhibit the writing of general code that uses the trait.
Thanks @vitalyd, I think I get what you're saying...I think there's a good counterexample in Add to this piece here, though:
But the important bit is to think of the trait’s semantics without considering all possible types that may decide to implement it (which is impossible since that set is unbounded and not known to you if this is part of a library that you’re writing).
Add consumes self, presumably to afford implementors the same sort of flexibility you mentioned. I don't think there's anything about the semantics of Add that suggests we no longer care about the thing we're adding (although AddAssign might qualify here). What I'm wondering is - why not always leave it up to the implementors? Would it be a good rule of thumb to write traits that take self unless there's explicitly a reason not to? What are some examples of things that would preclude taking self as an argument in a trait?
Aside: I don't understand why AddAssign exists at all. Can't AddAssign be derived automatically for every type that implements Add? In other words, why doesn't this blanket impl exist:
impl<T, U> AddAssign<U> for T
where
    T: Add<U, Output = T>,
{
    fn add_assign(&mut self, rhs: U) {
        // note: as written this wouldn't compile either, since `*self`
        // can't be moved out from behind the &mut without T: Copy or a
        // mem::replace-style trick
        *self = self.add(rhs);
    }
}
I think you answered this yourself - AddAssign is the variant that cares because it allows in-place updates. It exists because it may be expensive to move/copy temporaries (i.e. using just Add in the manner you described).
Sorry I'm being a stickler here - I'm pretty new to systems programming and I've decided to start learning Rust a little more seriously since I'm convinced you guys have really got something here...so forgive my ignorance on this one: is this hypothetical implementation of AddAssign any slower than the manually written, mutable version? E.g. are these equivalent:
use std::ops::{Add, AddAssign};

struct Test {
    inner: i32,
}

impl Add<Test> for Test {
    type Output = Test;
    fn add(mut self, other: Test) -> Test {
        self.inner = self.inner + other.inner; // note the mutation, reuse of memory
        self // return value optimization?
    }
}

impl<T, U> AddAssign<U> for T
where
    T: Add<U, Output = T>,
{
    fn add_assign(&mut self, rhs: U) {
        *self = self.add(rhs);
    }
}
vs the canonical version:
impl AddAssign<Test> for Test {
    fn add_assign(&mut self, rhs: Test) {
        self.inner = self.inner + rhs.inner;
    }
}
In this example, there likely won't be a difference because Test is just an i32. But this can have a material performance impact on large types. RVO is an optimization that may or may not kick in, so you can't 100% rely on it.
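As a rough illustration of the kind of type where it matters (this Grid type is hypothetical, not from the thread): with a lot of inline data, the Add-based version has to move the whole value in and out of add, while AddAssign only writes through a &mut borrow.

use std::ops::{Add, AddAssign};

// Hypothetical type with lots of inline data; moving one copies 8 KiB.
struct Grid {
    cells: [f64; 1024],
}

impl Add for Grid {
    type Output = Grid;
    fn add(mut self, rhs: Grid) -> Grid {
        for (a, b) in self.cells.iter_mut().zip(rhs.cells.iter()) {
            *a += *b;
        }
        // The result still has to be moved back to the caller; with
        // `*lhs = lhs.add(rhs)` that's a move in and a move out, unless
        // the optimizer manages to elide the copies.
        self
    }
}

impl AddAssign for Grid {
    fn add_assign(&mut self, rhs: Grid) {
        // Updates in place through the &mut borrow; the 8 KiB payload
        // of `self` is never moved.
        for (a, b) in self.cells.iter_mut().zip(rhs.cells.iter()) {
            *a += *b;
        }
    }
}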
That's a good one...somehow I was thinking about the ability to use generic code, but not how the generic code itself would be written...stupid question: how do you write the restriction that &T has color? Like this?
trait HasColor {
    fn is_green(self) -> bool;
    fn is_blue(self) -> bool;
}

fn is_watery<T>(t: T) -> bool
where
    for<'a> &'a T: HasColor,
{
    let ref_to_t = &t;
    ref_to_t.is_green() || ref_to_t.is_blue()
}
Damn, that's ugly. I see what you mean here... although I kinda like the semantics. HasColor gains additional functionality if your implementation doesn't need ownership of the memory in its methods... hmm
Yeah, HRTB would be the way to specify that. I suspect you also wanted to modify the HasColor methods to take self in that example?
Note that your is_watery now takes ownership of t away from the caller. In generic methods, you generally want the caller to decide whether to transfer ownership or give a borrow, if your function doesn't care.
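One standard-library pattern that leaves that choice with the caller is an AsRef bound, which the original question also asked about. A small hypothetical sketch:

use std::path::Path;

// The caller can pass &str, String, &Path, PathBuf, ... and keeps
// ownership unless it deliberately hands over an owned value.
fn print_extension<P: AsRef<Path>>(path: P) {
    let path = path.as_ref();
    println!("{:?}", path.extension());
}

fn main() {
    let owned = String::from("photo.jpg");
    print_extension(&owned); // borrow: the caller keeps `owned`
    print_extension(owned);  // or move it in - the caller's choice
    print_extension("notes.txt");
}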
That is correct, good catch! Thanks guys
I realize we've omitted another difference between &self/&mut self and self, which is support for impls on unsized types. You cannot implement a trait function taking self for an unsized type.
Cool...so slices and traits?
trait A {
    fn test(&self) -> i32;
}

trait B {
    fn test_2(&self) -> i32;
}

impl B for dyn A {
    fn test_2(&self) -> i32 {
        // &self here is a trait object; we couldn't take self, since Self: ?Sized
        self.test()
    }
}
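And for slices, a small hypothetical sketch: a trait whose method takes &self can be implemented directly for an unsized type like [i32], whereas a method taking self by value couldn't be implemented there.

trait Summable {
    fn total(&self) -> i32; // &self is fine even though [i32] is unsized
}

impl Summable for [i32] {
    fn total(&self) -> i32 {
        self.iter().sum()
    }
}

fn main() {
    let xs = [1, 2, 3];
    let slice: &[i32] = &xs;
    println!("{}", slice.total()); // 6
}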
why would you ever have a function signature that takes &self as an argument? Wouldn’t it be better to always write traits to take self and let the implementor decide what ‘level’ of ownership was required to use the trait for that struct?
I think constraining all your traits and their methods to only take self (instead of also allowing &self or &mut self) would unnecessarily restrict what you are doing. You often use traits to provide a common interface, and in all but the most trivial of cases you won't want to be consuming things the first time they're used.
Likewise, if people only implemented traits for references instead of directly on the type (impl<'a> SomeTrait for &'a Foo vs impl SomeTrait for Foo) then it'd be a real pain because of lifetimes and all that.
for a struct whose members are all Copy types, do we lose anything by implementing Into for a reference to this struct but not for the struct directly?
As for Copy types, I think you'll find that in practice they aren't overly common. So you don't exactly gain anything by implementing Into for a reference instead of the struct itself. Actually, it probably has a net negative effect because it hurts readability, forces you to unnecessarily borrow when using Foo::from(...) (something that gets picked up by clippy's lints, by the way), and is probably unnecessary for "performance" reasons because the compiler will be immediately dereferencing and copying the thing across anyway.
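For reference, the more usual arrangement (a small sketch, not from the thread): implement From for the owned type, and the matching Into impl comes for free from the standard library's blanket impl<T, U> Into<U> for T where U: From<T>.

struct Test {
    inner: i32,
}

// Implement From on the value itself...
impl From<Test> for i32 {
    fn from(t: Test) -> i32 {
        t.inner
    }
}

fn uses_into<T: Into<i32>>(arg: T) -> i32 {
    arg.into()
}

fn main() {
    // ...and `Test: Into<i32>` is then provided automatically.
    assert_eq!(uses_into(Test { inner: 5 }), 5);
    assert_eq!(i32::from(Test { inner: 7 }), 7);
}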
Just as a datapoint to show how frequently you use Copy structs, I've currently got the mdbook repo open in a terminal, and of the 21 types with the #[derive(...)] attribute (according to grep '#\[derive' -r src), only 2 also derive Copy.