Guidelines for self ownership on Copy types

This question has been asked before by another user but didn't receive the discussion it deserved.

I have a type and it implements Copy

#[derive(PartialEq, Copy, Clone)]
pub struct Scalar<'a> {
    data: &'a [u8],
}

And this type can be expensively converted to other types:

impl<'a> Scalar<'a> {
    pub fn to_bool(self) -> Result<bool, ScalarError> {
        to_bool(self.data)
    }
}

self is taken by value in the example above to match the API naming conventions: an expensive conversion between two Copy types should have a method name prefixed with to_, and such methods take self by value. This mirrors the methods on primitives like i32 and f32, which all take ownership of self.

Initially I thought this convention for Copy types was widely used, but looking at the implementors of Copy in stdlib, I don't see that self ownership is widespread. For instance, the following types are Copy but take self by reference in their methods:

  • SocketAddr
  • Duration
  • Instant

Taking a copyable self by reference seems common in crates as well (chrono, for example), and this is where I started second-guessing myself. Maybe best practice is to leave owning self to the primitives. Or maybe it's the other way around, and the stdlib and the crates I sampled predate the current guidelines.

Perhaps the answer is that it doesn't matter from an API standpoint (as long as one ensures that the type will always impl Copy).

Interestingly, I found that it does matter from a performance standpoint. Having our Copy type always take ownership of self resulted in a real-world performance increase of 7-8%, which I had to triple check, as the results seemed too good to be true. But it's true (the code is open source, and if anyone is interested in investigating I can link the repo).

So my question is: should I go through all the methods on my Copy types and have them take self by value, so that they match the naming convention guidelines and are (anecdotally) more conducive to performance? I assume such a change could be considered breaking (it's sometimes unclear with implicit copies).

To muddy the waters further, what about a method that exposes the inner data, i.e.:

impl<'a> Scalar<'a> {
    pub fn as_bytes(self) -> &'a [u8] {
        self.data
    }
}
  • The conversion is free, so the as_ prefix seems right
  • But as_ implies a borrowed self, which would clash with the other methods on the type
  • Thankfully, owned vs. borrowed self makes a negligible difference in benchmarks for this function (but would it matter if it didn't?)
  • So either it's the odd method out taking &self, or it goes against the naming convention.

Why does this matter / why am I agonizing over naming? Clippy now warns on wrong_self_convention, and I want the APIs I create today to stand the test of time without breaking changes. So if I go against the naming convention, I risk a future warning that I'd then have to decide whether to silence.


My current rule-of-thumb for Copy types is:

if size_of::<T>() <= 2 * size_of::<usize>() { self } else { &self }

In the perfect world the answer would be "always self, and let platform ABI decide when to actually pass a reference at the ABI level".


Looking at what stdlib does is also insightful:

impl char {
    pub fn is_uppercase(self) -> bool;
    pub fn is_ascii_uppercase(&self) -> bool;
}

\s


Thanks for sharing this rule of thumb; I think I will follow it -- at the very least it resolves my indecisiveness (and it seems to match what I'm seeing performance-wise).

Good to see you semi-answer your own question after 4 years 😄

Interesting. If self vs. &self made such a difference, then the function is probably rather small, in which case one should make sure it gets inlined; and if it is inlined, it hopefully makes no difference anymore whether self or &self was chosen?

I’d be interested in the details here and where, if anywhere, my reasoning is flawed 🙂


A quick test showed that inlining was mostly (probably entirely) the cause. If you want to dive deeper:

git clone https://github.com/rakaly/jomini.git
cd jomini

# The commit that changed &self to self
COMMIT_SHA=32e54ee73c6b08e2779bda844c99694057986777
git checkout "$COMMIT_SHA"~
cargo bench -- deserialize/text

# Now calculate the after
git checkout "$COMMIT_SHA"
cargo bench -- deserialize/text

So it sounds like the compiler is less likely to automatically inline functions when &self is used.
