Confusing syntax of self

I constantly find myself wondering whether I should prefix self with an ampersand, and I'm surprised that self is sometimes a reference and sometimes not, depending on the signature:

fn this(&self) -> &Self {
  self // why not &self
}

Looks like self is not always of type Self, but its type depends on the signature. Yet this is inconsistent with how pattern matching syntax or closure parameters work:

vec![1, 2, 3].into_iter().filter(|&num| num > 0) // here `num` is *not* a reference

I wonder - do other people also find it confusing?
Is there a reason it was designed that way?

5 Likes

fn this(self) is fn this(self: Self).
fn this(&self) is fn this(self: &Self).
fn this(&mut self) is fn this(self: &mut Self).

It's not strictly necessary, but with methods these three are so common that it's worth the sugar.
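
A minimal sketch of that equivalence, using a made-up Counter type (not from this thread): each shorthand method has a longhand twin that should behave identically.

```rust
// Hypothetical type, introduced only for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Counter {
    n: u32,
}

#[allow(clippy::needless_arbitrary_self_type)]
impl Counter {
    // Shorthand receivers.
    fn get(&self) -> u32 { self.n }
    fn bump(&mut self) { self.n += 1; }
    fn into_inner(self) -> u32 { self.n }

    // The same three receivers written out in full.
    fn get_long(self: &Self) -> u32 { self.n }
    fn bump_long(self: &mut Self) { self.n += 1; }
    fn into_inner_long(self: Self) -> u32 { self.n }
}

fn main() {
    let mut c = Counter { n: 0 };
    c.bump();
    c.bump_long();
    // Both spellings see the same state.
    assert_eq!(c.get(), c.get_long());
    assert_eq!(c.clone().into_inner(), c.into_inner_long());
}
```

Call sites look the same either way; only the declaration differs.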

17 Likes

I understand that. The long syntax is ok. The short syntax is confusing because it is the same syntax as in lambdas and pattern matching but has different semantics.

2 Likes

But you can say that about expressions and patterns too. &foo does opposite things in expression and pattern position (it references and dereferences, respectively), yet this is the right thing to do because of their different meanings.
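
A small sketch of the two positions side by side (round_trip is a made-up helper name):

```rust
// Hypothetical helper: takes a reference in expression position,
// then undoes it again in pattern position.
fn round_trip(x: i32) -> i32 {
    // Expression position: `&x` creates a reference to `x`.
    let r: &i32 = &x;

    // Pattern position: `&y` matches a reference and binds `y`
    // to the value behind it (a copy here, since i32 is Copy).
    let &y = r;
    y
}

fn main() {
    assert_eq!(round_trip(5), 5);
}
```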

2 Likes

Yes, but expressions and patterns play opposite roles: expressions create values; patterns match and destructure values; so it is natural that they do opposite things. Function parameters have matching/destructuring capabilities ... except when the parameter is self and uses the shorthand syntax.

Compare the parameters of n, g and d in this sample:

#[allow(unused, clippy::needless_arbitrary_self_type)]
fn main() {

    #[derive(Copy, Clone)]
    struct X;

    impl X {
        fn m ( self)        ->  X { self }
        fn n (&self)        -> &X { self }
        fn o ( self: &Self) -> &X { self }
     // fn p (&self: &Self) ILLEGAL
    }

    fn      f( xelf:  X    ) -> X { xelf }
    fn      g(&xelf: &X    ) -> X { xelf }

    let c =  | zelf|              { zelf };
    let d =  |&zelf|              { zelf };

    let ff: X = f( X);
    let gg: X = g(&X);

    let cc: X = c( X);
    let dd: X = d(&X);

}

where the parameter of n resembles those of g and d (especially d) but the name is bound to &X in the case of n and just plain X in the case of the latter two.

Additionally, p, the would-be equivalent of g, is illegal.

So the self convenience syntax does break the symmetry and thus introduces scope for surprise. I don't recall it ever having bothered me very much, but the OP does have a point.

I'm glad that the syntax exists, but I do appreciate that it might be a stumbling block for some.

7 Likes

Yes, it is inconsistent.

The usual syntax in function arguments and closures is:

PATTERN: TYPE

To add to the confusion, patterns are duals of expressions, so & in pattern position means the opposite of & in type position.

And &self just makes a mess of both, using self as if it were a pattern and & as if it were part of a type.

To be consistent with the rest of the language, fn(&self) should have been something like fn(ref self) or fn(&Self), but it isn't, and it's too late to fix it. ¯\_(ツ)_/¯

15 Likes

Note that a ref foo: Foo parameter makes foo a reference, but the function will still take ownership of the value, so that would still not be consistent.
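
A rough sketch of what that means in practice. Foo here is a stand-in non-Copy type and takes_ownership is a made-up name:

```rust
// Hypothetical non-Copy type, for illustration only.
struct Foo(String);

// `ref foo` binds `foo` as a `&Foo` inside the body, but the argument
// is still passed by value: the function owns the `Foo` and drops it
// when it returns.
fn takes_ownership(ref foo: Foo) -> usize {
    let borrowed: &Foo = foo;
    borrowed.0.len()
}

fn main() {
    let foo = Foo(String::from("hello"));
    assert_eq!(takes_ownership(foo), 5);
    // `foo` has been moved into the call. Uncommenting the next line
    // fails to compile with error E0382 (value used after move).
    // let _ = foo.0.len();
}
```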

7 Likes

I think we could pretend that Deref magic matches against the dereferenced value, but doesn't actually move. Just like &* has special semantics, let ref y = x doesn't take ownership of x either. Of course that's incompatible with the current Rust semantics, so I'm not proposing to actually change the syntax, just pointing out that &self is its own thing that matches neither the pattern nor the type syntax.
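
For the let ref part, a quick sketch (borrow_with_ref is a made-up name) showing that x is only borrowed and can still be moved afterwards:

```rust
// Hypothetical helper: borrows `x` through a `ref` binding, then
// returns `x` itself to show it was never moved.
fn borrow_with_ref(x: String) -> (usize, String) {
    // `y` has type `&String`; `x` is borrowed, not moved.
    let ref y = x;
    let len = y.len();
    // The borrow has ended, so `x` can be moved out here.
    (len, x)
}

fn main() {
    let (len, s) = borrow_with_ref(String::from("hi"));
    assert_eq!(len, 2);
    assert_eq!(s, "hi");
}
```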

10 Likes

I could well be wrong about this, but I've always considered the self receiver syntax a version of the same thing in Python, just with Rust-specific alterations to account for different binding modes.

1 Like

That would really be the best option. It would also interact nicely with #![feature(arbitrary_self_types)], since it would allow writing just Box<Self> or Arc<Self>, without repeating the self variable.
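
For comparison, a rough sketch of how such receivers are spelled today, with a made-up Node type. Box<Self> and Arc<Self> receivers already work on stable Rust; the point is only that self has to be repeated in front of the type.

```rust
use std::sync::Arc;

// Hypothetical type, for illustration only.
struct Node {
    value: i32,
}

impl Node {
    // Callable only on a `Box<Node>`; consumes the box.
    fn unbox(self: Box<Self>) -> i32 {
        self.value
    }

    // Callable only on an `Arc<Node>`; consumes this handle.
    fn shared_value(self: Arc<Self>) -> i32 {
        self.value
    }
}

fn main() {
    assert_eq!(Box::new(Node { value: 7 }).unbox(), 7);
    assert_eq!(Arc::new(Node { value: 9 }).shared_value(), 9);
}
```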

I agree that the current syntax for the receiver is a mess. I have learnt just not to think about it, because there is no coherent model that the current syntax fits into. Unfortunately, this syntax will definitely not change.

6 Likes

That would also be confusing, since it's unclear if it's fn(self: &Self) or fn(not_self: &Self). At least with the self keyword it's clear when it is or isn't a method.

4 Likes

Since normally parameters must include both a pattern and a type, a lone type as a first parameter would unambiguously denote a method.

I get that it would be a bit annoying to write self: &Self everywhere, but I wish a less confusing syntax had been chosen, like ref self. Oh well.

I try to think of &self as a special syntax (not a pattern, not an expression), which means something like:

"The value of the expression &self (where self is the value of type Self) is made available under the special name self throughout the method body."

Thus if you write fn foo(&self), a shared reference to the Self value is made available under self.
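
A small sketch of that reading, with a made-up Celsius type: the method body sees self as a shared reference, and the call site is equivalent to passing &t explicitly.

```rust
// Hypothetical type, for illustration only.
struct Celsius(i32);

impl Celsius {
    fn degrees(&self) -> i32 {
        // Inside the body, `self` already is the shared reference.
        let this: &Celsius = self;
        this.0
    }
}

fn main() {
    let t = Celsius(21);
    // Method-call syntax borrows `t` automatically...
    assert_eq!(t.degrees(), 21);
    // ...which matches passing `&t` explicitly.
    assert_eq!(Celsius::degrees(&t), 21);
}
```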

4 Likes

I'm just glad it isn't like C++ where it's after the parameter list :face_with_raised_eyebrow:

6 Likes

If I recall correctly, the introduction of self in the Rust docs states that “self” is special. A lot is done under the hood for ergonomic reasons. Even how lifetimes are elided depends on whether the value is “self”. For some reason, I never felt the need to dig deep to demystify the “magic” :)). It’s almost as if the authors of the docs were requesting “don’t ask… because it’s truly not worth it”. Fair enough, because in my experience I get the job done without thinking about it much. This is in contrast to wanting to know how “reborrowing” works, which helps me resolve a confused mental model that I’m sometimes forced to contend with (less so in the past year, thanks to some of the improved ergonomics).

4 Likes

It is also the case that these things are actually really very simple and consistent within this system of notation:

  • self means self: Self
  • &self means self: &Self
  • &mut self means self: &mut Self

And as for lifetime elision, it's basically, "if there is a &self or &mut self, use its lifetime for the elided output lifetimes, otherwise the rest of the elision rules kick in".
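
A sketch of the elision part, with a made-up Doc type. The elided and the spelled-out signatures below should mean the same thing: the returned reference is tied to self, not to the other argument.

```rust
// Hypothetical type, for illustration only.
struct Doc {
    title: String,
}

impl Doc {
    // Elided: the output lifetime is taken from `&self`,
    // not from `_fallback`.
    fn title_or(&self, _fallback: &str) -> &str {
        &self.title
    }

    // The same signature with the lifetimes written out.
    fn title_or_explicit<'a, 'b>(&'a self, _fallback: &'b str) -> &'a str {
        &self.title
    }
}

fn main() {
    let doc = Doc { title: String::from("Rust") };
    let title: &str;
    {
        let fallback = String::from("untitled");
        title = doc.title_or(&fallback);
    } // `fallback` is dropped here, yet `title` stays valid
      // because it borrows from `doc` only.
    assert_eq!(title, "Rust");
    assert_eq!(doc.title_or_explicit("untitled"), "Rust");
}
```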

This is a very straightforward transformation that anybody can get used to after basically a couple of days with the language. Like @jbe, I also like to think of self as a special case — because it really is a separate set of rules, albeit very minimal.


It is worth noting that any reasonably general-purpose language will have sub-languages, almost mini-EDSLs, inside it. It's pretty much impossible to practically, elegantly, concisely, and correctly notate conceptually distant ideas with the exact same syntax.

For example, there is a whole mini-EDSL for string formatting in Rust, but that's not only Rust; the tradition has lived on, from C through Python to JavaScript. It would make no sense to try and shove the very specific (and substantially simpler) semantics of string formatting into the same syntax as some other subset of the language. Consistency is great and should be one of the foremost goals of language design, but artificially mashing together different concepts can be just as bad as inventing ad-hoc notation for closely related concepts. (Cf. why I think that dot-postfix .await was a mistake.)

8 Likes

Precisely… and thank you for the fact-base.

11 posts were split to a new topic: Confusing syntax of .await