Newbie, using Rc instead of Box in recursive data structure

My first rust project, first post here!
As an exercise to convince myself that Rust traits really work just like Haskell typeclasses, I tried porting a homework assignment on typeclasses from a Haskell course into Rust:
https://github.com/altaurog/rust-traits

Generally, the entire point of using Rc is to make clones extremely cheap. I'm a bit confused about what's going on in your SymbolMapExpr. Maybe you can make a playground with only that file that shows the error?

To best mirror the nature of how data types in Haskell are always boxed, you might want to consider working with something like

#[derive(Debug, PartialEq, Eq, Clone)]
pub enum ExprTInner {
    Lit(i32),
    Add(ExprT, ExprT),
    Mul(ExprT, ExprT),
}

pub type ExprT = Rc<ExprTInner>;

then you can use methods such as

fn add(self, other: Self) -> Self

or maybe

fn add(&self, other: &Self) -> Self

in the Expr trait which helps with the following execises to implement the type class for types like integers, bool, etc.
Edit: I should’ve read your code first, that’s what you’re already doing.

The point is that you’ll want to implement Expr for ExprT i.e. for Rc<ExprTInner>.

And you must not be shy in .clone()ing Rcs whenever necessary, that’s the whole point of having them.


By the way, if the trait was external to your crate, you might need another wrapper struct to be able to not violate the orphan rules of Rust trait impls with an implementation for Rc<ExprTInner>. Something like

#[derive(Debug, PartialEq, Eq, Clone)]
pub enum ExprTInner {
    Lit(i32),
    Add(ExprT, ExprT),
    Mul(ExprT, ExprT),
}

pub struct ExprT(pub Rc<ExprTInner>);

or, alternatively, it could look like

#[derive(Debug, PartialEq, Eq, Clone)]
pub enum ExprTInner {
    Lit(i32),
    Add(Rc<ExprTInner>, Rc<ExprTInner>),
    Mul(Rc<ExprTInner>, Rc<ExprTInner>),
}

pub struct ExprT(pub Rc<ExprTInner>);

only using the struct in the outermost layer.

Alice, thanks. For me, the motivation to use Rc with SymbolMapExpr was to be able to pass closures around. Cloning the closures didn’t seem to work and passing them around directly led to all sorts of trouble with lifetimes. I just wanted to confirm that this is a reasonable approach and ask if there are any alternatives. As it is, the code works, here’s a playground:

Here is a playground using Box that works, but does a lot of cloning.
And here is a playground with my attempt to us Rc instead of Box.
The error I am getting doesn’t make sense to me:

error[E0308]: mismatched types
  --> src/lib.rs:38:26
   |
38 |         Lit(i) => T::lit(i),
   |                          ^
   |                          |
   |                          expected `i32`, found `&i32`
   |                          help: consider dereferencing the borrow: `*i`

If Lit(i) matched, then i must be an i32. That makes me think the compiler is confused.
What am I doing wrong here?

There’s some automatic (slightly magical) rule that if you match a constructor against a reference to a struct/enum then the fields you get are behind references. You can replace the i with *i at the use site as suggested, e.g. T::lit(*i).

2 Likes

Here’s a version of your Rc playground using a type ExprT = Rc<ExprTInner>-based approach

Rust Playground

And here’s a variation with fn(self, other: Self) -> Self methods

Rust Playground

For ExprT this alternative trait would mean that the caller might have to clone some Rcs sometimes to call the methods, but in cases like evaluating exprt_to_expr with T == ExprT, this has the advantage that the RCs in local variables, i.e. xt and yt are not cloned unnecessarily and then discarded anyways. (It’s not much of a difference though because cloning an Rc really is very cheap.)

Taking some code from my previous post’s first playground as an example, this match

pub fn exprt_to_expr<T: Expr>(exprt: &ExprT) -> T {
    match &**exprt {
        Lit(i) => T::lit(*i),
        Add(x, y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.add(&yt)
        },
        Mul(x, y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.mul(&yt)
        },
    }
}

is (in less magical notation) the same as

pub fn exprt_to_expr<T: Expr>(exprt: &ExprT) -> T {
    match &**exprt {
        &Lit(ref i) => T::lit(*i),
        &Add(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.add(&yt)
        },
        &Mul(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.mul(&yt)
        },
    }
}

where the & pattern matches a “through a reference” and the “ref ident” pattern binds a value by-reference.

This could further be simplified by dropping the intermediate reference alltogether, i.e.

pub fn exprt_to_expr<T: Expr>(exprt: &ExprT) -> T {
    // this still does not move or copy the ExprTInner from behind the `Rc`
    // it can inspect the value in-place and `ref ident` patterns
    // can reference the existing fields of the enum behind the `Rc`  
    match **exprt {
        Lit(ref i) => T::lit(*i),
        Add(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.add(&yt)
        },
        Mul(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.mul(&yt)
        },
    }
}

and also, in case of an integer like i (or any Copy type for that matter), you can also drop the ref and the need to dereference (*i) afterwards:

pub fn exprt_to_expr<T: Expr>(exprt: &ExprT) -> T {
    match **exprt {
        Lit(i) => T::lit(i),
        Add(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.add(&yt)
        },
        Mul(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.mul(&yt)
        },
    }
}

There’s also a rule in Rust that return values of blocks are aways moved, which is why e.g.

pub fn exprt_to_expr<T: Expr>(exprt: &ExprT) -> T {
    // note the extra braces
    match {**exprt} {
        Lit(i) => T::lit(i),
        Add(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.add(&yt)
        },
        Mul(ref x, ref y) => {
            let xt: T = exprt_to_expr(x);
            let yt: T = exprt_to_expr(y);
            xt.mul(&yt)
        },
    }
}

fails to compile

error[E0507]: cannot move out of an `Rc`
  --> src/lib.rs:40:12
   |
40 |     match {**exprt} {
   |            ^^^^^^^ move occurs because value has type `ExprTInner`, which does not implement the `Copy` trait

(to be clear / avoid confusion here: just like in Haskell, when the compiler tells you that some type doesn’t implement some trait / type class, this doesn’t necessarily mean that you’re supposed to try implementing Copy for ExprTInner, even though you might feel encouraged to do so by the error message. Instead it means, in this case, that you shouldn’t try moving the whole ExprTInner value in the first place.)

Whoa. That is a really important and arguably non-obvious point. If that was mentioned in the rust book, I totally missed it. So I was confused, not the compiler. Thanks for that clarification.

After thinking about it, I guess it makes sense, particularly considering the distinction between data on the stack and data on the heap; if the enum behind the reference is on the heap, then its contents will have to be accessed by a reference as well, correct? If so, a copy from the heap to an implicit val on the stack would be needed in order to maintain the semantics of Lit(i) implies i: i32, and it’s pretty obvious that Rust wouldn’t do that.

Yeah, the book only covers the basics. I’m not sure if there’s any example of matching references in there. The Rust Reference has all the details you need on this though -> here. This feature of so-called “match ergonomics” is actually a bit younger that Rust 1.0, here’s the RFC introducing it.

1 Like

I'm possibly a step aside your question but you could avoid some cloning operation inside add and mul methods by removing references.

pub trait Expr {
    fn lit(val: i32) -> Self;
    fn add(self, other: Self) -> Self;
    fn mul(self, other: Self) -> Self;
}

impl Expr for SymbolMapExpr {
    fn lit(val: i32) -> Self {
        Rc::new(move |_| Some(val))
    }

    fn add(self, other: Self) -> Self {
        Rc::new(move |env| self(env).zip(other(env)).map(|t| t.0 + t.1))
    }

    fn mul(self, other: Self) -> Self {
        Rc::new(move |env| self(env).zip(other(env)).map(|t| t.0 * t.1))
    }
}

Okay, I guess I’ve been using python for too long. I am trying to wrap my head around the refs and derefs. I think I understand that if

type ExprT = Rc<ExprTInner>;

and function signature is

pub fn exprt_to_expr<T: Expr>(exprt: &ExprT) -> T {

then match **exprt makes sense. Param is a ref to an Rc, so I deref the param and then the Rc, is that right? But I don’t think I understand match &**exprt. How is that different from match *exprt? (And the fact that that works with the same match patterns is part of the magic?)

Does this mean that the i actually does get transparently copied?

These versions invalidate the parameter outside the add, doesn’t it?

If I write tests like this, I get use of moved value errors:

    #[test]
    fn test_reuse() {
        let a = ExprT::lit(2);
        let b = ExprT::lit(3);
        let sum = a.add(b);
        let prod = a.mul(b);
        let val: i32 = exprt_to_expr(&sum);
        assert_eq!(val, 5);
        let val: i32 = exprt_to_expr(&prod);
        assert_eq!(val, 6);
    }

    #[test]
    fn test_all() {
        let a = SymbolMapExpr::lit(3);
        let b = SymbolMapExpr::var(String::from("x"));
        let c = SymbolMapExpr::var(String::from("y"));
        assert_eq!(eval(a.add(b)), Some(9));
        assert_eq!(eval(a.mul(c)), Some(9));
        assert_eq!(eval(a.add(b).mul(c)), Some(27));
    }

Is there a way around this other than pass-by-reference?

matching against &**exprt (expression of type &ExprTInner) with patterns of type ExprTInner is indeed a weird and inconsistent thing in the language. Don't feel bad for being confused about it, it actually means you have a consistent and rigorous mental model which has spotted that something weird is going on, here.

This weird thing is called match "ergonomics", i.e., it is syntactic sugar. It's a "feature" of the language for those not wanting to bother about indirection correctness, and who just want the simpler high-level approach of "I put a & on the expression being matched" (called the scrutinee), "and I automagically get &-based bindings in in the right hand side of the => arms".

  • Note that besides how much I personally loathe this "feature", especially the fact that there is no simple way to opt out of it (cc @H2CO3), one has to admit that the quoted rationale is legitimate (that is, I am not trying to strawman or anything): different people use Rust expecting different things from it. There are low-level people who want to have a good understanding / control of the indirections involved, but there are also more high-level / functional people who just want their pattern-matching logic to go through, without Rust being obnoxious about having to add * or & here and there, and refs in other places (in this case there is also a vicious cycle of ref being quite a rare beast, which makes it harder to understand, which makes it be ditched by many instances of the language).

    And while all that is true, it doesn't change the fact that for type/indirection-rigorous people, match ergonomics are surprising at best, and confusing / a foot-gun at worst: having the feature may be legitimate, but so is having a way to opt out of it.

What these ergonomics do is allow you to use patterns of type ExprTInner, against a scrutinee of type &ExprTInner. This will have the semantics of automagically ref-modifying all the subsequently created bindings:

/// With match ergonomics
fn option_as_ref<T> (opt: &'_ Option<T>)
  -> Option<&'_ T>
{
    match opt {
        Some(/* let */ inner) => Some(inner),
        None => None,
    }
}

the above identity-function-looking implementation is thus sugar for:

/// Without match ergonomics
fn option_as_ref<T> (opt: &'_ Option<T>)
  -> Option<&'_ T>
{
    match opt {
        &Some(/* let */ ref inner) => Some(inner),
        &None => None,
    }
}

It leads to "funny" / confusing situations such as:

let x = &Some(42);
match x {
    Some(inner @ 42) => assert_eq!(inner, 42), // type error, lhs is a &{integer}!
    _ => unreachable!(),
}
1 Like

Consider the types: If exprt: &Rc<ExprT>, then *exprt: Rc<ExprT> and **exprt: ExprT and &**exprt: &ExprT. You can’t move the value *exprt or **exprt (so something like let foo = *exprt won’t work, and neither will match *exprt { x => {…} } where the pattern x will try to bind by-value/move), but you can match against it… actually you can’t do much in terms of matching on *exprt because it’s an Rc, not a struct/enum with public fields (something like match *expr { ref x => { … } } still works), but as I demonstrated you can match **exprt against the enum variants of ExprT, as long as you make sure that the patterns don’t try to move any fields.

&**exprt can be “moved”, because it’s a shared reference to any attempt to move it won’t fail but instead it will copy the reference. Matching against this value of type &ExprT can work with &… patterns or with match ergonomics, as I’ve already explained (and as the linked RFC also explains, etc…)

So, to answer the question, the main difference between *exprt and &**exprt is: they have a totally different type.


Well, the idea could be that all implementors of Expr would need to clone or copy the value behind the reference anyway. In this case the (self, other: Self) signature gives you as the callee the option to either move a value into the call or to call .clone() yourself, but only if you really still need the value afterwards.

So you’d write

let sum = a.clone().add(b.clone);
let prod = a.mul(b);

but this way, you’ll only need to clone a and b once, whereas with (&self, other: &Self) methods, you’d have

let sum = a.add(&b);
let prod = a.mul(&b);

and both add and mul would clone a and b internally, resulting in more clones overall.

Of course, there might also be non-copy types that could implement (&self, other: &Self) versions of add or mul without any need to clone both arguments first. To give some example: I guess e.g. implementations of arbitrarily sized integers would only need to clone a and then they could perform an addition with &b by mutating the clone of a in-place without any need to clone b. So (&self, other: &Self) vs. (self, other: Self) is a tradeoff. Perhaps the most performance-keen person could try to write a trait that features both methods and has one version default-implemented in terms of the other.

FYI, there is a half-solution: Clippy now contains a lint, pattern_type_mismatch, which checks for the match ergonomics misfeature. It is allow-by-default, but you can turn it on explicitly. It helps at least in cases where you are in control of the code (and the linter setup).

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.