References and recursion

Issue

I've been working on an Exercism problem (acronyms) in the Rust track for the past week plus. And I'm stuck trying to satisfy the compiler (I think the borrow checker to be more specific but I'm not entirely sure when something is borrow checker or not).

Attempted solution:

I was porting a solution of the same problem from Haskell and Swift to Rust. The solution is uses spitter combinators and I didn't have many issues porting the Haskell library to Swift. So I'm trying to do the same with Rust. The basic idea is to traverse a vector of characters. We create chunks that are either delimiters or text. There is also a condense policy that dictates how we should handle delimiters and blanks. condense (the function I'm having difficulty with) is recursive in nature (maybe relevant, I'm not sure) and combines contiguous delimiters together. For example, if you have a vector of chunks [Text(it_doesnt_matter), Delimiter("a"), Delimiter("b"), Delimiter("c"), Text(shrug)], then condense would return [Text(it_doesnt_matter), Delimiter(vec!["a", "b", "c"]), Text(shrug)] (or something similar).

The Chunk is as follows:

#[derive(Clone)]
pub enum Chunk<A> {
    Delimit(Vec<A>),
    Text(Vec<A>)
}

impl<A> Chunk<A> {
    // convenience methods
}

and the CondensePolicy is as follows

enum CondensePolicy {
    Condense,
    DropBlanks,
    KeepBlanks,
}

impl CondensePolicy {
    fn condense<A: Clone>(&self, list: Vec<Chunk<A>>) -> &Vec<Chunk<A>> {
        match (self, list.first()) {
            (Self::DropBlanks, _) | (Self::KeepBlanks, _) => &list,
            (Self::Condense, None) => &vec![],
            (Condense, Some(first @ Chunk::Text(xs))) => &vec![Chunk::Text(xs.clone())]
                .iter()
                .chain(list.iter().rev().skip(1).rev())
                .cloned()
                .collect(),
            (Condense, Some(Chunk::Delimit(xs))) => {
                let (delimiters, others) = span(list, |x| x.is_delimit());
                let vector = vec![Chunk::Delimit(
                    delimiters
                        .iter()
                        .flat_map(|&x| match x {
                            Chunk::Delimit(xss) => xss.clone(),
                            Chunk::Text(xss) => xss.clone(),
                        })
                        .collect(),
                )];
                let ys = &vector
                    .iter()
                    .cloned()
                    .chain(self.condense(others).iter().cloned());
                
                &vector
                    .iter()
                    .cloned()
                    .chain(self.condense(others).iter().cloned())
                    .cloned() // OFFENDING LINE
                    .collect()
            }
        }
    }
}

fn span<'a, A, F>(xs: Vec<A>, predicate: F) -> (Vec<A>, Vec<A>)
where
    A: Clone,
    F: Fn(&A) -> bool,
{
    match xs.iter().position(predicate) {
        None => (xs.into_iter().collect(), vec![]),
        Some(index) => (
            xs.iter().take(index).cloned().collect(),
            xs.iter().skip(index).cloned().collect(),
        ),
    }
}

Consistent problem I don't know how to address:

The above details blocks of code yield the following error that I don't know how to resolve in this case:

Error message 1:

type mismatch resolving 

`<std::iter::Chain<std::iter::Cloned<std::slice::Iter<'_, chunk::Chunk<A>>>, std::iter::Cloned<std::slice::Iter<'_, chunk::Chunk<A>>>> as std::iter::Iterator>::Item == &_`

expected enum `chunk::Chunk`, found reference

note:   expected type `chunk::Chunk<A>`
      found reference `&_`

This version of the code is the closest I've been to compiling. Another similar solution yielded another error which I wasn't sure how to handle when using chain or using collect:

Error message 2:

a value of type `std::vec::Vec<chunk::Chunk<A>>` cannot be built from an iterator over elements of type `&chunk::Chunk<A>`

and several similar error messages with different references types (more &). I resolved this error by enforcing A to implement the Clone trait.

Questions:

  1. Which traits are a good idea to implement / derive on your custom types?
  2. Are there any traits you should avoid with generic types like Chunk<A>?
  3. Are there any traits you should include with generic types like Chunk<A>?
  4. How do I resolve Error message 1?
  5. Was adding <A: Clone> a / the correct way to resolve Error message 2?
  6. How do I remove the .cloned() and .clone()?
    • I think I have two many but I'm not sure how to remove them. I'm used to Swift's copy-on-write semantics and value types, which seem to be somewhat related to Rust's references when properly borrowed / managed. But I could be completely wrong. I'm aware of COW but haven't done a deep dive yet into how to use it.

I haven’t gone through everything, but you probably want a different return type here. &Vec in this case is an immutable reference to a growable vector owned by self. It’s strictly more expensive and no more powerful than a slice reference &[Chunk<A>].

You probably meant to return Vec<Chunk<A>> (without &), which transfers ownership of the vector to the caller. It’s already being moved around as a single pointer, and the data resides on the heap. This should also clean up a lot of the clone calls because you will be able to move items out of these vectors with into_iter instead of getting references which leave the original in place.

A more advanced solution would be to return impl Iterator<Item = Chunk<A>>, which lazily produces the Chunks as needed instead of prepopulating them into a Vec at every step. This has the potential to require nonobvious changes throughout your code to make work, though.

1 Like

If you have a Vec, its inherent methods can sometimes make a task easier than doing the same thing with iterators:

fn span<A, F>(mut xs: Vec<A>, predicate: F) -> (Vec<A>, Vec<A>)
where
    F: Fn(&A) -> bool,
{
    let tail = match xs.iter().position(predicate) {
        None => vec![],
        Some(index) => xs.split_off(index)
    };
    (xs, tail)
}

To quote your code:

It doesn't make sense to return something by reference that you just created. That would return a reference to a local variable, which would result in a memory error in other languages; Rust catches this mistake statically.

A &T reference in Rust is NOT an owning pointer. It's a "view" pointer, a temporary one, that you cannot use to keep stuff around.

In this case, and many others, you should just return the newly-created vector by value.

1 Like

This misunderstanding of &T references in Rust occurs frequently in posts from new Rustaceans. IMO this comparative terminology—view pointer vs. owning pointer—should be added to TRPL and used extensively in answers on this forum (URLO) in cases where such explanations are appropriate.

3 Likes

Agreed @TomP. Though I would take it further and just always call them temporary/ephemeral view pointers or something like that. I think always highlighting the lack of permanence of references is important since that is just as much the issue as not understanding that & is view/readonly.

1 Like

Isn't it spelled out clearly enough in the book in the References and Borrowing section?

These ampersands are references , and they allow you to refer to some value without taking ownership of it.

I always imagined the idea was not to talk about "pointers". References are not just pointers, like C++ references are not pointers.

" view pointer vs. owning pointer" or " temporary/ephemeral view pointers" starts to get clumsy.

1 Like

Unfortunately, things like Java and C# also wanted to not talk about "pointers", and used the word "reference" for something that (shared) owning.

So I can see how people from certain languages could have trouble. Not that I know what to do about that...

1 Like

The only thing that comes to mind is to choose a noun that other computer languages don't use. "Locators" comes to mind. Or always qualify the word "references", such as in "temporary shared-read references" and "temporary unsharable full-access references". Then we can speak of these collectively as "temporary references". (Personally, I prefer the word "ephemeral" to "temporary", but that may be more difficult for ESL Rustaceans to understand.)

The key thing that seems missing in the Rust learning about references, vs references in other languages, is strong and repeated restatement of the fact that references do not create persistent storage of the referred-to object.

I feel like "box" and "borrow" fit this description.

The perhaps it should be "private-locking read/write borrow" and "shared-locking read-only borrow", where the long phrases convey the essential semantics. box is a Rust owning type, not a borrowing type, so I don't understand why you included it in a discussion of Rust references.

Hmm... we already have a word for that "mutable". It's built into the language as "mut". Mutable implies unshared. So that is:

"temporary immutable references" and "temporary mutable references".

"reference" as defined in the book, see above, implies "temporary" as you don't get ownership. The owner is responsible for the items destruction so one has to give it back.

So that is:

"immutable references" and "mutable references".

1 Like

The discussion was about how to improve new Rustaceans understanding of the differences between Rust's ephemeral references and the more-persistent references of other languages such as C++. If the currrent terminology already made these distinctions clear to new Rustaceans, we would not have so many posts in this forum about non-compiling code that attempts to use references to store semi-persistent data.

I was not proposing that Rust change it's terminology (although many think that uniq would have been a better qualifier than mut). I was only discussing extending teaching materials to emphasize that Rust's references of themselves do not provide storage for a referred-to object.

It’a included in this discussion because it (along with Arc) is what people usually want when they think of a “reference” from a language they learned earlier, like C++ or Java.

1 Like

I do understand, it's about the exposition.

With a background in C/C++ and similar languages I would not have described references/pointers in those languages as "more-persistent". One is often referring to temporary things obtained via malloc or new or automatic variables on the stack. I would expect C/C++ people to pick this up pretty quickly, having been suffering from lifetime problems their whole lives. As it were.

I presume Java/Javascript users and others have the difficulty, they are used to objects hanging around as long as there is any reference to them in existence. What with their automatic garbage collectors.

I was just reading through the section in the book on References and Borrowing again, wondering if there is anything missing. It all seems clear enough.

It might be a bit TL:DR for first timers perhaps. I was trying to find the concise paragraph or two that hits one in the face with the facts of the situation. Still looking ....

2 Likes

I think my preference is to just generally call them ephemeral exclusive/shared borrows. Don’t think you can get away from the word reference in general since it’s used all over the place but I think just saying a reference is an ephemeral exclusive/shared borrow accomplishes the goal of both giving a good indication of what’s going on while simultaneously not using words that are too overloaded in computers/programming.

To @ZiCog’s comment though, it would be nice to be able to point to a concise explanation somewhere when explaining it.

1 Like

Because the question was about terminology for a

Not about shared vs. mutable references. A Box is an owning pointer, no?

I don't see it as "shared" vs "mutable". They are independent of each other not mutually exclusive.

What then would you refer to an &mut as? For example:

    static mut STAT: String = String::new();

    fn a (stat: &'static mut String) -> tokio::task::JoinHandle<()> {
        tokio::spawn(async move {
            loop {
                *stat = "Hello from 'a'".to_string();
                println!("a: {}", stat);
                delay_for(Duration::from_millis(1000)).await;
            }
        })
    }
    
    fn b (stat: &'static mut String) -> tokio::task::JoinHandle<()> {
        tokio::spawn(async move {
            loop {
                *stat = "Hello from 'b'".to_string();
                println!("b: {}", stat);
                delay_for(Duration::from_millis(1000)).await;
            }
        })
    }
    
    loop {
        tokio::select! {
            _ = unsafe {a(&mut STAT)} => { println!("operation 'a' completed") }
            _ = unsafe {b(&mut STAT)} => { println!("operation 'b' completed") }
        }
    }

I would not call it a "view pointer" because you can do more than look at it, you can touch it.
It's still not owning anything.
It's shared.

Personally I think we should call a spade a "spade":

& Thing is an immutable reference to a thing.
&mut Thing is a mutable reference to a thing.

One only need know what the words mutable and reference mean in the Rust context.

Specifically "reference" implies "not owning". If you have such a reference you did not create the thing and you cannot destroy it. Which implies it has a lifetime that is out of your control. Often temporary. So be aware of that.

Box and other such "smart pointers" seem to be a different topic to me. They are library facilities not part of the language itself. "Box" is all about allocating on the heap and so very different from a reference or even a raw pointers.

2 Likes

Exclusive.

In fact your snippet has undefined behavior because it creates multiple exclusive references to the same element.

2 Likes

Yeah, I know. Wicked, aren't I :slight_smile:

I was just trying to make a point about the existence of such things. And probably failing miserably.

If it's any consolation, the C++ guys can't figure out pointers/references either: Jonathan Müller - “Rethinking Pointers”: https://www.youtube.com/watch?v=kYiEvVEh6Tw