Returning a struct alongside a slice of one of its fields


#1

I’m new to Rust and am proud to have finally hit my first impasse with the borrow checker. Here’s a pared-down example that’s giving me trouble.

I am processing a futures::stream::Stream of tweets from Twitter. Each tweet is a Tweet struct containing the tweet’s id and text. For each tweet I would like to perform an analysis of its text, which starts by breaking it up into a vector of Parts. Each Part references a slice of the original tweet text and categorizes it as either a “word” or “whitespace” using an enum. Finally, I would like to map this stream of Tweets to a tuple containing the tweet and its text’s analysis.

It is a design goal to keep exactly one copy of the tweet’s text around, partly for memory efficiency and partly to practice running into ownership issues like this one, so I am not interested in using clones or copies. I am also feeling out how well Rust lets me write modular code, so I’m very interested in solutions that speak to better design choices. In particular, I hoped to implement the annotated::to_parts function shown below in a reusable way, but my approach may have caused issues.

Finally, here are the relevant bits of code:

// src/main.rs

tweet_stream.map(|tweet| {

    let parts = annotated::to_parts(&tweet.text);

    (tweet, parts)
})
// ...


// src/annotated.rs

pub fn to_parts(string: &str) -> Vec<Part> {
    // ...
}

#[derive(Debug, PartialEq)]
pub enum Part<'a> {
    Word(&'a str),
    Whitespace(&'a str),
}


// src/twitter.rs

#[derive(Deserialize, Debug)]
pub struct Tweet {
    #[serde(rename = "id_str")]
    pub id: String,
    pub text: String
}

(Playground)

And here are my errors:

error[E0515]: cannot return value referencing local data `tweet.text`
  --> src/main.rs:62:13
   |
60 |             let parts = annotated::to_parts(&tweet.text);
   |                                             ----------- `tweet.text` is borrowed here
61 |
62 |             (tweet, parts)
   |             ^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

error[E0505]: cannot move out of `tweet` because it is borrowed
  --> src/main.rs:62:14
   |
52 |         .map(|tweet| {
   |                    - return type of closure is (twitter::Tweet, std::vec::Vec<annotated::Part<'1>>)
...
60 |             let parts = annotated::to_parts(&tweet.text);
   |                                             ----------- borrow of `tweet.text` occurs here
61 |
62 |             (tweet, parts)
   |             -^^^^^--------
   |             ||
   |             |move out of `tweet` occurs here
   |             returning this value requires that `tweet.text` is borrowed for `'1`

My intuition tells me that this should be allowed because the Tweet and the Parts that reference the slices of tweet.text are returned together from the closure, so their lifetimes should be compatible. My intuition also tells me that I should be able to write a function for splitting a string into parts, independent of tweets, and use it on a field of a struct without running into issues. But I’m definitely missing something. I appreciate any thoughts or help!


#2

I recommend always putting up an example on the playground, so people can play with it. I’ve made an equivalent example here. Note I’ve filled to_parts with a dummy implementation.

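The borrow checker is right to refuse here. Note that tweet is moved out of the closure when the tuple is returned, while parts still borrows from tweet.text. As far as the borrow checker can tell, the references in parts would become dangling.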

Fortunately for you, to_parts works on shared references, so you can easily fix your code by passing the closure a reference to the tweet instead of the tweet itself. The simplest incarnation would be this, although in your real code you’ll want to streamline that a bit (no need to return tweet, for example).
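To make the "pass in a reference" idea concrete, here is a minimal sketch. The assumptions are mine: a plain slice of tweets stands in for the real source, the body of to_parts is a stand-in that splits on runs of whitespace, and annotate and make_part are hypothetical helper names. The point is only that the closure receives &Tweet, so nothing is moved while parts borrows from the text.

```rust
#[derive(Debug)]
pub struct Tweet {
    pub id: String,
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub enum Part<'a> {
    Word(&'a str),
    Whitespace(&'a str),
}

// Hypothetical helper: wrap a slice in the matching variant.
fn make_part(s: &str, is_space: bool) -> Part<'_> {
    if is_space {
        Part::Whitespace(s)
    } else {
        Part::Word(s)
    }
}

/// Stand-in for `annotated::to_parts`: splits the input into alternating
/// runs of whitespace and non-whitespace, borrowing from `string`.
pub fn to_parts(string: &str) -> Vec<Part<'_>> {
    let mut parts = Vec::new();
    let mut start = 0;
    let mut current: Option<bool> = None; // kind of the run being scanned
    for (i, ch) in string.char_indices() {
        let is_space = ch.is_whitespace();
        if let Some(prev) = current {
            if prev != is_space {
                // The run ended just before `i`; emit it and start a new one.
                parts.push(make_part(&string[start..i], prev));
                start = i;
            }
        }
        current = Some(is_space);
    }
    if let Some(prev) = current {
        parts.push(make_part(&string[start..], prev));
    }
    parts
}

/// Hypothetical helper: the closure gets `&Tweet`, so the tweets stay owned
/// by the caller and the parts may borrow from them for as long as the
/// slice is alive.
pub fn annotate(tweets: &[Tweet]) -> Vec<(&Tweet, Vec<Part<'_>>)> {
    tweets
        .iter()
        .map(|tweet| (tweet, to_parts(&tweet.text)))
        .collect()
}
```

This only works when something else keeps owning the tweets for as long as the parts are in use, which is the case when iterating over a collection by reference.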

Overall though, I’d suggest working with iterators here, assuming the lazy evaluation fits your use case, of course. But there’s always collect.


#3

Thanks for the useful feedback! I am still left with a couple of loose ends.

First, tweet_stream is actually a stream rather than an iterator, so unfortunately I can’t use the specific fix to the problem that you suggested. I have created a playground example here where a stream is used and the same errors surface.

Second, I’m confused about tweet being moved after the closure. I thought that because it’s part of the return value next to parts (in the same tuple), that would indicate to the borrow checker that parts and tweet are both going to live on together, and that there could not be dangling references inside parts. But that’s clearly where I’m wrong! I just don’t know if it’s because there actually could be dangling references and the borrow checker realizes that, or because I’m not giving the borrow checker enough information for it to know that the program is safe.


#4

This is known as a self-referential struct, which Rust doesn’t support. There are some crates, like owning_ref and rental, that facilitate this type of scenario in some cases but they’re not a general solution. You can search for “self referential struct” to get more reading material.

In your case, you probably want to do more work inside whatever function/combinator splits the string so that you can make use of the slice references inside Part without moving the tweet value. Alternatively, run a parse pass but instead of ending up with string slices, record the indices where whitespace is present and return those indices to downstream functions. Those functions can then materialize the appropriate string slices on demand.


#5

I think acknowledging that this is a self-referential struct is just what I needed to do—thanks. I will avoid the owning_ref/rental options for now, since I don’t think I really need that kind of complexity yet. I played with boxing a little bit to avoid the self-reference, but it wasn’t very fruitful.

I instead opted to have Part just store a Range<usize> and implement Index over str for my Part using Range<usize>'s implementation, in line with your suggestion. Here is the updated playground example. Worth noting that the indexing functionality relies on a nightly feature, but you could just as easily implement the “slicing by range” yourself as a method on Part.
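For anyone reading later, here is a rough sketch of the "method on Part" variant, which needs no nightly features. It is not the code from my playground: the names range, as_str, and annotate are made up for illustration, and the to_parts body is a simple stand-in that splits on runs of whitespace. Because Part no longer carries a lifetime, the borrow of tweet.text ends as soon as to_parts returns, and the tweet can then be moved into the tuple.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
pub enum Part {
    Word(Range<usize>),
    Whitespace(Range<usize>),
}

impl Part {
    /// Byte range of this part within the text it was computed from.
    pub fn range(&self) -> Range<usize> {
        match self {
            Part::Word(r) | Part::Whitespace(r) => r.clone(),
        }
    }

    /// Materialize the slice on demand. Panics if `source` is not the text
    /// this part was computed from (range out of bounds or not on a char
    /// boundary).
    pub fn as_str<'a>(&self, source: &'a str) -> &'a str {
        &source[self.range()]
    }
}

pub struct Tweet {
    pub id: String,
    pub text: String,
}

/// Stand-in splitter: records byte ranges of alternating whitespace and
/// non-whitespace runs instead of holding references into `string`.
pub fn to_parts(string: &str) -> Vec<Part> {
    let make = |range: Range<usize>, is_space: bool| {
        if is_space {
            Part::Whitespace(range)
        } else {
            Part::Word(range)
        }
    };
    let mut parts = Vec::new();
    let mut start = 0;
    let mut current: Option<bool> = None; // kind of the run being scanned
    for (i, ch) in string.char_indices() {
        let is_space = ch.is_whitespace();
        if let Some(prev) = current {
            if prev != is_space {
                parts.push(make(start..i, prev));
                start = i;
            }
        }
        current = Some(is_space);
    }
    if let Some(prev) = current {
        parts.push(make(start..string.len(), prev));
    }
    parts
}

/// The shape of the original closure body: this compiles because `parts`
/// holds only indices, not borrows of `tweet.text`.
pub fn annotate(tweet: Tweet) -> (Tweet, Vec<Part>) {
    let parts = to_parts(&tweet.text);
    (tweet, parts)
}
```

The trade-off is that a Part is only meaningful together with the string it came from, and nothing in the types enforces that pairing.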

This reddit discussion was also a useful resource.


#6

Sorry to dig up this old issue. Just wanted to add that taking @KillTheMule’s suggestion and working with iterators also solved a lot of the problems I created for myself. I now have my original Part struct holding &strs, but iterate over the parts to perform computations rather than try to pass around a vector of them. Feels much better while solving the ownership issues. Thanks!
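In case it helps someone, the iterator version looks roughly like the sketch below. This is a simplified illustration rather than my actual code: the Parts iterator, the parts constructor, and the word-count computation in annotate are all assumptions made up for the example. The parts borrow from tweet.text only while the computation runs, so the tweet can be moved afterwards along with an owned result.

```rust
#[derive(Debug, PartialEq)]
pub enum Part<'a> {
    Word(&'a str),
    Whitespace(&'a str),
}

/// Lazy iterator over the parts of a string; holds only the unread tail.
pub struct Parts<'a> {
    rest: &'a str,
}

pub fn parts(text: &str) -> Parts<'_> {
    Parts { rest: text }
}

impl<'a> Iterator for Parts<'a> {
    type Item = Part<'a>;

    fn next(&mut self) -> Option<Part<'a>> {
        let rest = self.rest;
        // An empty tail means we are done.
        let first = rest.chars().next()?;
        let is_space = first.is_whitespace();
        // The current run ends at the first char of the other kind.
        let end = rest
            .find(|c: char| c.is_whitespace() != is_space)
            .unwrap_or(rest.len());
        let (head, tail) = rest.split_at(end);
        self.rest = tail;
        Some(if is_space {
            Part::Whitespace(head)
        } else {
            Part::Word(head)
        })
    }
}

pub struct Tweet {
    pub id: String,
    pub text: String,
}

/// Hypothetical computation: count the words while the text is borrowed,
/// then move the tweet out together with the owned count.
pub fn annotate(tweet: Tweet) -> (Tweet, usize) {
    let word_count = parts(&tweet.text)
        .filter(|part| matches!(part, Part::Word(_)))
        .count();
    (tweet, word_count)
}
```

The key difference from my first attempt is that no Part outlives the closure body; only the result of the computation is returned next to the tweet.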