Struct owning a slice; best practice

I have a struct that needs a slice [T] to do what it's supposed to do. It doesn't need ownership to do its job, but if I let it hold a reference &'a [T], then the struct will be infected with a lifetime 'a which means I can't store it somewhere along with the slice it refers to (because self-referential structs are tricky in Rust).

I could make the struct store a Vec<T>, but sometimes I'll just have a slice (or boxed slice) available, so in those cases, I'd need to create a Vec first.

I thought about generics, but this feels highly unergonomic. To compare the different solutions, I created the following examples, Foo0 through Foo3, below:

use std::borrow::Borrow;
use std::ops::Deref;

pub struct Foo0<'a> {
    inner: &'a str,
}

impl<'a> Foo0<'a> {
    pub fn new(s: &'a str) -> Self {
        Self { inner: s }
    }
    pub fn show(&self) {
        println!("{}", self.inner);
    }
}

pub struct Foo1<S> {
    inner: S,
}

impl<S: Borrow<str>> Foo1<S> {
    pub fn new(string: S) -> Self {
        Self { inner: string }
    }
    pub fn show(&self) {
        println!("{}", self.inner.borrow());
    }
}

pub struct Foo2<S> {
    inner: S,
}

impl<S: Deref<Target = str>> Foo2<S> {
    pub fn new(string: S) -> Self {
        Self { inner: string }
    }
    pub fn show(&self) {
        println!("{}", &*self.inner);
    }
}

pub struct Foo3 {
    inner: Box<str>,
}

impl Foo3 {
    pub fn new(string: Box<str>) -> Self {
        Self { inner: string }
    }
    pub fn show(&self) {
        println!("{}", &*self.inner);
    }
}

fn main() {
    {
        let s: String = "Hello Zero.".to_string();
        Foo0::new(&s).show(); // works
        //Foo0::new(s).show(); // fails: `Foo0` can only borrow, it cannot take ownership of the `String`
        Foo0::new("Hi Zero.").show();
    }
    {
        let s: String = "Hello One.".to_string();
        //Foo1::new(&s).show(); // fails
        Foo1::new(&*s).show();
        Foo1::new(s).show();
        Foo1::new("Hi One.").show();
    }
    {
        let s: String = "Hello Two.".to_string();
        //Foo2::new(&s).show(); // fails
        Foo2::new(&*s).show();
        Foo2::new(s).show();
        Foo2::new("Hi Two.").show();
    }
    {
        let s: String = "Hello Three.".to_string();
        //Foo3::new(s).show(); // fails
        Foo3::new(s.into_boxed_str()).show();
        //Foo3::new("Hi Three.").show(); // fails
        Foo3::new("Hi Three.".to_owned().into_boxed_str()).show(); // unnecessary clone
    }
}

(Playground)

I wonder: what's the idiomatic way to go? I guess I could also use deref_owned::GenericCow, but this feels like even more overhead and not idiomatic. Or std::borrow::Cow, but then there's an extra runtime overhead.

You could use Rc<[T]> or Arc<[T]> depending on your use case.

Both of these have strictly higher overhead than Box<[T]>.

The thing is that OP already named the 2 options: either take ownership, or infect higher-up datastructures with the lifetime annotation.
There isn't really a way out of that choice, one of those options must be taken.

Personally I'd just have the data structure take ownership and be done with it. Unless it's been actively benchmarked to be problematic in terms of performance, the performance delta likely isn't worth the cognitive overhead of the reasoning that generally goes along with explicit lifetime annotations.

One question that I wondered about is how do I take ownership? Should I take a Vec<T> or a Box<[T]> (or a String or Box<str>, respectively)? How can I know as a provider of an API what the user of the API would likely want to provide?

The way out is to use generics. Consider Foo1 and Foo2 in my Playground example in the OP. Only generics seem to allow

  • creating a self-contained (owned) struct when desired, while
  • alternatively allowing a slice reference to be passed as well (without unnecessary cloning or additional indirection).

But generics make the ergonomics much worse (both for the API user and the API provider).

I feel like there is no good solution in Rust to address this problem. :slightly_frowning_face:

If a [T] provides sufficient functionality for your purposes, then either Box<[T]> or Vec<T> should work fine.

That is mostly a matter of enumerating the use cases you want to support. Think about where the library would and could be used. Then think about what kinds of data structures the consumers might be able to supply in those use cases, and aim for those.

Often enough, accepting either a slice or an iterator proves to be quite flexible in terms of what a user can supply, so depending on your use cases, one of those 2 might fit the bill.
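To make the slice-or-iterator suggestion concrete, here is a minimal sketch. The `Owner` type and its `from_items` / `from_slice` constructors are hypothetical names invented for illustration; the struct itself stays non-generic over the container and simply owns a Vec<T>.

```rust
/// Hypothetical struct that owns its data as a `Vec<T>`.
pub struct Owner<T> {
    items: Vec<T>,
}

impl<T> Owner<T> {
    /// Accepts anything iterable: a `Vec<T>`, an array, an iterator chain, ...
    pub fn from_items(items: impl IntoIterator<Item = T>) -> Self {
        Self { items: items.into_iter().collect() }
    }

    /// Accepts a borrowed slice and clones its elements into the struct.
    pub fn from_slice(items: &[T]) -> Self
    where
        T: Clone,
    {
        Self { items: items.to_vec() }
    }

    /// Number of stored elements.
    pub fn len(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let a = Owner::from_items(vec![1, 2, 3]); // the Vec is moved in
    let b = Owner::from_items((0..5).map(|x| x * 2)); // any iterator works
    let c = Owner::from_slice(&[1u8, 2][..]); // a slice is cloned
    println!("{} {} {}", a.len(), b.len(), c.len());
}
```

The trade-off is that the slice constructor always clones, which is exactly the cost the OP wanted to avoid in some cases.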

If you want to allow arbitrary data then generics is likely the way to go.

In my concrete use-case, it doesn't matter. I have a ParallelIterator, which I could collect either into a Vec<T> or Box<[T]> (as soon as this PR is released). I can also keep the data on the stack and just pass a reference to a slice. So I'm flexible in the concrete use case.

But I would like to design the library in such a way that it can also be used easily in other scenarios, and this is where I have to make a decision. I see various options:

  1. Take a slice reference and add a lifetime to the struct. This makes it impossible to store the struct along with the slice together in an outer struct.
  2. Take a slice reference and always clone the data in the constructor of the struct. Seems like a waste of CPU.
  3. Accept Vec<T>.
  4. Accept Box<[T]>.
  5. Accept Rc<[T]>, Arc<[T]>, Cow<'static, [T]>, or whatever other smart pointers you could think of. This would actually be bad in my own use case, so I likely won't do that.
  6. Make the struct generic and demand Borrow<[T]>.
  7. Make the struct generic and demand Deref<Target = [T]>.
  8. Make the struct non-generic but make the constructor take some generic Into<Vec<T>>. (I generally dislike this sort of generic-argument API, so I don't really like this variant.)

I think my favorites are variant 1, variant 3, and variant 6. But it feels like a "Qual der Wahl"[1].
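For comparison, variant 8 from the list above could look roughly like the following sketch. The names `Foo`, `new` and `as_slice` are placeholders; it relies on the standard `From` conversions into Vec<T> from Vec<T> itself, from Box<[T]>, and (for `T: Clone`) from &[T]. The type annotations in `main` are there only to keep inference unambiguous.

```rust
/// Non-generic over the container: always stores a `Vec<T>`.
pub struct Foo<T> {
    inner: Vec<T>,
}

impl<T> Foo<T> {
    /// Generic constructor: anything convertible into a `Vec<T>` is accepted.
    pub fn new(data: impl Into<Vec<T>>) -> Self {
        Self { inner: data.into() }
    }

    /// Read-only access to the stored elements.
    pub fn as_slice(&self) -> &[T] {
        &self.inner
    }
}

fn main() {
    // A `Vec<T>` is moved in as-is.
    let from_vec: Foo<i32> = Foo::new(vec![1, 2, 3]);

    // A boxed slice is converted into a `Vec<T>`.
    let boxed: Box<[i32]> = vec![4, 5].into_boxed_slice();
    let from_box: Foo<i32> = Foo::new(boxed);

    // A borrowed slice is cloned element by element (requires `T: Clone`).
    let from_slice: Foo<i32> = Foo::new(&[6, 7, 8][..]);

    println!(
        "{:?} {:?} {:?}",
        from_vec.as_slice(),
        from_box.as_slice(),
        from_slice.as_slice()
    );
}
```

The cloning in the slice case is hidden behind `into()`, which is one reason to be wary of this kind of API.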


P.S.: I'm also asking because I would like to understand better what to do in other, similar cases. I feel like this pattern/problem happens in Rust easily whenever you have a data structure whose operation depends on a slice.


  1. To have a "Qual der Wahl" is a German phrase. It could be roughly translated to "being spoiled for choice", but the latter is more euphemistic. The German phrase emphasizes the disadvantages of each option. ↩︎

TIL :slight_smile:

Personally I'd go for option 3 (Vec<T>), though it should be noted that option 4 (Box<[T]>) saves a little space as there is no capacity field, which a Vec value does have.
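The size difference can be checked directly. This is a small sketch with a hypothetical helper `words`; it measures the handle itself (pointer plus metadata), not the heap allocation behind it.

```rust
use std::mem::size_of;

/// Size of `T` measured in machine words (`usize`).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    println!("&[u8]:     {} words", words::<&[u8]>());
    println!("Box<[u8]>: {} words", words::<Box<[u8]>>());
    println!("Vec<u8>:   {} words", words::<Vec<u8>>());

    // Converting a Vec into a boxed slice drops spare capacity,
    // which may involve shrinking or reallocating the buffer.
    let mut v = Vec::with_capacity(16);
    v.extend_from_slice(&[1u8, 2, 3]);
    let b: Box<[u8]> = v.into_boxed_slice();
    println!("boxed len: {}", b.len());
}
```

So the saving is one word per handle, which only matters if you store very many of these structs.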

The most general answer is rather unhelpful: it depends on the situation.
However, as a heuristic, I tend to collect ownership of data in 1 place (generally in a newtyped Vec), and then just borrow that.
I often model complex interrelationships in such data with newtyped indexes, and use those same indexes to represent identity. So for example:

struct Item {
    data: String,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Idx(usize);

fn main() {
    let items = vec![
        Item { data: "hello".to_string() },
        Item { data: "world".to_string() },
    ];

    // `collect` needs a target type; without the annotation this does not compile.
    let string_ids: Vec<Idx> = (0..items.len())
        .map(Idx)
        .collect();
}

It's cheap to pass around an Idx, and collecting them into a new collection usually isn't too expensive either.
The ergonomics come largely from the fact that an Idx is both non-borrowing and Copy, meaning:

  1. It doesn't own heap data.
  2. It doesn't borrow data.
  3. As a result of 1 and 2, an Idx is easy to pass around in the sense of satisfying borrowck.

What matters for you, then, is that a consumer can in principle just own a Vec<Idx> and index into the data when necessary. If you need to refer to the data from somewhere else, collect Idxs there as well, such that it owns the idxs but not the actual data.

Once that is in place the rest is usually easy enough IME.
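A slightly fuller sketch of that idea follows. `Items`, `Selection` and `resolve` are hypothetical names added for illustration; the point is that `Selection` has no lifetime parameter and can therefore be stored next to the data it refers to.

```rust
struct Item {
    data: String,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Idx(usize);

/// Newtyped `Vec` that is the single owner of all items.
struct Items(Vec<Item>);

impl Items {
    /// Looks up an item; panics if the index is out of bounds.
    fn get(&self, idx: Idx) -> &Item {
        &self.0[idx.0]
    }

    /// All valid indexes, in order.
    fn indices(&self) -> impl Iterator<Item = Idx> {
        (0..self.0.len()).map(Idx)
    }
}

/// Hypothetical consumer: owns only indexes, so it needs no lifetime.
struct Selection {
    picked: Vec<Idx>,
}

impl Selection {
    /// Borrows the data only for the duration of the call.
    fn resolve<'a>(&self, items: &'a Items) -> Vec<&'a str> {
        self.picked
            .iter()
            .map(|&i| items.get(i).data.as_str())
            .collect()
    }
}

fn main() {
    let items = Items(vec![
        Item { data: "hello".to_string() },
        Item { data: "world".to_string() },
    ]);
    // Pick every item except the first.
    let selection = Selection { picked: items.indices().skip(1).collect() };
    for s in selection.resolve(&items) {
        println!("{s}");
    }
}
```

Nothing ties a given Idx to a particular `Items` value at the type level, so keeping the two in sync remains the programmer's job.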

Box<[T]> is an owned slice. If there's a lifetime, it's not owned.

I would probably also go for option 3. My reasoning would be that <[T] as ToOwned>::Owned = Vec<T> (and not Box<[T]>). Thus Vec<T> is the canonical "owned" form of a slice. See also ToOwned implementors.
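The ToOwned relationship can be seen in a few lines. In this sketch the helper `owned_copy` is a made-up name; its return type is spelled through the trait so that the compiler itself confirms it is Vec<i32>.

```rust
use std::borrow::Cow;

/// The return type is written via `ToOwned` to show that it is `Vec<i32>`.
fn owned_copy(slice: &[i32]) -> <[i32] as ToOwned>::Owned {
    slice.to_owned()
}

fn main() {
    let data = [1, 2, 3];

    // Assigning to `Vec<i32>` only compiles because `Owned = Vec<i32>`.
    let v: Vec<i32> = owned_copy(&data);

    // `Cow<[T]>` uses the same associated type for its owned form.
    let cow: Cow<'_, [i32]> = Cow::Borrowed(&data[..]);
    let w: Vec<i32> = cow.into_owned();

    println!("{v:?} {w:?}");
}
```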

I'm not toooooo happy with that because it feels a bit like losing type information (though I see how the newtype approach mitigates this to some extent). You still have two structures then (the actual data and the indices) which both need to match.

In other words: the approach would be to simply not include the slice in the struct – neither borrowed nor owned. I'll keep that in mind though. I think it's a good solution for certain cases where you don't want the lifetime checker to make things complicated (while also avoiding unsafe).

I'm not sure what you're referring to. Maybe the topic of the thread? Well, "owning" was just one of the options discussed; I also considered letting the struct not own the slice.


I currently think that:

  • &[T] is the way to go when you need the code to be fast and when it's acceptable to infect the struct with a lifetime.
  • Vec<T> is the easy and idiomatic way that causes least trouble (even though it may involve an unnecessary clone in some cases).
  • Box<[T]> can be a slightly optimized variant which might or might not be worth the trouble (and will have downsides if you later decide to use or return/extract the inner data somehow).
  • S: Borrow<[T]> is the semantically correct way to describe that the struct needs to have access to a [T] which could be owned or borrowed. Note that this even covers Cow<'a, [T]> or Arc<[T]> (Playground).

An option not mentioned in the OP is a custom DST.

#[repr(transparent)]
pub struct Foo { inner: str }

It has the downside that you need unsafe to get a &Foo or Box<Foo> or what-have-you.
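For reference, the unsafe part usually looks something like the sketch below. The constructor names `new` and `from_boxed` and the accessor `as_str` are placeholders; the casts rely on `#[repr(transparent)]` giving Foo the same layout and pointer metadata as str.

```rust
#[repr(transparent)]
pub struct Foo {
    inner: str,
}

impl Foo {
    /// Reinterprets a `&str` as a `&Foo` without copying.
    pub fn new(s: &str) -> &Foo {
        // SAFETY: `Foo` is `#[repr(transparent)]` over `str`, so a pointer
        // to `str` is also a valid pointer to `Foo` with the same length.
        unsafe { &*(s as *const str as *const Foo) }
    }

    /// Reinterprets a `Box<str>` as a `Box<Foo>`, reusing the allocation.
    pub fn from_boxed(s: Box<str>) -> Box<Foo> {
        // SAFETY: same layout argument as above; the allocation is handed
        // over unchanged and will be freed with the same layout.
        unsafe { Box::from_raw(Box::into_raw(s) as *mut Foo) }
    }

    /// Access to the wrapped string data.
    pub fn as_str(&self) -> &str {
        &self.inner
    }
}

fn main() {
    let borrowed: &Foo = Foo::new("borrowed");
    let owned: Box<Foo> = Foo::from_boxed("owned".into());
    println!("{} {}", borrowed.as_str(), owned.as_str());
}
```

With this, the same type serves both the borrowed (&Foo) and the owned (Box<Foo>) case, with no lifetime or type parameter on Foo itself.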

(See also this (closed) RFC.)

Oh, interesting. But I feel like this will be tricky if my struct also contains other fields, e.g.:

pub struct Foo {
    extra: bool,
    inner: str,
}

impl Foo {
    pub fn new(extra: bool, s: &str) -> Box<Self> {
        todo!() // how to do this?
    }
    pub fn show(&self) {
        println!("{}", &self.inner); // this is easy
    }
}

(Playground)

The Nomicon on DSTs says:

Currently the only properly supported way to create a custom DST is by making your type generic and performing an unsizing coercion: […] (Yes, custom DSTs are a largely half-baked feature for now.)

I think with a variable-sized s as in the Playground above, this is impossible to do in Rust. (Or is there a way to implement new?)

Ah, there's a way but I don't have time to rediscover all the details right now... roughly IIRC

  • go generic: Foo<S: ?Sized> { extra: bool, inner: S }
  • make your layout well defined (e.g. repr(C))
  • manually allocate the right amount of space as if you had a Foo<[u8; len]> of appropriate length
  • write all the data there, probably through a pointer
  • Use Box::into_raw() or pointer casts or whatever to go from whatever pointer you wrote the data through into Foo<str>

The key part is that the metadata of the DST field, which must be last, is the same as the metadata of the container.
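The safe half of this, the unsizing coercion the Nomicon refers to, can be sketched as follows. It only covers lengths known at compile time and uses [u8] rather than str. The manual allocation steps from the list above are not shown, and the accessors `extra` and `bytes` are hypothetical.

```rust
/// Generic over the last field so that unsizing coercion applies.
pub struct Foo<S: ?Sized> {
    extra: bool,
    inner: S,
}

impl Foo<[u8]> {
    pub fn extra(&self) -> bool {
        self.extra
    }

    pub fn bytes(&self) -> &[u8] {
        &self.inner
    }
}

fn main() {
    // Start from a sized value with a compile-time length...
    let sized: Box<Foo<[u8; 5]>> = Box::new(Foo { extra: true, inner: *b"hello" });

    // ...and let the compiler coerce it to the unsized form. The length 5
    // becomes the metadata of the fat pointer to the whole struct.
    let dynamic: Box<Foo<[u8]>> = sized;

    println!("{} {:?}", dynamic.extra(), dynamic.bytes());
}
```

For a length that is only known at runtime, this coercion is not available, which is why the manual allocation route above is needed.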

Hmmmm, I see. I feel like this is not the way I want to go. :sweat_smile: (But I like the idea of custom DSTs.)

Thinking about it again, "Qual der Wahl" actually translates quite well to "spoiled for choice". I do think it's true that the English phrase is more euphemistic while the German phrase feels harsher (it literally translates to "agony of choice"), but it can refer to equally good or equally bad options.