String vs str, (String, T) vs?

Hi! I'm trying to make a little Version(String, usize) struct for versions in an API I'm writing.

The struct will be used a lot, and I don't want to pay the cost of a String all over the place. Most of my functions (almost all of them) would be very happy accepting a Version(&str, usize) instead.

It seems like I have 3 options API-wise:

  1. Make my struct be Version<'a>(&'a str, usize)
  2. Make my struct be Version(String, usize) and pass around &Version references
  3. Make two structs, Version and VersionRef, with traits to convert both ways.

Option 2 seems bad because it forces all consumers of my API to pack their data in a String - which is undesirable.

Option 3 seems ok, but what traits should I implement for my types? From<Version> for VersionRef makes sense. Should I implement Borrow and AsRef? I can't get this to typecheck:

impl AsRef<VersionRef<'a>> for Version {
    fn as_ref(&self) -> &'a VersionRef {
        &VersionRef(self.0.as_str(), self.1)
    }
}

Or do I just implement From both ways?

Is option 1 the most idiomatic? I feel like I'll still want an owned variant in certain circumstances.

Help!

You can use Cow<'a, str> to store either an &'a str or a String in the same type. Whether or not that will be more convenient than two separate types will probably depend on the specifics of your API.
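
For illustration, here is a minimal sketch of what the Cow variant could look like. The constructor names (borrowed, owned, into_static) and the describe function are made up for this example and are not part of anyone's actual API.

```rust
use std::borrow::Cow;

/// One type that can hold either a borrowed or an owned name.
#[derive(Debug, Clone, PartialEq)]
pub struct Version<'a>(pub Cow<'a, str>, pub usize);

impl<'a> Version<'a> {
    /// Borrow an existing string slice; no allocation happens here.
    pub fn borrowed(name: &'a str, seq: usize) -> Self {
        Version(Cow::Borrowed(name), seq)
    }

    /// Take ownership of a String, e.g. one produced while parsing.
    pub fn owned(name: String, seq: usize) -> Version<'static> {
        Version(Cow::Owned(name), seq)
    }

    /// Detach from the borrowed data; allocates only if the name was borrowed.
    pub fn into_static(self) -> Version<'static> {
        Version(Cow::Owned(self.0.into_owned()), self.1)
    }
}

/// Hypothetical consumer: most functions only need to read the name.
pub fn describe(v: &Version<'_>) -> String {
    format!("{}@{}", v.0, v.1)
}
```

Callers would write Version::borrowed("apple", 1) or Version::owned(some_string, 1), and both values have the same type. The borrowed-or-owned decision becomes an enum tag that is checked at runtime.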

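I would most likely go with the Version + VersionRef solution. You can't implement AsRef, because as_ref has to return a reference and there is no VersionRef stored inside Version for it to point at, but you can define a non-trait method for the conversion.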
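
A minimal sketch of the two-struct approach, assuming the tuple layout from the original question. The method name as_version_ref is only a suggestion. Note that the From impl has to start from &Version rather than Version, since the borrowed variant needs something to borrow from.

```rust
/// Owned variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Version(pub String, pub usize);

/// Borrowed variant; two words plus a usize, so it is cheap to copy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VersionRef<'a>(pub &'a str, pub usize);

impl Version {
    /// Returns a VersionRef by value, so no reference to a temporary is involved.
    pub fn as_version_ref(&self) -> VersionRef<'_> {
        VersionRef(self.0.as_str(), self.1)
    }
}

impl<'a> From<&'a Version> for VersionRef<'a> {
    fn from(v: &'a Version) -> Self {
        v.as_version_ref()
    }
}

impl From<VersionRef<'_>> for Version {
    /// Going back to the owned variant allocates a new String.
    fn from(v: VersionRef<'_>) -> Self {
        Version(v.0.to_owned(), v.1)
    }
}
```

With this, functions can take VersionRef<'_> by value, and callers holding a Version pass version.as_version_ref() or (&version).into().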

How dynamic are these version strings? Would (&'static str, usize) be sufficient?

I wonder if this is a case where deref_owned (a crate I created) might be useful. You could make the following:

use deref_owned::{GenericCow, Owned};
use std::fmt::Debug;

#[derive(Debug)]
struct Version<S>(S, usize);

fn uses_version<S>(version: Version<S>)
where
    S: GenericCow<Borrowed = str> + Debug,
{
    println!("Got: {version:?}");
}

fn main() {
    uses_version(Version(Owned("Apple".to_string()), 1));
    uses_version(Version("Banana", 2));
}

Output:

Got: Version("Apple", 1)
Got: Version("Banana", 2)

However, I don't think GenericCow is idiomatic at all (at least not as of today). It's more a thought-experiment currently.

In many cases, you might just use a Cow, which is also syntactically easier to handle.

Cow is an interesting idea, but it feels weird moving the question of ownership from a compile-time question to a runtime question. The strings here are IDs, which are never mutated at runtime anyway.

@quinedot The strings are immutable after creation. I always think of 'static as meaning "Lives for the entire lifetime of the program" - which implies you can't load them dynamically from JSON or over the network or something. (This is the main use I have for an owned variant). Am I thinking about 'static wrong?

@jbe Cool library - but I can't figure out how to implement Borrow or AsRef or any of the other traits for my type in this case.

It's an interesting idea just doing this:

struct Version<S: AsRef<str>>(S, usize);

... which would allow any of these variants to work. It's a bit more cumbersome to type though, since those generic parameters will spill all over the place.

If you see a 'static bound, like

fn foo<T: Trait + 'static>(t: T) { /* ... */ }

That means that only types which are valid everywhere (T: 'static) can meet the bound, but that doesn't mean they have to be alive for the entirety of the program. String: 'static for example, but you can obviously create one and then destroy it.

In contrast, if you see a 'static reference like &'static str, that means that the value it refers to must stick around for 'static -- forever -- which matches your intuition. If they're not string literals ("..."), you have to leak the data to create a &'static str -- something you usually don't want to do.

In summary, it sounds like &'static str is not sufficient.
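
To make the difference concrete, here is a small sketch. All three function names are invented for the example.

```rust
/// A 'static bound: accepts any type that holds no non-'static borrows.
/// A String created at runtime qualifies, even though it is dropped later.
fn needs_static_bound<T: 'static>(value: T) -> T {
    value
}

/// A 'static reference: the referent must stay valid for the rest of the program.
fn needs_static_ref(s: &'static str) -> usize {
    s.len()
}

/// Turns a runtime String into a &'static str by leaking its buffer.
/// The memory is never freed, which is why this is rarely what you want.
fn leak(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}
```

needs_static_bound(String::from("x")) compiles fine, and needs_static_ref("literal") works because string literals live in the binary. Passing a borrow of a local String to needs_static_ref is rejected unless the String is leaked first.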


This may be over-engineering, but you could have

struct VersionData<D>(D, usize);
type Version = VersionData<String>;
type VersionRef<'a> = VersionData<&'a str>;

impl<D: AsRef<str>> VersionData<D> {
    fn as_version_ref(&self) -> VersionRef<'_> {
        let data = self.0.as_ref();
        VersionData(data, self.1)
    }
}

fn needs_owned(version: Version) { /* ... */ }

fn doesnt_care<D: AsRef<str>>(version: VersionData<D>) {
    let version = version.as_version_ref();
    // ...
}

I tried to adopt @quinedot's idea to deref_owned:

use deref_owned::{GenericCow, Owned};

#[derive(Debug)]
struct Version<S>(S, usize);

impl<S> Version<S> {
    fn version_borrow<B>(&self) -> Version<&B>
    where
        S: GenericCow<Borrowed = B>,
        B: ?Sized,
    {
        let Version(s, i) = self;
        Version(s.borrow(), *i)
    }
}

fn uses_version<S>(version: Version<S>)
where
    S: GenericCow<Borrowed = str>,
{
    let version: Version<&str> = version.version_borrow();
    println!("{version:?}");
}

fn main() {
    uses_version(Version(Owned("Apple".to_string()), 1));
    uses_version(Version("Banana", 2));
}

Yeah, I like that, but I would probably use Borrow instead of AsRef.

Yes, that's why I think many people could consider Cow to be more idiomatic, even if it comes with runtime overhead (at least that's feedback I often got in regard to deref_owned).


A note regarding deref_owned: There has been a long discussion about its (un)usefulness and whether the GenericCow trait is semantically making sense or badly defined. I think @CAD97 and/or others criticized the Deref supertrait, but I'm not sure if I recall this correctly.

See the long thread Smart pointer which owns its target for some of the critique (which mostly refers to older versions of the trait and wrapper, though).

I'm still not sure if making Deref a supertrait of GenericCow was a good idea. I explained my choice here: Design considerations. Maybe I should reconsider it. But if GenericCow is just an abstraction over &, Cow, and Owned, then it might be okay keeping it.

I started a proof-of-concept for this, and judicious use of traits, parameter defaults, and type aliases can ease the pain quite a bit. In some ways, it turns into option 3 for user code but without needing to duplicate every implementation twice:

type VersionRef<'a> = Version<&'a str>;

#[derive(Clone, Debug)]
pub struct Version<S = String>(pub S, pub usize);

impl<'a> Copy for VersionRef<'a> {}

pub trait AsVersionRef {
    fn as_version_ref(&self) -> VersionRef<'_>;
}

impl<S: AsRef<str>> Version<S> {
    pub fn to_owned(&self) -> Version<String> {
        Version(String::from(self.0.as_ref()), self.1)
    }
}

impl<S: AsRef<str>> AsVersionRef for Version<S> {
    fn as_version_ref(&self) -> VersionRef<'_> {
        Version(self.0.as_ref(), self.1)
    }
}

impl<R, S> PartialEq<R> for Version<S>
where
    R: AsVersionRef,
    Version<S>: AsVersionRef,
{
    fn eq(&self, other: &R) -> bool {
        let l = self.as_version_ref();
        let r = other.as_version_ref();
        (l.0 == r.0) && (l.1 == r.1)
    }
}

fn main() {
    let v1 = Version("foo", 3);
    let v2 = Version(String::from("foo"), 3);

    assert_eq!(v1, v2);
}

What I forgot is that this performs an unnecessary clone where you pass an owned version and need an owned version, or am I wrong? I think this was my original motivation to make deref_owned (and what Cow is about), to avoid such unnecessary clones.
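
One possible way around that clone, sketched under the assumption that the generic struct keeps its tuple layout: add a consuming conversion bounded on Into<String> next to the borrowing one. The name into_owned is hypothetical.

```rust
#[derive(Debug, Clone)]
pub struct Version<S = String>(pub S, pub usize);

impl<S: Into<String>> Version<S> {
    /// Consumes self. For S = String the existing buffer is moved, not copied;
    /// for S = &str a new String is allocated.
    pub fn into_owned(self) -> Version<String> {
        Version(self.0.into(), self.1)
    }
}
```

Since String: Into<String> is the identity conversion, Version<String>::into_owned reuses the original heap allocation. The cost is that the caller gives up the value, so this only helps when the input is no longer needed afterwards.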

I came up with the following approach. It's unsafe, but I still think it's interesting because it is as powerful as it possibly can be - it even works with HashMap. Unfortunately, I don't see any way to get an API as nice as this one without unsafe.

use std::ops::Deref;
use std::borrow::{ToOwned, Borrow};
use std::marker::PhantomData;
use std::ptr::NonNull;
use std::fmt;

#[derive(Copy, Clone)]
struct VersionRefData {
    ptr: NonNull<u8>,
    len: usize,
    extra: usize,
}

#[repr(transparent)]
pub struct VersionRef {
    data: VersionRefData,
}

#[derive(Copy, Clone)]
pub struct VersionSlice<'a> {
    data: VersionRefData,
    lifetime: PhantomData<&'a str>,
}

pub struct Version {
    data: VersionRefData,
    capacity: usize,
}

impl Version {
    pub fn new(data: String, extra: usize) -> Self {
        let mut data: Vec<u8> = data.into();
        let len = data.len();
        let capacity = data.capacity();
        let ptr = data.as_mut_ptr();
        std::mem::forget(data);
        
        Self {
            data: VersionRefData {
                ptr: unsafe { NonNull::new_unchecked(ptr) },
                len,
                extra,
            },
            capacity,
        }
    }

    unsafe fn make_string(&self) -> String {
        String::from_raw_parts(
            self.data.ptr.as_ptr(),
            self.data.len,
            self.capacity,
        )
    }

    pub fn into_string(self) -> String {
        unsafe {
            let s = self.make_string();
            std::mem::forget(self);
            s
        }
    }

    pub fn as_slice(&self) -> VersionSlice<'_> {
        VersionSlice::new(self.as_str(), self.data.extra)
    }
}

impl<'a> VersionSlice<'a> {
    pub fn new(data: &'a str, extra: usize) -> Self {
        Self {
            data: VersionRefData {
                ptr: unsafe { NonNull::new_unchecked(data.as_ptr() as *mut u8) },
                len: data.len(),
                extra,
            },
            lifetime: PhantomData,
        }
    }

    pub fn as_str(&self) -> &'a str {
        unsafe {
            let slice = std::slice::from_raw_parts(self.data.ptr.as_ptr(), self.data.len);
            std::str::from_utf8_unchecked(slice)
        }
    }
}

impl VersionRef {
    fn create(data: &VersionRefData) -> &VersionRef {
        unsafe {
            &*(data as *const VersionRefData as *const VersionRef)
        }
    }

    pub fn as_str(&self) -> &str {
        unsafe {
            let slice = std::slice::from_raw_parts(self.data.ptr.as_ptr(), self.data.len);
            std::str::from_utf8_unchecked(slice)
        }
    }

    pub fn extra(&self) -> usize {
        self.data.extra
    }
    
    pub fn to_version(&self) -> Version {
        Version::new(self.as_str().to_string(), self.extra())
    }
}

impl Clone for Version {
    fn clone(&self) -> Version {
        self.to_version()
    }
}

impl Deref for Version {
    type Target = VersionRef;
    
    fn deref(&self) -> &VersionRef {
        VersionRef::create(&self.data)
    }
}

impl AsRef<VersionRef> for Version {
    fn as_ref(&self) -> &VersionRef {
        VersionRef::create(&self.data)
    }
}

impl Borrow<VersionRef> for Version {
    fn borrow(&self) -> &VersionRef {
        VersionRef::create(&self.data)
    }
}

impl Deref for VersionSlice<'_> {
    type Target = VersionRef;
    
    fn deref(&self) -> &VersionRef {
        VersionRef::create(&self.data)
    }
}

impl AsRef<VersionRef> for VersionSlice<'_> {
    fn as_ref(&self) -> &VersionRef {
        VersionRef::create(&self.data)
    }
}

impl Borrow<VersionRef> for VersionSlice<'_> {
    fn borrow(&self) -> &VersionRef {
        VersionRef::create(&self.data)
    }
}

impl ToOwned for VersionRef {
    type Owned = Version;
    
    fn to_owned(&self) -> Version {
        self.to_version()
    }
}

impl Drop for Version {
    fn drop(&mut self) {
        unsafe {
            drop(self.make_string());
        }
    }
}

impl fmt::Debug for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Version")
         .field("string", &self.as_str())
         .field("extra", &self.data.extra)
         .finish()
    }
}

impl fmt::Debug for VersionSlice<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("VersionSlice")
         .field("string", &self.as_str())
         .field("extra", &self.data.extra)
         .finish()
    }
}

impl fmt::Debug for VersionRef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("VersionRef")
         .field("string", &self.as_str())
         .field("extra", &self.data.extra)
         .finish()
    }
}

I would just use option 2.

Since the data is read from a file, it has to be owned somewhere, so you do need a String. That excludes option 1.

Option 3 makes the API more complex and unidiomatic, and I'm not sure it buys anything in terms of efficiency. Are we going to have Version, VersionRef and &VersionRef? Or is VersionRef going to be Copy?

Thanks for the replies everyone! For this use case I ended up using Version(&str, usize) everywhere (!!). It turns out I can get away with that in this case because, on reflection, I always have an owned copy of the underlying string somewhere:

  • When loading from a file or over the network, I have the raw file / network bytes while loading. Serde can return a borrowed object.
  • Once the data is loaded, the strings are stored internally in my library so I can just borrow them.
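
As a rough illustration of the first point, without pulling in Serde: the sketch below borrows names straight out of a loaded buffer. The name:seq line format and the parse_versions function are invented for the example and are not what my library actually uses.

```rust
/// Borrowed version: the name points into a buffer owned elsewhere.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version<'a>(pub &'a str, pub usize);

/// Parses lines of the form `name:seq` (a made-up format for this sketch),
/// borrowing each name from `input` instead of allocating a String.
/// Lines that do not match the format are skipped.
pub fn parse_versions(input: &str) -> Vec<Version<'_>> {
    input
        .lines()
        .filter_map(|line| {
            let (name, seq) = line.split_once(':')?;
            let seq = seq.trim().parse::<usize>().ok()?;
            Some(Version(name, seq))
        })
        .collect()
}
```

The returned Vec cannot outlive the buffer passed in. The borrow checker enforces that, and it's fine as long as the raw bytes stay around while loading.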

I wrote a VersionOwned(String, usize) variant as well, with conversion helpers both ways just for API convenience. But I haven't needed it.

But I really appreciate the discussion! I ended up using a few of these ideas in another part of my codebase with a similar problem (Frontier(Vec<usize>) vs FrontierRef(&[usize])). I ended up using something similar to what @quinedot suggested above.

I was hoping there was a canonical "best practice" way to do this. I really wish the signature for AsRef / Borrow let me convert between types that look like this. It feels like a missed opportunity. I want to do something like what @2e71828 / @jbe suggested above but I don't think I can stomach the complexity that brings to my API.

Yet another approach I don't think anyone has mentioned is this:

type VersionRef<'a> = (&'a str, usize);
type Version = (String, usize);

trait VersionHelpers {
  // Version methods here
  fn name(&self) -> &str;
  fn seq(&self) -> usize;
}

impl<'a> VersionHelpers for VersionRef<'a> { ... }
impl VersionHelpers for Version { ... }

... But it has two pretty big downsides:

  1. Anyone who uses this API will need to use the VersionHelpers trait if they want any of the methods
  2. Functions which take a version will need a generic type parameter, and because of that the compiler will produce more code due to monomorphization.
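
For completeness, here is the tuple-alias idea filled in as a compilable sketch. The method bodies and the two label functions are my guess at a minimal version, not code from my library.

```rust
pub type VersionRef<'a> = (&'a str, usize);
pub type Version = (String, usize);

pub trait VersionHelpers {
    fn name(&self) -> &str;
    fn seq(&self) -> usize;
}

impl<'a> VersionHelpers for VersionRef<'a> {
    fn name(&self) -> &str {
        self.0
    }
    fn seq(&self) -> usize {
        self.1
    }
}

impl VersionHelpers for Version {
    fn name(&self) -> &str {
        &self.0
    }
    fn seq(&self) -> usize {
        self.1
    }
}

/// Generic over both variants; compiled once per concrete type used.
pub fn label<V: VersionHelpers>(v: &V) -> String {
    format!("{}#{}", v.name(), v.seq())
}

/// Trait-object alternative: a single compiled copy, at the cost of dynamic dispatch.
pub fn label_dyn(v: &dyn VersionHelpers) -> String {
    format!("{}#{}", v.name(), v.seq())
}
```

Taking &dyn VersionHelpers sidesteps the monomorphization cost from point 2, but callers still have to import the trait to call name() or seq() themselves.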

I feel like there's potential for a language feature to express a type with owned + referenced variants in a nice way. But it's certainly not the end of the world.

Thanks all!
