How to newtype string

I need a type (called Tag) that is represented by a string, but its possible values are restricted and it has a number of special functions, so it makes a lot of sense to create a new type for it. It also makes sense to have both borrowed and owned instances. On one hand, many instances will be read-only and created from references into a larger string (so I can't just use &Tag). On the other hand, there are functions that compose a value, and these need to store the result.

So I can think of four options:

  1. Create two types, Tag and TagBuf, and implement ToOwned for Tag, similar to std::path::Path and PathBuf. The downside is that it requires duplicating most of the methods, plus some boilerplate for the conversions and an unsafe std::mem::transmute to cast &str to &Tag. And some functions will end up returning Cow<'a, Tag> anyway, because they may need to normalize the content.

  2. Wrap Cow<'a, str> like struct Tag<'a> { inner: Cow<'a, str> }. This would implement into_owned returning Tag<'static>. That looks relatively nice to me, minimizing the duplication at the cost of worse control over memory allocation. I have not seen any precedent for this though, so I am not sure I won't hit a roadblock.

  3. Give up on efficiency and wrap a String.

  4. Give up on the newtype, create a trait, and implement it for str (IIUC Deref should make the methods available on String directly).
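For reference, option 4 could look roughly like the sketch below. The trait name TagExt, its methods, and the validation rule are all made up for illustration, since the real rules for a tag are not stated here.

```rust
/// Hypothetical extension trait carrying the tag-specific methods.
pub trait TagExt {
    /// Placeholder rule (an assumption): non-empty, ASCII alphanumerics or '-'.
    fn is_tag(&self) -> bool;
    /// Example read-only accessor: the part before the first '-'.
    fn primary(&self) -> &str;
}

impl TagExt for str {
    fn is_tag(&self) -> bool {
        !self.is_empty()
            && self.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
    }

    fn primary(&self) -> &str {
        // `split` always yields at least one item, so the fallback is never used.
        self.split('-').next().unwrap_or("")
    }
}
```

Because method lookup auto-dereferences the receiver, the same methods can be called on a String or a Cow<str> as well as on a &str. The cost is that nothing stops an invalid string from being passed around as if it were a tag.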

Given the note on Deref in option 4, wouldn't Deref also make all the read-only methods of Tag available on TagBuf and Cow<'a, Tag> in the first option? I would need Cow<'a, Tag> a lot, since some operations normalize and would clone only if needed.
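A minimal sketch of the first option, to make the Deref question concrete. It assumes a #[repr(transparent)] wrapper and uses a raw pointer cast rather than std::mem::transmute; the method primary is a hypothetical example, and validation is left out entirely.

```rust
use std::borrow::Borrow;
use std::ops::Deref;

/// Borrowed, unsized tag; only ever used behind a reference, like `Path`.
#[repr(transparent)]
pub struct Tag(str);

/// Owned tag, like `PathBuf`.
pub struct TagBuf(String);

impl Tag {
    /// Casts without validating; a real version would check the value first.
    fn from_str_unchecked(s: &str) -> &Tag {
        // SAFETY: `Tag` is `repr(transparent)` over `str`, so `&str` and
        // `&Tag` have the same layout.
        unsafe { &*(s as *const str as *const Tag) }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// Hypothetical read-only method, written once on `Tag`.
    pub fn primary(&self) -> &str {
        self.0.split('-').next().unwrap_or("")
    }
}

impl TagBuf {
    pub fn new(s: String) -> TagBuf {
        TagBuf(s)
    }
}

impl Deref for TagBuf {
    type Target = Tag;
    fn deref(&self) -> &Tag {
        Tag::from_str_unchecked(&self.0)
    }
}

impl Borrow<Tag> for TagBuf {
    fn borrow(&self) -> &Tag {
        self
    }
}

impl ToOwned for Tag {
    type Owned = TagBuf;
    fn to_owned(&self) -> TagBuf {
        TagBuf(self.0.to_owned())
    }
}
```

With these impls, a method defined on Tag is callable on TagBuf through Deref, and on Cow<'a, Tag> because Cow derefs to its borrowed form. Only the mutating and constructing methods would need to live on TagBuf.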

So what would be the most Rusty approach?

The unicase crate by @seanmonstar uses a different approach. It creates a generic newtype:

pub struct UniCase<S>(pub S);

It then implements a bunch of traits for UniCase<S> where S: AsRef<str>, which includes both UniCase<String> and UniCase<&str>:

impl<S: AsRef<str>> Ord for UniCase<S> {
    // ...
}
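To illustrate the pattern only, here is a simplified stand-in. AsciiCase is a hypothetical type that ignores ASCII case; it is not how unicase itself is implemented.

```rust
use std::cmp::Ordering;

/// Hypothetical wrapper showing the generic-newtype pattern.
pub struct AsciiCase<S>(pub S);

// Equality works across storage types, e.g. `&str` against `String`.
impl<S1: AsRef<str>, S2: AsRef<str>> PartialEq<AsciiCase<S2>> for AsciiCase<S1> {
    fn eq(&self, other: &AsciiCase<S2>) -> bool {
        self.0.as_ref().eq_ignore_ascii_case(other.0.as_ref())
    }
}

impl<S: AsRef<str>> Eq for AsciiCase<S> {}

impl<S: AsRef<str>> PartialOrd for AsciiCase<S> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<S: AsRef<str>> Ord for AsciiCase<S> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Compare byte by byte after lowercasing ASCII letters.
        let a = self.0.as_ref().bytes().map(|b| b.to_ascii_lowercase());
        let b = other.0.as_ref().bytes().map(|b| b.to_ascii_lowercase());
        a.cmp(b)
    }
}
```

Every impl is written once against AsRef<str>, so AsciiCase<&str> and AsciiCase<String> both get it without any duplication.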

Hm, it makes sense there, but I feel it does not make sense in my case. unicase just wraps a string to give it alternate semantics, whereas my type is not a string; it is only represented by one. So I think in my case the underlying string type should remain hidden.

I have the type public in unicase because it doesn't have any invariants that I need to enforce besides being UTF-8 (which str already does for me). So the basic constructor is the best choice in my case: let x = UniCase("foo").

In your case, allowing only certain values would mean keeping the inner field private and having the constructor return a Result.

let tag = Tag::new("foo")?;

To support any kind of str type internally, you could accept S: AsRef<str>.

pub struct Tag<T>(T);

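impl<T: AsRef<str>> Tag<T> {
    // Validate once at construction so every Tag holds a permitted value.
    // is_valid (taking a &str) and TagError are left for you to define.
    fn new(value: T) -> Result<Tag<T>, TagError> {
        if is_valid(value.as_ref()) {
            Ok(Tag(value))
        } else {
            Err(TagError::SomethingBad)
        }
    }
}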
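Filled out a little, that might look like the sketch below. The validation rule inside is_valid is a placeholder, and to_owned_tag is a hypothetical helper for getting from a borrowed tag to an owned one.

```rust
#[derive(Debug, PartialEq)]
pub enum TagError {
    SomethingBad,
}

// Placeholder rule (an assumption): non-empty, ASCII alphanumerics or '-'.
fn is_valid(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
}

/// The field is private, so `new` is the only way to build a `Tag`.
#[derive(Debug, PartialEq)]
pub struct Tag<T>(T);

impl<T: AsRef<str>> Tag<T> {
    pub fn new(value: T) -> Result<Tag<T>, TagError> {
        if is_valid(value.as_ref()) {
            Ok(Tag(value))
        } else {
            Err(TagError::SomethingBad)
        }
    }

    pub fn as_str(&self) -> &str {
        self.0.as_ref()
    }

    /// Copies into an owned tag. The value was validated at construction,
    /// so it is not checked again here.
    pub fn to_owned_tag(&self) -> Tag<String> {
        Tag(self.as_str().to_owned())
    }
}
```

A Tag<&str> can then borrow from a larger string, while Tag<String> stores composed results. The storage type does show up in the signature, though.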

Well, I also have operations that return the owned variant, and operations that would best return a Cow because they need to normalize, although the input will often be normalized already. Which is why I thought about making the type itself a Cow.
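Roughly what I have in mind is sketched below. The validation rule and the lowercase normalization are placeholders, and TagError, is_borrowed and normalized are names made up for the example.

```rust
use std::borrow::Cow;

/// Hypothetical validation error.
#[derive(Debug, PartialEq, Eq)]
pub struct TagError;

/// A tag that either borrows from a larger string or owns its contents.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tag<'a> {
    inner: Cow<'a, str>,
}

impl<'a> Tag<'a> {
    /// Accepts `&'a str`, `String`, or `Cow<'a, str>`.
    pub fn new<S: Into<Cow<'a, str>>>(value: S) -> Result<Tag<'a>, TagError> {
        let inner = value.into();
        // Placeholder rule (an assumption): non-empty, ASCII alphanumerics or '-'.
        let ok = !inner.is_empty()
            && inner.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-');
        if ok { Ok(Tag { inner }) } else { Err(TagError) }
    }

    pub fn as_str(&self) -> &str {
        &self.inner
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self.inner, Cow::Borrowed(_))
    }

    /// Placeholder normalization: lowercase ASCII. Allocates only when
    /// the tag actually changes; otherwise borrows from `self`.
    pub fn normalized(&self) -> Tag<'_> {
        if self.inner.bytes().any(|b| b.is_ascii_uppercase()) {
            Tag { inner: Cow::Owned(self.inner.to_ascii_lowercase()) }
        } else {
            Tag { inner: Cow::Borrowed(&*self.inner) }
        }
    }

    /// Detaches the tag from its source string, copying only if borrowed.
    pub fn into_owned(self) -> Tag<'static> {
        Tag { inner: Cow::Owned(self.inner.into_owned()) }
    }
}
```

This keeps the string type hidden and needs only one set of methods, at the price of a runtime borrowed-or-owned check on each access.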