Making a data structure generic is causing much pain

I have a library which operates over URIs (or IRIs which is the
same thing). Because of the nature of the library, I decided I needed
to an IRI that was easily clonable, so I use a Rc like so:

pub struct IRI(Rc<str)

This is fine but, of course, my library is now restricted to being
single threaded. So, I decided to update the core data structure to this:

pub struct IRI<A: Borrow<str>>(A);

Now I can have IRI<Rc<str>>, and IRI<Arc<str>>. Or even
IRI<'str> which is quite convienient for testing.

However, this is already a bit of a pain because I have quite a few
methods which take an IRI as a parameter or alternatively something
that contains an IRI. So I have had to add <A: Borrow<str>> to an
awful lot of function signatures.

The problem is that while <A: Borrow<str>> represents 90% of what I
need from an IRI I have also used a variety of other features of
Rc<str> -- for example, at one point in my library, I store them in
a BTreeMap. So, now I have Ord. Another point in my library
requires that A: Debug for reasons that I cannot work out at the
moment. So, the trait bounds increase in size. It's all getting a bit
out of control. Ironically, of course, Borrow<str> can trivially
implement Ord, Debug and all the rest by simple virtue of
Borrowing the str.

I am debating reverting the whole thing and using a conditional
compile feature, which would chose either an Rc or an Arc. Much
less adaptable, but could probably be achieved with 10-20 lines of
changes to my code rather 1000 lines of changes I have so far (and I
haven't finished yet).

Am I missing something here that could make my life easier?

2 Likes

The first thing that comes to mind is that you are likely adding a lot of unnecessary trait bounds. For one, the bound on the type definition itself is not necessary – what's more, it's considered a downright anti-pattern. Relatedly, you should only add trait bounds on impls where you actually use them. For instance, functions that don't rely on Ord but only on Borrow<str> should only include the latter bound, and not the former.

3 Likes

Have you considered just using Arc<str> unconditionally? Measure the performance in typical uses before assuming it's worth keeping the option of using Rc.

6 Likes

A third option is to just make a new trait that encapsulates everything you need:

// Supertrait bounds on everything we need
trait IriAble: Borrow<str> + Ord + Debug + Clone {}

// If you meet the bounds, we implement the trait
impl<T: ?Sized> IriAble for T
where
    T: Borrow<str> + Ord + Debug + Clone
{}

Then you just have a single bound, and all the supertrait bounds are implied. Note that this can be a poor approach for a public trait, if you're still changing things as you describe, as changing the supertrait bounds (adding or removing) is a breaking change.

(Removing may be ok if you seal the trait.)

4 Likes

Thanks for your comments. In answer to the questions, @H2CO3, indeed I had worked out that it's an anti-pattern, as I guess my question shows. I will try leaving off the trait bounds on the data structure, as it might simplify things a lot.

@kpreid that's a good point. My initial use case was actually single threaded, and I was interested in this being fast (there are already libraries in other languages that do the same thing so fast and low memory is the main motivation for this library); but I don't actually know.

@quinedot that might work quite well, as the trait bounds seem to vary mostly between different modules of my library, so I could use different traits for different modules. I'm not worried so much about breaking changes -- I would consider this to be scientific software and we like to break things all the time.

Note this "anti-pattern" will one day become a sort of pattern. There is an approved RFC to make trait bounds in structs become implied wherever the type is used.

This RFC was five years ago, and hasn't received an implementation yet due to waiting on integration of chalk, but the lang team reported recently that there was interest in seeing it implemented.

2 Likes

But that's not the "anti-pattern" part. The anti-pattern part is putting the bounds on the type definition in the first place (ie., it's not about having to repeat the bounds in the impl).

The problem with that practice is that it will result in superfluous trait requirements even when the type could do without them just fine. For example, a newtype wrapper might require Ord or Hash in order to be ordered or hashable. However, it doesn't need these traits for construction, so uncondtionally putting them on the generic type parameter could eg. impede the use of Wrapper::new(), causing needless churn.

I really hope that's not going to become a pattern, ever.

5 Likes

@H2CO3 I am curious, though. What are the practical consequences of "superfluous trait requirements"?

I mean, given that my library is currently non-generically coded to Rc<str> if I replaced all instances of Rc<str> with a generic bound to Borrow<str> plus pretty much all of the traits that Rc<str> implements (so Clone+Debug +Eq + Hash + Ord + PartialEq + PartialOrd plus a few others, what would the problem be? I would still have something that was much more generic and, in practice, most uses of the library would require that any generic type implements all of these, because you are going to hit a method that uses them sooner or later.

I will play devil's advocate and support @H2CO3 's argument by showing how the type might not even be created. Suppose that some downstream code wants the ability to optionally construct an IRI:

use iri::Iri;

pub enum MyWrapper<X> {
    Iri(Iri<X>),
    SomethingElse(X),
}

Here, if Iri has a built-in requirement that A: Borrow<str>, then it prevents this downstream code author from using MyWrapper<&[u8]> even though MyWrapper::<&[u8]>::SomethingElse ought to be able to exist. To work around this, the downstream code author will need to redesign MyWrapper in terms of a trait (and possibly some dynamic dispatch).

In retrospect, due to the implications that these bounds have on downstream code, I think that after Implied Bounds is implemented, the general recommendation would likely still be to not use them unless necessary. (they are necessary for things like core::iter::Peekable and core::borrow::Cow which need to refer to associated types like Self::Item and Self::Owned)


(of course, there is also the more commonly brought up point that, in some alternate universe where e.g. HashMap had bounds on the struct, then HashMap::new would require Hash and Eq when they currently do not. I choose not to focus on this argument because I don't find it nearly as compelling; Putting the map into an Option should serve the vast majority of use cases, except maybe some niche cases that involve codegen or backwards compatibility with pub fields)

6 Likes

I can certainly see this.

My experience, though, is that putting adding Borrow<str> to individual methods has not worked, because even if I put this bound on a function where it is directly used the requirement soon percolated through my entirely library. I am now trying something like the IriAble supertrait suggested above. That probably means I have trait bounds which are too wide, but the process of adding this to all my functions is at least manageable and has to be done only once. With individual traits, I'd add some trait bounds based on compiler errors, then find I had some more errors, then go around the same methods and have to add further bounds.

In the end, my question would be what is the value of MyWrapper? If it includes my IRI type, then presumably who ever wrote it is going to want to use my library. And as Borrow<str> is likely to end up being a trait bound on every function by the time I am finished, where are you going to use an IRI extracted from MyWrapper that does not implement Borrow<str>.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.