A flexible type for `Vec<String>`

How would you express a flexible type that is used internally as a Vec<String>? Here's what I came up with:

fn into_vec_str(items: impl IntoIterator<Item = impl Into<String>>) -> Vec<String> {
    items.into_iter().map(Into::into).collect()
}

fn main() {
    let result = vec!["foo", "bar"];

    assert_eq!(into_vec_str(["foo", "bar"]), result);
    assert_eq!(into_vec_str(["foo".to_string(), "bar".to_string()]), result);
    assert_eq!(into_vec_str(vec!["foo", "bar"]), result);
    assert_eq!(
        into_vec_str(vec!["foo".to_string(), "bar".to_string()]),
        result
    );
}

I have also used IntoIterator (here) to make a function accept a "list" of values, so I can pass either vec![x, y, z] or [x, y, z], etc.

I guess Into<String> can be used to denote any type that is convertible into a String (and might be the right choice here), but there is also ToString which has slightly different semantics, I believe.
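To make the difference concrete, here is a minimal sketch. The helper names via_into and via_to_string are made up for this illustration. Into<String> only accepts types that have a conversion into String, while ToString accepts anything that implements Display.

```rust
// Hypothetical helpers, named for this sketch only.
fn via_into(s: impl Into<String>) -> String {
    s.into()
}

fn via_to_string(s: impl ToString) -> String {
    s.to_string()
}

fn main() {
    // Both accept string slices and owned strings.
    assert_eq!(via_into("foo"), "foo");
    assert_eq!(via_to_string(String::from("foo")), "foo");

    // `ToString` also accepts anything that implements `Display`, such as integers.
    assert_eq!(via_to_string(42), "42");

    // `via_into(42)` would not compile: there is no `From<i32> for String`.
}
```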

See also "Is there actually a semantic difference between FromStr and TryFrom<&str>?" for a similar topic.


P.S.: Note that impl IntoIterator<Item = impl Into<String>> in argument position isn't really a type but a short syntax for making the function generic.
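As a sketch of what that shorthand stands for, the function can be written with explicit type parameters. The name into_vec_str_generic and the parameter names I and S are assumptions made for this illustration.

```rust
// Shorthand form, as in the original post.
fn into_vec_str(items: impl IntoIterator<Item = impl Into<String>>) -> Vec<String> {
    items.into_iter().map(Into::into).collect()
}

// Roughly equivalent form with explicit type parameters.
fn into_vec_str_generic<I, S>(items: I) -> Vec<String>
where
    I: IntoIterator<Item = S>,
    S: Into<String>,
{
    items.into_iter().map(Into::into).collect()
}

fn main() {
    assert_eq!(
        into_vec_str(["foo", "bar"]),
        into_vec_str_generic(["foo", "bar"])
    );

    // Only the explicit form lets the caller name the type parameters.
    let v = into_vec_str_generic::<Vec<&str>, &str>(vec!["foo"]);
    assert_eq!(v, ["foo"]);
}
```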


Thanks :slight_smile: The choice of Into<String> is intentional. Essentially I want to cover only String and &str, not everything displayable.

Note that you might want to add extra bounds. For example, I added ExactSizeIterator, which lets an implementation know the length without having to iterate through all values first.

Even if ExactSizeIterator can't be trusted in unsafe code, it still may help implementations to allocate memory more efficiently, e.g. by internally using

let some_vec = Vec::with_capacity(items.len())

which doesn't work if you omit the ExactSizeIterator trait bound.
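A minimal sketch of such a bound, assuming a function named into_vec_str_exact (the name is made up here). The bound goes on the iterator type produced by IntoIterator, not on the argument itself.

```rust
fn into_vec_str_exact<I>(items: I) -> Vec<String>
where
    I: IntoIterator,
    I::Item: Into<String>,
    I::IntoIter: ExactSizeIterator,
{
    let iter = items.into_iter();
    // `len()` is available only because of the `ExactSizeIterator` bound.
    let mut out = Vec::with_capacity(iter.len());
    for item in iter {
        out.push(item.into());
    }
    out
}

fn main() {
    let v = into_vec_str_exact(["foo", "bar"]);
    assert_eq!(v, ["foo", "bar"]);
    assert!(v.capacity() >= 2);

    // A filtered iterator does not implement `ExactSizeIterator`,
    // so this call would not compile:
    // into_vec_str_exact(["foo", "bar"].into_iter().filter(|s| !s.is_empty()));
}
```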

Also note that using IntoIterator takes away some abilities from the callee. In particular, you can't use all the methods that are available when working with slices. Using generics may also either lead to monomorphization overhead in the size of the compiled binary, or require an extra indirection (by making an inner function).
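The inner-function approach could look roughly like this. The function join_all and its body are hypothetical; the point is only that the generic outer function does the conversion and the non-generic inner function holds the actual logic.

```rust
fn join_all(items: impl IntoIterator<Item = impl Into<String>>) -> String {
    // Non-generic inner function: it is compiled once, no matter how many
    // different argument types the outer function is called with.
    fn inner(strings: Vec<String>) -> String {
        // With a concrete Vec, slice methods such as `join` are available again.
        strings.join(", ")
    }

    // Only this thin conversion layer is monomorphized per argument type.
    inner(items.into_iter().map(Into::into).collect())
}

fn main() {
    assert_eq!(join_all(["foo", "bar"]), "foo, bar");
    assert_eq!(join_all(vec![String::from("a")]), "a");
}
```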


The code you came up with looks very reasonable to me. IntoIterator bounds allow for generally very flexible usage. One example you didn't show was the ability to also accept references such as &Vec<String> or &[String] with this bound. Unfortunately, &[&str] would not work due to a lack of String: From<&&str> implementations.[1]
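A short sketch of those cases, reusing into_vec_str from the original post:

```rust
fn into_vec_str(items: impl IntoIterator<Item = impl Into<String>>) -> Vec<String> {
    items.into_iter().map(Into::into).collect()
}

fn main() {
    let owned: Vec<String> = vec!["foo".to_string(), "bar".to_string()];

    // `&Vec<String>` yields `&String` items, and `String: From<&String>` exists.
    assert_eq!(into_vec_str(&owned), owned);

    // The same holds for a `&[String]` slice.
    assert_eq!(into_vec_str(&owned[..]), owned);

    // `&[&str]` yields `&&str` items, for which there is no conversion:
    let borrowed: &[&str] = &["foo", "bar"];
    // into_vec_str(borrowed); // would not compile

    // Workaround on the caller side: copy the `&str` items out of the slice.
    assert_eq!(into_vec_str(borrowed.iter().copied()), owned);
}
```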

It even optimizes nicely in certain use-cases; e.g. if you give it a Vec<String>, I wouldn't be surprised if the specialized collect implementations in the standard library together with LLVM manage to optimize the whole into_vec_str conversion into a no-op.


  1. This would be an advantage of ToString; but if you want to be more restrictive than that, and the &&str use-cases come up a lot (and the workaround of doing .iter().copied() at the call site is too tedious), there's always the alternative of defining a custom trait, which gives even more control over exactly which types are allowed; e.g. Into<String> is also something that char implements, which may or may not be a type you want to support.


Perhaps you mean abilities of the callee?

If it’s just about being more efficient, it might be enough to work with the Iterator::size_hint information, which is something that e.g. .collect::<Vec<_>>() will do to avoid unnecessary re-allocations.
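A small sketch of what size_hint reports. The helper collect_with_hint is made up for this illustration; it returns the lower bound alongside the collected strings.

```rust
/// Returns the lower bound reported by `size_hint` together with the collected strings.
fn collect_with_hint(
    items: impl IntoIterator<Item = impl Into<String>>,
) -> (usize, Vec<String>) {
    let iter = items.into_iter();
    let (lower, _upper) = iter.size_hint();
    (lower, iter.map(Into::into).collect())
}

fn main() {
    // An array iterator knows its exact length, so the lower bound is 3.
    let (lower, v) = collect_with_hint(["foo", "bar", "baz"]);
    assert_eq!(lower, 3);
    assert_eq!(v.len(), 3);

    // A filtered iterator cannot know how many items pass, so the lower bound is 0.
    let filtered = ["foo", "bar", "baz"]
        .into_iter()
        .filter(|s| s.starts_with('b'));
    let (lower, v) = collect_with_hint(filtered);
    assert_eq!(lower, 0);
    assert_eq!(v, ["bar", "baz"]);
}
```

No extra trait bound is needed here, which is the advantage over requiring ExactSizeIterator when the length is only used as an allocation hint.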


Yes, thanks for correcting me.

Oh, I see! In my case I really need to know the length in advance (due to implementation here, which, by the way, seems to have a soundness issue[1]). But for allocation, size_hint may be sufficient.

Either way, collect is a method of Iterator, so it could even be a specialized implementation for a particular type. But I guess the default implementation then uses size_hint?


  1. because it doesn't check whether remaining_argc is zero after the for arg in args loop

Yeah, it does… it's a bit tedious to actually find the source for it, because there's an impressive number of specialization layers on top of it :sweat_smile:, including (among other things) the one I hinted at, which allows .collect to re-use the allocation of the Vec that the iterator was created from in the first place. But yes, (the lower bound of) size_hint is indeed used for it.


I tried to find it, but gave up fast. :sweat_smile:

I've always considered impl Into<String> parameters a code smell, personally. After all, it doesn't do the "only String and &str" that you mentioned -- it also accepts char, for example.
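To illustrate the char case, and one possible way to restrict the accepted types, here is a sketch using a custom trait. The trait StringLike and the functions takes_into and into_vec_strict are hypothetical names invented for this example.

```rust
fn takes_into(s: impl Into<String>) -> String {
    s.into()
}

// Hypothetical trait, implemented only for the types we want to accept.
trait StringLike {
    fn into_string(self) -> String;
}

impl StringLike for String {
    fn into_string(self) -> String {
        self
    }
}

impl StringLike for &str {
    fn into_string(self) -> String {
        self.to_owned()
    }
}

fn into_vec_strict(items: impl IntoIterator<Item = impl StringLike>) -> Vec<String> {
    items.into_iter().map(StringLike::into_string).collect()
}

fn main() {
    // `Into<String>` accepts a char, because `String: From<char>` exists.
    assert_eq!(takes_into('x'), "x");

    // The custom trait accepts only `String` and `&str`.
    assert_eq!(into_vec_strict(["foo", "bar"]), ["foo", "bar"]);
    assert_eq!(into_vec_strict(vec![String::from("foo")]), ["foo"]);

    // `into_vec_strict(['x'])` would not compile: `char` does not implement `StringLike`.
}
```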

It might help if you said more about why you want to do this. What are you planning on doing with that Vec<String>, say?


I feel like using generics to perform type conversions is a bad idea overall, but I'm not totally sure. I feel like (most of the time) it's best to do the conversion on the caller side, for clarity (and also because of implementation issues such as possible monomorphization overhead, clunky syntax in the function definition, etc.).

Note that I feel strongest about impl Into<String> in particular. Other things might be more reasonable, especially if it's impl CustomTrait, and there's something to be said for impl Into<Cow<'_, str>> sometimes, etc.

But yeah, the "I'm making compilation slower and my binary bigger just to avoid writing a & in the caller" usually seems like the wrong tradeoff to me.


Are you suggesting using &str as a parameter even when an owned String is needed eventually? Wouldn't it mean unnecessary allocation in case we already have an owned String in the caller?

No. I'm saying that if you need a String, take a String, and let the caller call .to_owned().

Don't take an Into<String> and hide it.
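A minimal sketch of that advice. The Tag type is hypothetical and exists only for this illustration.

```rust
// `Tag` is a hypothetical type used only for this illustration.
#[derive(Debug, PartialEq)]
struct Tag {
    name: String,
}

impl Tag {
    // Takes an owned String: the signature states plainly what is needed.
    fn new(name: String) -> Self {
        Tag { name }
    }
}

fn main() {
    // A caller with a `&str` pays for the allocation visibly.
    let borrowed: &str = "env";
    let from_borrowed = Tag::new(borrowed.to_owned());

    // A caller that already owns a String just moves it in.
    let owned = String::from("env");
    let from_owned = Tag::new(owned);

    assert_eq!(from_borrowed, from_owned);
    assert_eq!(from_owned.name, "env");
}
```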


Thank you, that makes total sense when it's just impl Into<String>. But what's your take on more complicated types, like the Vec<String> I started with? Would you still suggest moving my_str_slice.iter().map(Into::into).collect() into the caller? Even for libs?

It might help if you said more about why you want to do this. What are you planning on doing with that Vec<String>, say?

Sorry, I overlooked the question somehow. I just need this Vec<String> to pass to an external lib, in particular aws_sdk_ec2::operation::create_tags::builders::CreateTagsFluentBuilder::set_resources. They obviously decided on moving type conversion to the caller side :wink:

All but the collect, and make it a fn(impl Iterator<Item = String>, ...).


If that's allocating a new String for every single thing in the iterator, probably yes. That's quite an expensive thing to hide in a function call.

So I'd consider two things:

  1. If I really do always need owned Strings, I'd take impl IntoIterator<Item = String>. Then it's up to the caller to make the strings -- and see the cost of that -- but it's also trivial to pass a Vec<String> if they happen to have one already. (And I can actually collect that back to a Vec<String> in O(1) today, thanks to some non-guaranteed specializations.)
  2. If I don't actually need ownership, then I'd take something like impl IntoIterator<Item = impl AsRef<str>> instead. That way people can pass &[&str] or &[String] or many other such things, but I'm not doing any copying of it, so the flexibility isn't hiding a bunch of cost.
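The two options above can be sketched as follows. The function names collect_owned and total_len are made up for this illustration, and their bodies are placeholders for whatever the real function would do.

```rust
// Option 1: ownership is needed, so ask for owned Strings directly.
fn collect_owned(items: impl IntoIterator<Item = String>) -> Vec<String> {
    items.into_iter().collect()
}

// Option 2: ownership is not needed, so only borrow the string data.
fn total_len(items: impl IntoIterator<Item = impl AsRef<str>>) -> usize {
    items.into_iter().map(|s| s.as_ref().len()).sum()
}

fn main() {
    let owned = vec![String::from("foo"), String::from("bar")];
    let strs: &[&str] = &["foo", "bar"];

    // Option 2 accepts borrowed collections of several kinds without copying.
    assert_eq!(total_len(strs), 6);
    assert_eq!(total_len(&owned), 6);
    assert_eq!(total_len(&owned[..]), 6);

    // Option 1: the caller makes the allocation explicit...
    assert_eq!(collect_owned(strs.iter().map(|s| s.to_string())), owned);
    // ...or simply hands over a Vec<String> it already has.
    assert_eq!(collect_owned(owned.clone()), owned);
}
```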
