Ergonomics of creating `String`s


#1

I’ve recently been working with some rather String-heavy APIs and have started to question the ergonomics and performance of the current conventions.

Let’s start with the ergonomics. Trying to write idiomatic Rust, I currently have function calls like this:

xml::Element::new("iq".to_string(), Some("jabber:client".to_string()),
                  vec![("id".to_string(), None, "bind".to_string()),
                       ("type".to_string(), None, "set".to_string())])

The to_string calls create a lot of noise. With more elements in the Vec it would be tempting to write this with slices, and then map().collect() over it.
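A minimal sketch of that slice-based variant, assuming the attributes are (name, namespace, value) triples as in the call above. The `Attr` alias and the `owned_attrs` helper are hypothetical names and not part of the xml crate:

```rust
// Hypothetical alias for the owned attribute triples used in the call above.
type Attr = (String, Option<String>, String);

/// Converts borrowed (name, namespace, value) triples into owned ones,
/// so the call site can be written with plain string literals.
fn owned_attrs(attrs: &[(&str, Option<&str>, &str)]) -> Vec<Attr> {
    attrs
        .iter()
        .map(|&(name, ns, value)| (name.to_owned(), ns.map(str::to_owned), value.to_owned()))
        .collect()
}

fn main() {
    let attrs = owned_attrs(&[("id", None, "bind"), ("type", None, "set")]);
    println!("{:?}", attrs);
}
```

This moves the noise out of the call site, at the price of a helper that every caller has to know about.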

The second aspect I’m a bit concerned about is performance. It might not be obvious, but calling to_string() on a &str currently invokes the whole formatting machinery, generating a lot of code. In the current state of the language this can’t really change, since ToString is implemented for any type implementing fmt::Display, which makes impl<'a> ToString for &'a str a conflicting impl.

There are multiple alternative ways to create Strings from string slices, among them From::from(), String::from_str(), and ToOwned::to_owned(). The latter two solve the performance issue, but none of them improve the ergonomics.
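For comparison, here is a small sketch of these conversions side by side. The `all_conversions` helper is a hypothetical name used only for illustration. Note that on a current toolchain `from_str` comes from the `FromStr` trait and returns a `Result`, which may differ from the API available when this was written:

```rust
use std::str::FromStr;

/// Builds the same String from a slice in four different ways.
fn all_conversions(s: &str) -> [String; 4] {
    [
        s.to_string(),                // ToString, defined in terms of Display
        String::from(s),              // From<&str> for String
        String::from_str(s).unwrap(), // FromStr for String; the error type is Infallible
        s.to_owned(),                 // ToOwned for str
    ]
}

fn main() {
    for owned in all_conversions("jabber:client").iter() {
        println!("{}", owned);
    }
}
```

All four produce equal Strings; they differ only in spelling and in which trait they go through.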

In theory I could make the xml::Element::new() method generic over ToString. However, that still wouldn’t enable passing a mixture of string slices and Strings as part of the Vec, and it would require invoking the formatting machinery even for String.

I was wondering how other people are dealing with this. Are there possibly plans to improve this situation in the future, that I’m not aware of? A str!() macro has occasionally been suggested (also for “consistency” with vec![]), but was usually dismissed.


#2

I mean, you could do the following:

macro_rules! s {
    ($s:expr) => {
        $s.to_string()
    };
}
fn main() {
    let v = vec![s!("hello"), s!("world")];
    println!("{:?}", v);
}

#3

It’s fine to do it as a function:

use std::borrow::ToOwned;
fn s(string: &str) -> String {
    string.to_owned()
}

It’s better to use to_owned instead of to_string, because to_string goes through the formatting machinery. (And you actually save one character (the !) at the call site this way.)


#4

One way to solve this would be to write APIs which store string Cows, or even just &str where possible. Another way would be to just have a ton of generic parameters so each thing is a ToOwned.

If you do something like this, it would be totally possible to pass any number of different things which implement ToOwned, although it is a bit verbose.

fn new<A, B, C>(some_str1: A, some_str2: Option<B>, some_list: Vec<C>) where
        A: ToOwned<Owned=String>,
        B: ToOwned<Owned=String>,
        C: ToOwned<Owned=String>,
        String: Borrow<A> + Borrow<B> + Borrow<C> {
    ...
}

Once we have a stable version of IntoCow<str, 'static>, switching to just storing Cow<str, 'static> is also a very viable option. It gets rid of the performance cost of converting &'static str to String, while still providing an easy way to write a generic function (you can just use IntoCow instead of ToOwned).
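A rough sketch of the Cow-storing idea, written against a current toolchain and using an `Into<Cow<'static, str>>` bound rather than IntoCow. The `Element` type here is hypothetical and much simpler than the real xml::Element:

```rust
use std::borrow::Cow;

// Hypothetical element type that stores a Cow instead of a String.
struct Element {
    name: Cow<'static, str>,
}

impl Element {
    /// Accepts both &'static str (stored borrowed) and String (stored owned).
    fn new<N: Into<Cow<'static, str>>>(name: N) -> Element {
        Element { name: name.into() }
    }

    /// True if the name is still a borrowed literal, i.e. no allocation happened.
    fn is_borrowed(&self) -> bool {
        matches!(self.name, Cow::Borrowed(_))
    }
}

fn main() {
    let from_literal = Element::new("iq");
    let from_string = Element::new(String::from("iq"));
    // prints "true false"
    println!("{} {}", from_literal.is_borrowed(), from_string.is_borrowed());
}
```

A string literal is stored without any allocation, while an existing String is moved in as-is.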


#5

to_string being not only verbose but also slow is a bit disappointing.


#6

I don’t consider a single letter function or macro a solution to this. Crates rolling their own idioms for basic functionality effectively creates multiple dialects of the language. Making it a single letter function/macro decreases readability even further for everyone but the original author.
IMHO this needs a general solution in the standard library.


#7

ToOwned is not really viable, since String does not impl ToOwned. Even if it did, it would have to implement Clone-like semantics.
IntoCow would be feasible, but still has the limitation that a Vec<C> cannot contain a mixture of types. One could pass Cow<'static, str>s, but that adds back a lot of boilerplate.
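To make the boilerplate concrete, here is a sketch of a Vec that mixes literals and owned Strings by wrapping each element in a Cow. The `collect_owned` function is a hypothetical stand-in for whatever the library would do with the values:

```rust
use std::borrow::Cow;

/// Hypothetical consumer: turns every element into an owned String.
fn collect_owned(items: Vec<Cow<'static, str>>) -> Vec<String> {
    items.into_iter().map(Cow::into_owned).collect()
}

fn main() {
    let dynamic = format!("session-{}", 42);
    // Each element has to be wrapped explicitly, or converted with .into().
    let mixed = vec![Cow::Borrowed("id"), Cow::Owned(dynamic), "bind".into()];
    println!("{:?}", collect_owned(mixed));
}
```

The mixture works, but every element needs a wrapper or an `.into()`, so the call site is not much quieter than with to_string().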


#8

What is your use case for vectors of mixed strings and slices? It’s not clear from the example why the function doesn’t take slices instead of both vectors and strings.


#9

You could accept an iterator, with an Into<String> bound on the iterator item. It’s what I do in Docopt for accepting an argv: http://burntsushi.net/rustdoc/docopt/struct.Docopt.html#method.argv
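A minimal sketch of that signature shape. The `collect_args` name is hypothetical, this is not Docopt’s actual code, and it uses an IntoIterator bound (an assumption) so that a Vec can be passed directly:

```rust
/// Accepts any iterable whose items convert into String.
fn collect_args<I, S>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = S>,
    S: Into<String>,
{
    args.into_iter().map(Into::into).collect()
}

fn main() {
    // All items of one call share a type, but the type can differ between calls.
    let from_slices = collect_args(vec!["prog", "--flag"]);
    let from_strings = collect_args(vec![String::from("prog"), String::from("--flag")]);
    println!("{:?} {:?}", from_slices, from_strings);
}
```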


#10

Why do you think that would help? The iterator still has to iterate over either &str, or String. In fact currently &str is the only type that actually impls Into<String> if I’m not mistaken.
I do, however, like the idea of using an iterator instead of Vec. Unfortunately in my case I don’t have plain strings, but (String, Option<String>, String) tuples, which makes this considerably harder.
I also had not realized that the way Into works means you can use takes_string("asdf".into()) instead of calling to_string() on the parameter. Unfortunately the implementation internally still calls to_string() instead of to_owned() :frowning:.

EDIT: There is in fact a From<T> for T impl, hence an Into<T> for T impl, and therefore an Into<String> for String impl. So my statement above about only &str implementing Into<String> is wrong.


#11

Ah, yeah, I guess I missed that about ToOwned.

As of now though (I didn’t realize this before), Into<Cow> does actually work.

use std::borrow::Cow;

fn new<A, B, I1, I2, I3, C>(some_str1: A, some_str2: Option<B>, vec_things: C) where
        A: Into<Cow<'static, str>>,
        B: Into<Cow<'static, str>>,
        I1: Into<Cow<'static, str>>,
        I2: Into<Cow<'static, str>>,
        I3: Into<Cow<'static, str>>,
        C: Iterator<Item=(I1, Option<I2>, I3)> {
    let string1 = some_str1.into().into_owned();
    let string2 = some_str2.map(|x| x.into().into_owned());
    let things = vec_things.map(|(t1, t2, t3)| (
        t1.into().into_owned(),
        t2.map(|x| x.into().into_owned()),
        t3.into().into_owned(),
    )).collect::<Vec<(String, Option<String>, String)>>();
}

new("iq", Some("jabber:client"), vec![("id", None, "bind"),
                       ("type", Some("Something"), "set")].into_iter());

(playpen: http://is.gd/d3iwd4)

For vectors, though, it’s true that all elements need to be one type, but I don’t think it’s much of a restriction to require that all of the strings be either String or &str.


#12

This is more easily achieved using Into<String> though.
I have to admit I’m a bit conflicted about this. You are basically arguing that any function currently taking a String should just take a T: Into<String> so the call can be more ergonomic. I.e. this feels like fixing the problem on the wrong end.


#13

Why were you taking String in the first place, ending up fighting your own library?


#14

No. String impls Into<String> too. (Oh, I see your edit now.)

As a general rule, if your function accepts a String, then it’s probably a good ergonomics improvement to replace it with Into<String>. (This is what I’ve done in most of my libraries. When I can get away with it, I will try to use Into<Cow<'a, str>> though.)
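A small sketch of that rule, using a hypothetical `Greeting` type. It also shows that a caller who already owns a String loses nothing, because `Into<String>` for String just moves the value:

```rust
// Hypothetical type whose constructor would otherwise take a plain String.
struct Greeting {
    name: String,
}

impl Greeting {
    /// Accepts &str (allocates) as well as String (moved, no new allocation).
    fn new<S: Into<String>>(name: S) -> Greeting {
        Greeting { name: name.into() }
    }
}

fn main() {
    let from_literal = Greeting::new("world");

    let existing = String::from("world");
    let buffer = existing.as_ptr();
    let from_string = Greeting::new(existing);

    // The String passed in was moved, so its heap buffer is reused.
    assert_eq!(from_string.name.as_ptr(), buffer);
    println!("hello {} and {}", from_literal.name, from_string.name);
}
```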


#15

That is certainly an “interesting” point of view. As far as I’m concerned, I’m not fighting my library, but the fact that Strings are verbose to create.
On a more general note: This has (AFAIK) for a long time been the preferred style when ownership is required (cf. the Guidelines). It gives the caller the ability to reuse preexisting Strings (the library this is a part of actually makes use of this fact), and makes the cost of allocating a String from a &str obvious at the call site.
Particularly the latter point is why it surprises me that people are now suggesting to take Into<String>. I have previously suggested this as a convention (though back then you would have taken Str and called into_string() on it), but it was generally thought to be non-rustic because it hides the allocation cost from the user.

As an aside: This API originally took &strs and a &[&str] which made it much nicer to use for some cases. I specifically changed this before making the first minor release.


#16

I think the portion of this thread regarding the ergonomics and verbosity of String syntax is somewhat related to the following general discussion:
http://users.rust-lang.org/t/why-strings-and-vectors-are-treated-differently-syntactically/786
I opened that thread when Rust was still in alpha. It has since gone to beta, so it is probably too late for this kind of discussion?


#17

You could copy Ruby and just add .to_s(), and it would help a little bit, right? Or copy C++ and have a user-defined literal "hello"_s.


#18

[quote="Florob, post:15, topic:850"]
It gives the caller the ability to reuse preexisting Strings
[/quote]
If I absolutely wanted this I’d go with Cow/IntoCow. Considering your example, it’s not clear said ability outweighs the difficulty of using string literals in this case. And the same module has some implicit allocations anyway.

You have a point about explicit allocations and I have to admit to not paying a lot of attention to those guidelines :wink: I guess you’re paying for that explicitness with ergonomics. BTW I remember strcat claiming that small allocations are very cheap.

[quote="Florob, post:15, topic:850"]
This API originally took &strs and a &[&str]
[/quote]
Well, I don’t see the point of taking a Vec just to throw it away immediately.


#19

You’re comparing apples and oranges. If the function accepts a parameter bounded by Into<String>, then that fact alone makes the allocation very clear and obvious.

The old Str bound, on the other hand, would hide the allocation. The old Str trait provided as_slice(), which means the function body could use a borrowed string or an owned string. If the function used an owned string, then the allocation is hidden from the caller. In the case of Into<String>, the only way the function body can use the parameter is by calling into. If the caller passes a &str, then an allocation occurs and it is not hidden from the caller (by definition of Into).


#20

Yes. I don’t see how small syntactic changes of questionable benefit that cause major breaking changes will be accepted now.