Connecting (joining) string slices without a temp Vec


#1

I happened upon what should’ve been a straightforward problem, described to me by a friend in C++, and tried to implement it in Rust.

The problem: given a set of pairs of strings, like ("one", "one"), ("two", "two") join each pair over a : and then join them all together over a , so the result is: "one:one,two:two"

Joining pairs was indeed straightforward:

let strings = set.iter().map(|&(a, b)| format!("{}:{}", a, b));

But connecting the iterator into a single string proved cumbersome:

let strings = set.iter().map(|&(a, b)| format!("{}:{}", a, b));
let result = strings.collect::<Vec<_>>().connect(",");

Having to build a real, concrete vector in memory just to, basically, iterate over it again seems suboptimal. Yet it’s the only way I could find in the standard library. Did I miss another?

My second attempt was to use itertools:

let result: String = strings.intersperse(",".to_string()).collect();

Even without dwelling on the ugly but necessary .to_string() call, this didn’t work because a String knows how to construct itself from an iterator of &str but doesn’t know how to construct itself from other Strings. Is that a simple omission in the standard library, or am I missing some workaround? Because this didn’t work either:

let result: String = strings.intersperse(",".to_string()).map(|s| s.to_str()).collect();

… as s only lives within the closure and can’t be to_str()ed out of it.

Finally I’ve managed to do this by manually using fold():

let strings = set.iter().map(|&(a, b)| format!("{}:{}", a, b));
let result = strings.intersperse(",".to_string()).fold(String::new(), |res, s| res + &s);

But that looks way more involved than it should… Any advice? Thanks!


#2

You could just add the comma normally:

fn main() {
    let set = [("one", "one"),
               ("two", "two"),
               ("three", "three"),
               ("four", "four")];

    let mut strings = set.iter()
                         .fold(String::new(), |acc, &(l, r)| {
                            acc + l + ":" + r + ","
                        });
    strings.pop();

    println!("{}", strings);
}

Note: I have no idea if it’s fast but it seems reasonable to me. Just keep pushing onto the end of the string.


#3

Thanks! This works, however, only for joining over a single character, which is generally not the case. In fact, for display purposes you’d probably want to use ", ". This is why having .connect() in the standard library would be very useful, were it not constrained specifically to vectors. Also, I used format! because it is, again, a more general form than simple concatenation. Imagine the pairs being (label, quantity), and you’d want to print them out like this:

label          100
longer label    10

So the question is more general: how to connect instances of String (which is what we get from format!) over a given str.
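
To make the question concrete, the behaviour I’m after could be sketched roughly like this. The helper name join_strings is hypothetical (it is not a standard library function), and the column widths 15 and 3 are just assumptions chosen to match the sample output above. It walks the iterator once and pushes the separator between items, so no intermediate Vec is built:

```rust
/// Hypothetical helper (not part of std): joins owned Strings over `sep`
/// in a single pass, without collecting them into a Vec first.
fn join_strings<I>(iter: I, sep: &str) -> String
where
    I: IntoIterator<Item = String>,
{
    let mut out = String::new();
    for (i, s) in iter.into_iter().enumerate() {
        if i > 0 {
            out.push_str(sep); // the separator goes between items only
        }
        out.push_str(&s);
    }
    out
}

fn main() {
    let set = [("label", 100), ("longer label", 10)];
    // Left-align the label in 15 columns, right-align the quantity in 3.
    let lines = set
        .iter()
        .map(|&(label, qty)| format!("{:<15}{:>3}", label, qty));
    println!("{}", join_strings(lines, "\n"));
}
```

The separator can be any &str here, including ", " or a newline.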


#4

I’d just do this:

let strings = set.iter()
                 .map(|&(a, b)| format!("{}:{}", a, b))
                 .intersperse(",".to_string())
                 .collect::<Vec<_>>()
                 .concat();

Other more experienced people may have better ideas though.


#5

For the record, if your aim is efficiency, you probably want to avoid the string allocation in format! as well. I don’t know any particularly pretty way to do this, but there’s always:

use std::fmt::Write; // needed so that write! can target a String

let mut s = String::new();
for &(a, b) in v { write!(s, "{}:{},", a, b).unwrap(); }
s.pop(); // drop the trailing comma

…actually, I just tried compiling that, and even with -C lto -C opt-level=3 it doesn’t produce remotely nice code - the format doesn’t manage to get transformed into a series of writes. For true efficiency you’d need even more verbosity to do that manually. ¯\_(ツ)_/¯
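
For illustration, the more verbose manual version might look like the sketch below. It assumes the pairs are (&str, &str), and join_pairs is a hypothetical name I made up for it. I haven’t measured whether it is actually faster; it only shows the idea of bypassing the formatting machinery with plain pushes:

```rust
/// Hypothetical helper: builds "a:b,c:d" with plain push/push_str calls
/// instead of going through write!/format!.
fn join_pairs(pairs: &[(&str, &str)]) -> String {
    let mut s = String::new();
    for (i, &(a, b)) in pairs.iter().enumerate() {
        if i > 0 {
            s.push(','); // separator before every pair except the first
        }
        s.push_str(a);
        s.push(':');
        s.push_str(b);
    }
    s
}

fn main() {
    let v = [("one", "one"), ("two", "two")];
    println!("{}", join_pairs(&v));
}
```

This also removes the need for the trailing s.pop(), since the comma is only written between pairs.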


#6

This is one way to do it relatively efficiently. The join function in my example is just a stand-in for what Itertools::join already does:

use std::fmt::{self, Display, Write};

/// Concatenate display of the contents
struct Concat<T>(pub T);

impl<A, B, C> Display for Concat<(A, B, C)>
    where A: Display, B: Display, C: Display,
{
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let t = &self.0;
        write!(f, "{}{}{}", t.0, t.1, t.2)
    }
}

fn join<T>(sep: &str, iter: T) -> String
    where T: IntoIterator,
          T::Item: Display,
{
    let mut out = String::new();
    let mut iter = iter.into_iter();
    if let Some(fst) = iter.next() {
        write!(&mut out, "{}", fst).unwrap();
        for elt in iter {
            write!(&mut out, "{}{}", sep, elt).unwrap();
        }
    }
    out
}

fn main() {
    let set = [("one", "one"),
               ("two", "two"),
               ("three", "three"),
               ("four", "four")];
    
    let out = join(",", set.iter().map(|&(l, r)| Concat((l, ':', r))));
    println!("{:?}", out);
}

#8

I’ve no idea how I missed itertools::join! That’s exactly what I was looking for, thanks a lot!

Thanks for a very good example, too. A few questions:

  • Do I understand it right that Concat's only purpose is to be able to attach a trait to a tuple?

  • Why does join want an IntoIterator instead of an Iterator? I’ve seen it done elsewhere too, but I don’t understand why it is done like this.


#9

Yes, Concat’s purpose is to be a structure that expresses the concatenation of its fields’ Display representations; that’s all it does. (Unfortunately, in Rust we basically need macros to implement it for multiple tuple sizes.)
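
As a rough sketch of what such a macro could look like: impl_concat is a hypothetical name, and this is only one possible way to write it, not what any particular crate does. Each invocation generates the Display impl for one tuple size:

```rust
use std::fmt::{self, Display};

/// Same wrapper as in the earlier example.
struct Concat<T>(pub T);

/// Hypothetical macro: generates a Display impl for one tuple arity.
/// Each `$name` is a type parameter, each `$idx` the matching tuple field.
macro_rules! impl_concat {
    ($($name:ident => $idx:tt),+) => {
        impl<$($name: Display),+> Display for Concat<($($name,)+)> {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                // Write every field in order, stopping at the first error.
                $( write!(f, "{}", (self.0).$idx)?; )+
                Ok(())
            }
        }
    };
}

impl_concat!(A => 0);
impl_concat!(A => 0, B => 1);
impl_concat!(A => 0, B => 1, C => 2);
impl_concat!(A => 0, B => 1, C => 2, D => 3);

fn main() {
    println!("{}", Concat(("one", ':', "one")));
    println!("{}", Concat((1, '-', 2, '!')));
}
```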

The nice thing for both Concat and join is that both &T and T display the same way, so we don’t have to wrangle types as much.

IntoIterator is there to simplify passing “iterable” values. It means you can pass both iterators and values convertible to iterators. So join(", ", &[1,2,3]) and join(", ", [1,2,3].iter()) are the same thing. I’ve gotten into the habit of using it wherever I expect an iterator.
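
A small sketch to make that concrete, using the same join signature as in my earlier example (separator first, then the iterable). The body is a condensed rewrite for illustration, so treat it as an assumption rather than the exact code from before:

```rust
use std::fmt::{Display, Write};

// Same shape as the earlier join: separator first, then anything iterable.
fn join<T>(sep: &str, iter: T) -> String
where
    T: IntoIterator,
    T::Item: Display,
{
    let mut out = String::new();
    for (i, elt) in iter.into_iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        // Writing into a String cannot fail, so unwrap is fine here.
        write!(out, "{}", elt).unwrap();
    }
    out
}

fn main() {
    let v = vec![1, 2, 3];
    // A reference to a collection is IntoIterator...
    let a = join(", ", &v);
    // ...and so is any iterator, since every Iterator implements IntoIterator.
    let b = join(", ", v.iter().map(|x| x * 10));
    println!("{} / {}", a, b);
}
```

Had the bound been T: Iterator, the first call would need an explicit v.iter().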

Itertools::join, however, does not use IntoIterator; it’s a method, available only on iterators. So there you need an explicit conversion to an iterator, and in return you get convenient method-chaining syntax.