Choosing a function design


#1

A very short talk from CppCon 2017 looks at how to design the API of a sufficiently simple function.

The speaker shows functions meant to be used like this:

std::string s = " this is a test ";
s = clean_whitespace(std::move(s));

What are good ways to design this API in Rust? The simplest is:

fn clean_whitespace(s: &str) -> String {…}

A less natural but more efficient function:

fn clean_whitespace(s: &mut String) {…}


#2

I didn’t watch the talk.

A direct translation is

fn clean_whitespace(s: String) -> String

I would write

fn clean_whitespace(s: &mut String)

personally. Maybe there are details from the talk that I’m missing.


#3

That’s nice. The speaker says that C++ code with std::move works in-place and doesn’t perform memory allocations. The Rust version could do the same, re-using the given moved String.


#4

Yes, both signatures I gave you will work in-place and not perform allocations, one by move (which is why it’s the direct translation) and one by mutable reference.
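To make that concrete, here is a minimal sketch of both in-place signatures. The thread never says what "cleaning" means, so this assumes it means dropping leading and trailing whitespace and collapsing each inner run of whitespace to its first character. The name clean_whitespace_mut is only there so both variants can sit in one file.

```rust
/// By mutable reference. Assumed behavior (not defined in the thread):
/// drop leading/trailing whitespace and keep only the first character
/// of every inner whitespace run.
fn clean_whitespace_mut(s: &mut String) {
    // Starts as `true` so that leading whitespace is dropped.
    let mut prev_space = true;
    // `String::retain` filters characters within the existing buffer.
    s.retain(|c| {
        let is_space = c.is_whitespace();
        let keep = !(is_space && prev_space);
        prev_space = is_space;
        keep
    });
    // At most one trailing whitespace character can be left over.
    if s.ends_with(char::is_whitespace) {
        s.pop();
    }
}

/// By move: the direct translation of the C++ call. The String handle is
/// moved in, cleaned in place, and moved back out.
fn clean_whitespace(mut s: String) -> String {
    clean_whitespace_mut(&mut s);
    s
}

fn main() {
    let s = String::from(" this  is a   test ");
    let buffer_before = s.as_ptr();

    let s = clean_whitespace(s);

    assert_eq!(s, "this is a test");
    // Same heap buffer as before the call: nothing was reallocated.
    assert_eq!(s.as_ptr(), buffer_before);
}
```

The by-move version just forwards to the &mut version, so in this sketch the choice between them is about call-site ergonomics rather than performance.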


#5

My first thought after watching was to give an answer similar to the C++ one. BUT

Clearly you need to write three functions: clean_whitespace, clean_whitespace_ref, and clean_whitespace_mut.


#6

Where does the moved value live in memory? The stack of the caller?


#7

Moved values live on the callee’s stack frame. Values are moved by byte-copying the stack part. There are no move constructors.

In the case of String, the pointer, length, and capacity fields get copied, while the heap part (which contains the actual string data) is simply left alone. The same analysis that prevents you from calling a method on a moved value also lets the compiler skip the destructor call for a moved value in the caller, so the destructor is run by the callee, not the caller.
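A small sketch of the "heap part is left alone" point: the heap address seen inside the callee is the same one the caller saw before the move. The helper heap_addr_after_move is a made-up name for this illustration.

```rust
/// Hypothetical helper: takes the String by value and reports where its
/// heap buffer lives, as seen from inside the callee.
fn heap_addr_after_move(s: String) -> usize {
    s.as_ptr() as usize
    // `s` goes out of scope here, so the callee runs the destructor.
}

fn main() {
    let s = String::from("this is a test");
    let addr_in_caller = s.as_ptr() as usize;

    // Only the small handle (pointer, length, capacity) is copied.
    let addr_in_callee = heap_addr_after_move(s);

    // The string data itself never moved.
    assert_eq!(addr_in_caller, addr_in_callee);

    // println!("{}", s); // would not compile: `s` was moved
}
```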


To give you even more unasked-for information, rustc classifies variables as “definitely moved”, “possibly moved”, or “definitely not moved”.

If it’s definitely not moved, the destructor call gets inserted.

If it’s definitely moved, the destructor call doesn’t get inserted.

If it’s possibly moved, the compiler adds a hidden drop flag, turning code like this:

fn might_move_the_string() {
    let s = String::new();
    if flip_coin() { clean_whitespace(s); }
}

into something like this:

fn might_move_the_string_compiled_pseudo_rust() {
    let s = String::new();
    let mut s_drop_flag = true; // this variable is magically inserted by the compiler
    if flip_coin() { s_drop_flag = false; clean_whitespace(s); }
    if s_drop_flag { drop(s) }
}
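
The pseudo-Rust above can't be compiled as written, but the effect can be observed from ordinary code. A rough sketch, using a made-up Noisy type whose destructor writes into a log: whichever branch runs, the value is dropped exactly once, either by the callee or at the end of the caller's scope.

```rust
use std::cell::RefCell;

/// Hypothetical type for the demo: records a line when it is dropped.
struct Noisy<'a>(&'a RefCell<Vec<&'static str>>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push("drop");
    }
}

/// Takes ownership, so the value is dropped when this function returns.
fn consume(n: Noisy<'_>) {
    n.0.borrow_mut().push("consumed");
}

/// `n` is "possibly moved": whether it still needs dropping at the end
/// of the scope is only known at run time.
fn might_move(flip: bool, log: &RefCell<Vec<&'static str>>) {
    let n = Noisy(log);
    if flip {
        consume(n);
    }
    log.borrow_mut().push("end of scope");
    // If `n` was not moved above, it is dropped here.
}

fn main() {
    let log = RefCell::new(Vec::new());
    might_move(true, &log);
    assert_eq!(*log.borrow(), vec!["consumed", "drop", "end of scope"]);

    log.borrow_mut().clear();
    might_move(false, &log);
    assert_eq!(*log.borrow(), vec!["end of scope", "drop"]);
}
```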

#8

Nit: there’s capacity too. String is a Vec<u8> internally.


#9

Here’s another option, inspired by String::from_utf8_lossy:

fn clean_whitespace<'a, T>(x: T) -> Cow<'a, str>
    where T: Into<Cow<'a, str>>

Mostly-working playground demo: https://play.rust-lang.org/?gist=9a0c1d8ace96a86787e8fa1e82046872&version=stable

That way it takes a String, a &str, or a Cow, and only allocates if the input is borrowed and unclean.
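
For readers who can't open the playground, here is a rough sketch of how such a function could look. It is not the code from the linked demo. The helpers is_clean and clean_in_place are made up for this example, and "clean" is assumed to mean no leading or trailing whitespace and no two whitespace characters in a row.

```rust
use std::borrow::Cow;

/// Hypothetical helper: true if there is nothing to clean up.
fn is_clean(s: &str) -> bool {
    let mut prev_space = true; // so a leading whitespace char counts as unclean
    for c in s.chars() {
        let is_space = c.is_whitespace();
        if is_space && prev_space {
            return false;
        }
        prev_space = is_space;
    }
    // A non-empty string must not end in whitespace.
    s.is_empty() || !prev_space
}

/// Hypothetical helper: cleans within the String's existing buffer.
fn clean_in_place(s: &mut String) {
    let mut prev_space = true;
    s.retain(|c| {
        let is_space = c.is_whitespace();
        let keep = !(is_space && prev_space);
        prev_space = is_space;
        keep
    });
    if s.ends_with(char::is_whitespace) {
        s.pop();
    }
}

fn clean_whitespace<'a, T>(x: T) -> Cow<'a, str>
where
    T: Into<Cow<'a, str>>,
{
    let cow = x.into();
    if is_clean(&cow) {
        return cow; // handed back untouched, borrowed or owned
    }
    // `into_owned` copies a borrowed &str but reuses an owned String.
    let mut owned = cow.into_owned();
    clean_in_place(&mut owned);
    Cow::Owned(owned)
}

fn main() {
    // Clean borrowed input comes back borrowed: no allocation.
    let a = clean_whitespace("already clean");
    assert!(matches!(a, Cow::Borrowed(_)));

    // Unclean borrowed input has to be copied once.
    let b = clean_whitespace("  needs   cleaning ");
    assert!(matches!(b, Cow::Owned(_)));
    assert_eq!(b, "needs cleaning");

    // An owned String is cleaned in its own buffer.
    let c = clean_whitespace(String::from(" owned  input "));
    assert_eq!(c, "owned input");
}
```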