Should I use `from_utf8_unchecked` here?

Here's a function that serializes iterator of key-value pairs as a json.

fn key_value_pairs_to_json<'a>(iter: impl Iterator<Item=(&'a str, &'a str)>) -> serde_json::Result<String> {
    let mut buffer = Vec::new();
    let mut serializer = serde_json::ser::Serializer::new(&mut buffer);
    let mut map_serializer = serializer.serialize_map(None)?;
    for (key, value) in iter {
        map_serializer.serialize_entry(key, value)?;
    }
    map_serializer.end()?;
    // SAFETY: serde_json always produces a valid UTF-8 string
    Ok(unsafe { String::from_utf8_unchecked(buffer) })
}

Playground link

My question is basically twofold:

  • Is it correct to use unchecked String constructor here? It seams to me, that it should be correct, because serde_json should produce a valid json -> valid UTF-8 string, but I'm still a bit sheepish.
  • Even if it's correct should I? I don't expect it to be a performance bottleneck, but if I see, that I can satisfy the invariants of more performant unsafe function, should I do that?

As always with performance, two main rules are: measure if in doubt (criterion might help here), err on the safe side by default (i.e. if the measurable difference is small enough).

6 Likes

I wonder why you're doing this in the first place. Why not collect the iterator into an appropriate map and let serde take care of its de-/serialization.
This way you also don't limit yourself to JSON, but you can use any De-/Serializer.

I didn't want to allocate the map here if I had the option :slight_smile:
The original code is very specific to json, so I opted to using json-specific API.

But I meant the original question as more general. All of us, let's say, re-use a value computed from a vector instead of recomputing it. Not because we measured performance impact, but because it's basic compitence to not do something in O(n) if you can do it in O(1) with no penalty. So the question is, should I think about using unsafe functions as "no penalty", if their invariants are meat by nature of surrounding local code. If I got @Cerber-Ursi's answer, the consensus is no, unless I can prove real-world benefit of such a change.

1 Like

Hmm, maybe a poll is needed to get a greater picture... :wink:

However, a different perspective other then performance maybe helps.

Variant A:

// SAFETY: above code always produces valid UTF-8
let foo = unsafe { String::from_utf8_unchecked(buffer) };

Variant B:

// Above code always produces valid UTF-8, so unwrap() is ok
let foo = String::from_utf8(buffer).unwrap();

Given that the magic "above code" isn't to long and somewhat simple and obvious, I'm not shy to choose variant A. But this is my personal choice.

Nobody else wanted to add something, so I took it that everyone agrees on this :grin:

My goal is either to use no unsafe code at all, or to isolate the unsafe code in small separate crates, or at least separate modules, that can be easily audited. In Cargo.toml I use:

[lints.rust]
unsafe_code = "deny"

and then in the unsafe modules I use:

#![allow(unsafe_code)]

This motivates me to say no to unsafe code in situations like this.

If I needed it for performance in this case, I would create a small utility module that does the json serialize and returns a String.

It's just one way to discipline yourself.

3 Likes

Why don't you just serde_json::to_string() it?

fn key_value_pairs_to_json<'a>(iter: impl Iterator<Item=(&'a str, &'a str)>) -> serde_json::Result<String> {
    use std::cell::RefCell;

    struct Wrapper<T>(RefCell<T>);

    impl<T: Iterator<Item = (&'str, &'str)>> serde::Serialize for Wrapper<T> {
        fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
            serialzier.collect_map(&mut *self.0.borrow_mut())
        }
    }

    serde_json::to_string(&Wrapper(RefCell::new(iter)))
}
4 Likes

That will require more explanation than the unsafe one, but I can't help but love it :grin:
PS: playground link to a working version of your thing for future generations

2 Likes