More efficient implementation of `String.truncated_lossy`

ActuallyHappening · July 8, 2025, 12:55pm

I want to call String::truncate, but it panics when new_len does not lie on a char boundary, and this is unacceptable as it introduces a DoS attack vector. So in my ad-hoc std lib, ystd, I implemented a lossy version of this function, which I was already using like this:

	/// Like [String::truncate] but doesn't panic.
	/// 
	/// Somebody please optimize this implementation
	fn truncated_lossy(mut self, new_len: usize) -> String {
		// SAFETY: We then copy basically the whole string, confirming it's all UTF-8
		unsafe { self.as_mut_vec() }.truncate(new_len);
		String::from_utf8_lossy(self.as_bytes()).into_owned()
	}

Source here: YMap/ystd/src/string.rs at 6b8261119e918b2f63dade10b65889a28715912a · ActuallyHappening/YMap · GitHub

I'm sure some better Rustaceans would love to spend a few ~~minutes~~hours thinking up the optimal solution, so I'm posting it here and will copy+paste+cargo release a new version of ystd when that happens.
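For context, here is a minimal sketch of the panic condition in question. The helper name `truncate_panics` is made up for illustration; it only wraps the standard `String::truncate` in `std::panic::catch_unwind` and assumes the default unwinding panic strategy.

```rust
use std::panic;

/// Hypothetical helper: returns true if `String::truncate` panics
/// when asked to cut `input` down to `new_len` bytes.
/// Assumes panics unwind (the default), not `panic = "abort"`.
fn truncate_panics(input: &str, new_len: usize) -> bool {
    let mut s = input.to_owned();
    // `truncate` panics if `new_len` falls inside a multi-byte character.
    panic::catch_unwind(move || s.truncate(new_len)).is_err()
}
```

With "£" (two bytes in UTF-8), a `new_len` of 1 lands in the middle of the character and panics, while 0, 2, or anything larger than the string's length does not.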

Bruecki · July 8, 2025, 1:35pm

Why not just decrease new_len until str::is_char_boundary returns true?

Schard · July 8, 2025, 1:41pm

Why use unsafe code?

#![feature(string_from_utf8_lossy_owned)]

pub trait StringExt {
    fn truncated_lossy(self, new_len: usize) -> Self;
}

impl StringExt for String {
    fn truncated_lossy(self, new_len: usize) -> Self {
        let mut bytes = self.into_bytes();
        bytes.truncate(new_len);
        Self::from_utf8_lossy_owned(bytes)
    }
}

fn main() {
    let s: String = "Hello ¥↑↑ World!".into();
    let truncated_lossy = s.truncated_lossy(7);
    println!("{truncated_lossy}");
}

nerditation · July 8, 2025, 4:03pm

the idea of "a lossy string truncation" is inherently problematic for utf8.

by convention, the term "truncate" implies the output is shorter (or equal) to the input, but a "lossy" conversion for unicode string involves substituting invalid code units with U+FFFD, which, in utf8 encoding, could actually increase the string length.

I don't know what your intention for this API is, but with your implementation, you may get very surprising results (playground):

let s = String::from("£");
assert!(s.len() == 2);
let s = truncated_lossy(s, 1);
assert!(s.len() == 3);
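To make the snippet above reproducible on stable Rust, here is a sketch that assumes `truncated_lossy` is a free function built on `String::from_utf8_lossy`. The name comes from this thread; the body is only one possible implementation.

```rust
/// Sketch of a lossy truncation on stable Rust (one possible body,
/// not necessarily the one used in the playground link).
fn truncated_lossy(s: String, new_len: usize) -> String {
    let mut bytes = s.into_bytes();
    // Cut at an arbitrary byte offset; this may split a character.
    bytes.truncate(new_len);
    // Any incomplete sequence is replaced by U+FFFD, which takes 3 bytes in UTF-8.
    String::from_utf8_lossy(&bytes).into_owned()
}
```

"£" is encoded as the two bytes C2 A3. Cutting it to one byte leaves a lone C2, which the lossy conversion replaces with U+FFFD (EF BF BD), so the "truncated" string grows from 2 bytes to 3.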

I think the more useful one in practice is truncation to a length that has been rounded down to the nearest code point boundary, for which I would suggest a name like truncate_floor(), truncate_upper_bound(), or something like that.

for this, the implementation is trivial:

/// output is not longer than `new_len`
fn truncate_floor(s: String, mut new_len: usize) -> String {
    if new_len >= s.len() {
        return s;
    }
    let mut bytes = s.into_bytes();
    loop {
        let b = bytes[new_len];
        if b < 128 || b >= 192 {
            break;
        }
        new_len -= 1;
    }
    bytes.truncate(new_len);
    String::from_utf8(bytes).unwrap()
}
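As a side note on the byte test in the loop above: `b < 128 || b >= 192` is true exactly when `b` is not a UTF-8 continuation byte (bit pattern `10xxxxxx`). A small sketch, with a helper name (`is_continuation_byte`) invented for illustration:

```rust
/// Hypothetical helper: true for UTF-8 continuation bytes (0b10xx_xxxx),
/// i.e. bytes in the range 0x80..=0xBF.
fn is_continuation_byte(b: u8) -> bool {
    b & 0xC0 == 0x80
}

/// Rounds `index` down to the nearest char boundary of `s`,
/// using only the byte test above. Indices past the end clamp to `s.len()`.
fn floor_boundary(s: &str, index: usize) -> usize {
    let bytes = s.as_bytes();
    let mut i = index.min(bytes.len());
    // The end of the string is always a boundary, and byte 0 of a valid
    // string is never a continuation byte, so this loop terminates.
    while i < bytes.len() && is_continuation_byte(bytes[i]) {
        i -= 1;
    }
    i
}
```

For valid UTF-8 this should agree with `str::is_char_boundary`, which is why the two versions of `truncate_floor` in this post behave the same.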

EDIT:

it's even simpler using str::is_char_boundary() as suggested by @Bruecki

fn truncate_floor(mut s: String, mut new_len: usize) -> String {
    new_len = usize::min(new_len, s.len());
    while !s.is_char_boundary(new_len) {
        new_len -= 1;
    }
    s.truncate(new_len);
    s
}

END of EDIT

ActuallyHappening · July 9, 2025, 1:16am

fn truncate_floor(mut s: String, mut new_len: usize) -> String {
    new_len = usize::min(new_len, s.len());
    while !s.is_char_boundary(new_len) {
        new_len -= 1;
    }
    s.truncate(new_len);
    s
}

Yes, this seems the best implementation, because it mirrors String::truncate in how it handles the panic condition. I'm very glad I asked this question. There were so many interesting solutions, but this is the one I'll stick with, in both a &mut self and a self version.
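A rough sketch of what the two versions mentioned above might look like as an extension trait. The trait and method names (`TruncateFloor`, `truncate_floor`, `truncated_floor`) are assumptions for illustration, not the actual ystd API.

```rust
/// Hypothetical extension trait; the names are illustrative only.
pub trait TruncateFloor {
    /// In-place version: never panics, never leaves more than `new_len` bytes.
    fn truncate_floor(&mut self, new_len: usize);
    /// By-value version: consumes the string and returns the shortened one.
    fn truncated_floor(self, new_len: usize) -> Self;
}

impl TruncateFloor for String {
    fn truncate_floor(&mut self, new_len: usize) {
        // Clamp first so that `is_char_boundary` is never asked about
        // an index past the end of the string.
        let mut new_len = new_len.min(self.len());
        // Index 0 is always a boundary, so this cannot underflow.
        while !self.is_char_boundary(new_len) {
            new_len -= 1;
        }
        self.truncate(new_len);
    }

    fn truncated_floor(mut self, new_len: usize) -> Self {
        self.truncate_floor(new_len);
        self
    }
}
```

Both versions reuse the existing allocation, and the loop steps back at most three bytes because a UTF-8 character is at most four bytes long.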
