I have a "shower thought" question...
There are many slice and string functions that return slices and strings. Some, like str::as_bytes and OsStr::new, are cheap conversions from one type to another. Others, like trim and split_at, return sub-slices and sub-strings of their input.
Is there any general guarantee that these functions will create their output from their input? Will they always return sub-slices and sub-strings of their input arguments? Or do they have the freedom to return equivalent objects not derived from their input—for example, returning references to static or interned values?
trim
pub fn trim(&self) -> &str
Returns a string slice with leading and trailing whitespace removed.
Can trimming a string that has no whitespace return a different string? Can this assertion fail?
let s = "foo";
assert_eq!(s.as_ptr(), s.trim().as_ptr());
I'm not saying it'd be a good idea, but would it be permissible for trim to return a reference to a different interned "foo" if it recognized that it had already trimmed that input before?
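To make "derived from its input" concrete, here is a minimal sketch of a check that a returned string lies inside the memory range of the original. The helper name is_derived_from is hypothetical, not a standard-library API, and a passing check only shows what the current implementation does, not what the documentation promises.

```rust
/// Returns true if the bytes of `child` lie entirely inside the memory
/// range occupied by `parent`.
/// `is_derived_from` is a hypothetical helper, not part of std.
fn is_derived_from(parent: &str, child: &str) -> bool {
    let parent_start = parent.as_ptr() as usize;
    let parent_end = parent_start + parent.len();
    let child_start = child.as_ptr() as usize;
    let child_end = child_start + child.len();
    // Compare addresses only; nothing is dereferenced here.
    parent_start <= child_start && child_end <= parent_end
}
```

For example, is_derived_from(&s, s.trim()) can be asserted for a given s. If trim were allowed to hand back an interned copy, that assertion could fail even though the string contents compare equal.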
split_at
pub fn split_at(&self, mid: usize) -> (&[T], &[T])
Divides one slice into two at an index.
The wording strongly implies that the output slices are sub-slices of the input. Does splitting a slice always return sub-slices? Is it possible for the first assertion below to fail? Is the second assertion safe?
let s1 = &b"foo"[..];
let s2 = s1.split_at(0).0;
assert_eq!(s1.as_ptr(), s2.as_ptr());
assert_eq!(s1, unsafe { std::slice::from_raw_parts(s2.as_ptr(), s1.len()) });
Hypothetically, would it be legal for split_at to have an optimization like this at the top?
pub fn split_at(&self, mid: usize) -> (&[T], &[T]) {
    if mid == 0 {
        return (&[], self);
    }
    ...
}
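The shortcut above can be written as a free function to see what it would change. This is a sketch of the hypothetical variant only; split_at_shortcut is a made-up name, and this is not how the standard library implements split_at.

```rust
/// Hypothetical variant of `split_at` with the `mid == 0` shortcut.
/// Not the standard library's implementation.
fn split_at_shortcut<T>(s: &[T], mid: usize) -> (&[T], &[T]) {
    if mid == 0 {
        // `&[]` is an empty slice that does not borrow from `s`,
        // so its pointer need not point into `s` at all.
        return (&[], s);
    }
    // Otherwise defer to the real method.
    s.split_at(mid)
}
```

The returned contents are the same as with split_at, so safe code comparing slices by value cannot tell the difference. Only code that inspects as_ptr() would notice, and rebuilding a longer slice from the first half's pointer, as in the from_raw_parts assertion earlier, would no longer be sound under this variant.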
from_utf8_lossy
pub fn from_utf8_lossy(v: &[u8]) -> Cow<'_, str>
Converts a slice of bytes to a string, including invalid characters.
When converting bytes to a string, if the bytes are valid UTF-8, will the str be created from the &[u8]?
use std::borrow::Cow;

let s = "foo";
if let Cow::Borrowed(b) = String::from_utf8_lossy(s.as_bytes()) {
    assert_eq!(s.as_ptr(), b.as_ptr());
}
It seems like it. This test passes. But hang on.
If you change let s = "foo"; to let s = ""; it fails. Why? Because from_utf8_lossy has a code path where it returns Cow::Borrowed(""). You'll notice the "" reference isn't derived from the input slice v.
Is that a bug? Or is it acceptable behavior?
If it is acceptable, what does it say about the other cases above?
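For anyone who wants to poke at this, here is a small sketch that classifies what from_utf8_lossy hands back for a given input. The helper lossy_borrows_from_input is hypothetical, and its result reflects the implementation being run, not a documented guarantee.

```rust
use std::borrow::Cow;

/// Hypothetical helper: reports whether a borrowed result starts at the
/// same address as the input.
/// - `Some(true)`:  borrowed, same starting pointer as `v`
/// - `Some(false)`: borrowed, but from somewhere else
/// - `None`:        owned, so a new String was allocated
fn lossy_borrows_from_input(v: &[u8]) -> Option<bool> {
    match String::from_utf8_lossy(v) {
        Cow::Borrowed(b) => Some(std::ptr::eq(b.as_ptr(), v.as_ptr())),
        Cow::Owned(_) => None,
    }
}
```

Calling it with b"foo", b"" and an invalid sequence such as b"\xff" exercises the three code paths discussed above. The empty input is the one worth watching.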