I have a program that tokenizes an input. In the first step, the input is split into many small pieces. In the second step, some of these pieces have to be reassembled again, if they're adjacent. So I need a function like this:
use std::mem;

fn try_join<'a>(s1: &'a str, s2: &'a str) -> Option<&'a str> {
    let s1 = s1.as_bytes().as_ptr_range();
    let s2 = s2.as_bytes().as_ptr_range();
    // The two slices are adjacent if the first ends exactly where the second starts.
    if s1.end == s2.start {
        let len = s2.end as usize - s1.start as usize;
        Some(unsafe {
            mem::transmute(std::slice::from_raw_parts(s1.start, len))
        })
    } else {
        None
    }
}
Is this correct? If not, is there a better alternative (that ideally doesn't require unstable features)?
The logic looks correct. You can also avoid using transmute by building the byte slice with std::slice::from_raw_parts and converting it with std::str::from_utf8_unchecked.
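For illustration, a transmute-free variant might look roughly like the sketch below. The name try_join_unchecked is made up for this example, and the function is marked unsafe on the assumption that the caller guarantees both parts come from the same original string. It only changes how the &str is constructed; it does not settle whether reading past the end of s1 through a pointer derived from s1 is accepted by Rust's aliasing rules.

```rust
use std::{slice, str};

/// Hypothetical transmute-free variant of `try_join`.
///
/// # Safety
/// `s1` and `s2` must both be parts of the same original string.
/// Caveat: the pointer used below is derived from `s1` alone, so stricter
/// aliasing models may still object to reading past the end of `s1`.
unsafe fn try_join_unchecked<'a>(s1: &'a str, s2: &'a str) -> Option<&'a str> {
    let r1 = s1.as_bytes().as_ptr_range();
    let r2 = s2.as_bytes().as_ptr_range();
    if r1.end == r2.start {
        let len = s1.len() + s2.len();
        // Relies on the caller's guarantee that `len` bytes starting at
        // `r1.start` lie inside one live string.
        let bytes = unsafe { slice::from_raw_parts(r1.start, len) };
        // Two adjacent pieces of valid UTF-8 concatenate to valid UTF-8.
        Some(unsafe { str::from_utf8_unchecked(bytes) })
    } else {
        None
    }
}
```

Called on the two halves of "foobar" produced by split_at(3), this should give back the whole string; with the arguments swapped it returns None.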
However, according to the Unsafe Code Guidelines discussions, as far as I know them, this can't be a safe function, because joining &strs together like this:
Is safe if they were originally joined, i.e. they come from the same string
Is UB if the two string parts only happen to be next to each other in memory (this could happen with two adjacent stack-allocated variables).
It might be possible to use a wrapper type with branding/generativity to ensure the string parts come from the same original string without any runtime cost, but it's convoluted.
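If the original string is available at the call site, a simpler option is a runtime check instead of branding. This is a different technique from the one described above: it costs a few comparisons, but needs no unsafe at all. The sketch below assumes a hypothetical helper try_join_in that takes the original string as an extra source parameter.

```rust
/// Hypothetical helper: joins `s1` and `s2` only if both lie inside
/// `source` and `s2` starts exactly where `s1` ends.
fn try_join_in<'a>(source: &'a str, s1: &str, s2: &str) -> Option<&'a str> {
    let base = source.as_ptr() as usize;
    // Translate a slice into byte offsets relative to `source`,
    // or `None` if it does not fit inside `source`'s address range.
    let offsets = |s: &str| -> Option<(usize, usize)> {
        let start = (s.as_ptr() as usize).checked_sub(base)?;
        let end = start.checked_add(s.len())?;
        (end <= source.len()).then_some((start, end))
    };
    let (start1, end1) = offsets(s1)?;
    let (start2, end2) = offsets(s2)?;
    if end1 == start2 {
        // The result is re-sliced from `source`, so it is always a valid
        // subslice of `source`; `get` also checks char boundaries.
        source.get(start1..end2)
    } else {
        None
    }
}
```

Because the returned slice is always taken from source itself, two unrelated strings that merely sit next to each other in memory can never be merged into one &str.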
Thanks, that's what I assumed. The strings are guaranteed to come from the same string in my use case, so it is sound, but I'll make it an unsafe function just in case.
However, I'm not sure if str is guaranteed to have the same representation as [u8].
Thanks, that's a good idea. However, I probably won't use it; I'm currently using my own string slice type, which looks like this:
pub struct StrSlice {
    start: usize,
    end: usize,
}
It's essentially a Range<usize>, but with some additional methods. Its advantage is that it doesn't borrow the string, so no ownership and lifetime problems. Its disadvantage is that it doesn't borrow the string, so I have to pass the original string to any method that accesses the string slice. I wanted to see if I can get rid of this "hack" and use a normal &str everywhere, but that turned out to be really cumbersome. I got hundreds of errors because Rust methods can't partially borrow, so I'd have to completely rewrite my parser to make it work.
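To make the trade-off concrete, here is a rough sketch of what such an index-based slice type could look like. Only the two fields come from the struct above; the methods new, as_str and try_join are assumptions made up for this example, not the actual "additional methods".

```rust
/// A byte range into some string that is stored elsewhere.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StrSlice {
    start: usize,
    end: usize,
}

impl StrSlice {
    /// Hypothetical constructor.
    pub fn new(start: usize, end: usize) -> Self {
        debug_assert!(start <= end);
        StrSlice { start, end }
    }

    /// Resolves the range against the original string. Returns `None` if
    /// the range is out of bounds or not on char boundaries.
    pub fn as_str<'a>(&self, source: &'a str) -> Option<&'a str> {
        source.get(self.start..self.end)
    }

    /// Joins two ranges if `next` starts exactly where `self` ends.
    /// No unsafe code is needed because only indices are compared.
    pub fn try_join(self, next: StrSlice) -> Option<StrSlice> {
        (self.end == next.start).then_some(StrSlice {
            start: self.start,
            end: next.end,
        })
    }
}
```

With this representation, joining adjacent pieces is plain index arithmetic, and the original string only has to be passed in when the text itself is needed.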