Ygg01
May 28, 2025, 11:45pm
1
I have the following struct:
pub struct Deserializer<'de> {
    tape: Vec<&'de str>,
}
It's meant to store zero copy strings. In theory at least.
To parse I have the following function and traits:
impl<'de> Deserializer<'de> {
    fn parse_to_vec<'s, S, B>(input: S, mut buffer: B) -> Self
    where
        S: Source<'s>,
        B: Buffer<'de>,
    {
        Deserializer {
            tape: vec![buffer.append(input.to_source())],
        }
    }
}
pub trait Source<'s> {
    fn to_source(&self) -> &'s str;
}
pub trait Buffer<'b> {
    fn append<'src>(&mut self, src: &'src str) -> &'b str;
}
Source and Buffer are traits because I want to abstract over sources and buffers. For example, if I have a borrowed string, I don't need a buffer, so I can simply write:
use std::mem::transmute;

impl<'b> Buffer<'b> for () {
    fn append<'src>(&mut self, src: &'src str) -> &'b str {
        // THING I'm worried about
        unsafe { transmute(src) }
    }
}
impl<'s> Source<'s> for &'s str {
    fn to_source(&self) -> &'s str {
        self
    }
}
pub fn main() {
    let Deserializer { tape } = Deserializer::parse_to_vec("Hello, world!", ());
    println!("{}", tape[0]); // prints: Hello, world!
}
However, is it OK to implement it like this? Is the transmute gimmick sound?
No.
impl<'b> Buffer<'b> for () {
    fn append<'src>(&mut self, src: &'src str) -> &'b str {
        // THING I'm worried about
        unsafe { transmute(src) }
    }
}
This allows one to change any lifetime to any other lifetime. What were you attempting to do?
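To make that concrete, here is a minimal sketch of the problem. The helper name launder is made up for illustration; the trait and impl are the ones from the question. Because 'b and 'src are unrelated in the signature, safe code can pick 'b = 'static and turn any short borrow into a &'static str:

```rust
use std::mem::transmute;

pub trait Buffer<'b> {
    fn append<'src>(&mut self, src: &'src str) -> &'b str;
}

impl<'b> Buffer<'b> for () {
    fn append<'src>(&mut self, src: &'src str) -> &'b str {
        // Nothing ties 'src to 'b, so this forges an arbitrary lifetime.
        unsafe { transmute(src) }
    }
}

// Hypothetical helper: entirely safe code from the caller's point of view,
// yet it extends any borrow to 'static.
fn launder<'short>(s: &'short str) -> &'static str {
    <() as Buffer<'static>>::append(&mut (), s)
}

fn main() {
    let owned = String::from("temporary");
    let forged: &'static str = launder(&owned);
    // Fine only because `owned` is still alive at this point.
    println!("{}", forged);
    // The compiler would also accept the following, which reads freed memory:
    // drop(owned);
    // println!("{}", forged);
}
```

The commented-out lines are left disabled on purpose: they compile, and that is exactly what makes the impl unsound.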
Ygg01
May 28, 2025, 11:59pm
3
Not sure what you mean by that. The traits are for use inside the crate only.
The idea is simple: have a zero-copy parser, but allow for buffered readers, without duplicating my code.
If you're given a string slice, as in Deserializer::parse_to_vec("Hello, world!", ()), the tape lives as long as the input string "Hello, world!".
If you're given a reader, as in Deserializer::parse_to_vec(reader, &mut Vec<u8>), the tape lives as long as the Vec<u8>.
Yes, I could make the lifetime always be bound to the &mut Vec<u8>, but then the thing isn't a zero-copy parser, which is my goal.
It's a use-after-free, a segfault, undefined behavior. Here's the same thing closer to your original main. (If you don't care about that... I don't understand what you're asking.)
The only way a method body matching this signature can return src in some form and be sound:
pub trait Buffer<'b> {
    fn append<'src>(&mut self, src: &'src str) -> &'b str;
}
is if 'b cannot be extended beyond 'src:
pub trait Buffer<'b> {
    //        vvvvvvvv
    fn append<'src: 'b>(&mut self, src: &'src str) -> &'b str;
}
After which you don't need transmute, but do need to connect the lifetimes in parse_to_vec.
 impl<'b> Buffer<'b> for () {
     fn append<'src: 'b>(&mut self, src: &'src str) -> &'b str {
-        unsafe {transmute(src)}
+        src
     }
 }
 impl<'de> Deserializer<'de> {
     fn parse_to_vec<'s, S, B>(input: S, mut buffer: B) -> Self
     where
+        's: 'de,
         S: Source<'s>,
         B: Buffer<'de>
     {
(I haven't thought through your goals so far.)
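Putting both changes together, a complete version that compiles without unsafe might look like the sketch below. It is assembled from the snippets in this thread; the main function, which parses from a non-static String, is an illustrative addition to show that the tape is now tied to the input's lifetime.

```rust
pub trait Source<'s> {
    fn to_source(&self) -> &'s str;
}

pub trait Buffer<'b> {
    // 'src: 'b says the input must live at least as long as the output.
    fn append<'src: 'b>(&mut self, src: &'src str) -> &'b str;
}

impl<'b> Buffer<'b> for () {
    fn append<'src: 'b>(&mut self, src: &'src str) -> &'b str {
        // Shortening 'src to 'b is an ordinary coercion, no transmute needed.
        src
    }
}

impl<'s> Source<'s> for &'s str {
    fn to_source(&self) -> &'s str {
        self
    }
}

pub struct Deserializer<'de> {
    tape: Vec<&'de str>,
}

impl<'de> Deserializer<'de> {
    fn parse_to_vec<'s, S, B>(input: S, mut buffer: B) -> Self
    where
        's: 'de, // connects the source lifetime to the tape lifetime
        S: Source<'s>,
        B: Buffer<'de>,
    {
        Deserializer {
            tape: vec![buffer.append(input.to_source())],
        }
    }
}

fn main() {
    let owned = String::from("Hello, world!");
    // The tape borrows from `owned`, so it cannot outlive it.
    let Deserializer { tape } = Deserializer::parse_to_vec(owned.as_str(), ());
    println!("{}", tape[0]);
}
```

With these bounds, trying to keep the tape around after `owned` is dropped is rejected by the borrow checker instead of compiling into a use-after-free.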
Perhaps:
pub trait Buffer<'b> {
    fn append<'src: 'b>(self, src: &'src str) -> &'b str;
}
impl<'b> Buffer<'b> for () {
    fn append<'src: 'b>(self, src: &'src str) -> &'b str {
        src
    }
}
impl<'b> Buffer<'b> for &'b mut String {
    fn append<'src: 'b>(self, src: &'src str) -> &'b str {
        let offset = self.len();
        self.push_str(src);
        &self[offset..]
    }
}
kornel
May 30, 2025, 11:34am
6
You can't give out references to parts of a String and then use push_str on it! It will crash and corrupt data, since String may reallocate and invalidate all previous strs borrowed from it.
It needs more care, see:
BTW, you may need to replace &mut self with &self, and use RefCell and such to mutate. &mut references aren't just mutable, they are exclusive, and when a lifetime bound applies to both, both will have the same very restrictive exclusivity requirements.
It may be easier to have two different methods, one for borrowing from the input, one for using the Vec as an arena. Borrowing from the input is an easy case, and can be done entirely in safe Rust. Arenas are tricky and unsafe.
Ygg01
May 31, 2025, 12:27pm
7
Yeah. It seems my only solution is to return offsets, and then create the nodes in the final step.
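Roughly what I have in mind, as a sketch in safe Rust. OffsetBuffer and resolve are placeholder names, not code from my crate: the buffer hands out byte ranges while parsing, and the ranges are turned into slices only once no more appends can happen.

```rust
use std::ops::Range;

// Placeholder type: copies every chunk into one String and returns offsets.
struct OffsetBuffer {
    data: String,
}

impl OffsetBuffer {
    fn new() -> Self {
        OffsetBuffer { data: String::new() }
    }

    // Copy `src` into the buffer and return the byte range it occupies.
    // No reference escapes, so later reallocations are harmless.
    fn append(&mut self, src: &str) -> Range<usize> {
        let start = self.data.len();
        self.data.push_str(src);
        start..self.data.len()
    }

    // Final step: with the buffer borrowed immutably, map ranges to slices.
    fn resolve<'a>(&'a self, spans: &[Range<usize>]) -> Vec<&'a str> {
        spans.iter().map(|r| &self.data[r.clone()]).collect()
    }
}

fn main() {
    let mut buf = OffsetBuffer::new();
    let spans = vec![buf.append("Hello"), buf.append(", "), buf.append("world!")];
    let tape = buf.resolve(&spans);
    println!("{:?}", tape); // prints: ["Hello", ", ", "world!"]
}
```

The borrowed-input case can keep returning slices of the input directly; only the buffered-reader path needs the extra resolve step.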