Can I reference part of a slice via Rc to achieve zero copy?

I'm a Rust newbie. I'm trying to achieve zero-copy parsing of a file by storing a borrowed &'a [u8], but the 'a lifetime soon makes a mess of things (again, I'm a newbie 🥺 and I don't have much time to wrestle with lifetimes). So I want to solve this quickly with Rc, but I'm lost: I don't know how to convert a &[u8] into an Rc<[u8]>.

Converting through Rc::from works, but it seems to make a copy of the data:

pub data: &'a [u8],
...
fn read_part(&self, size: usize) -> Rc<[u8]> {
    let part = &self.data[..size];
    Rc::from(part)
}

Any help will be appreciated! Thank you in advance 😉

Rc can't do that, but you could make a wrapper struct that holds the full Rc<[u8]> and a Range<usize>, then impl Deref to retrieve its &[u8] sub-slice on demand.
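A minimal sketch of that wrapper, assuming the whole buffer is already in an Rc<[u8]>. The names RcSlice and slice are hypothetical; sub-slicing only clones the Rc (bumping the reference count) and adjusts the range, so no bytes are copied.

```rust
use std::ops::{Deref, Range};
use std::rc::Rc;

/// Hypothetical wrapper: shares ownership of the whole buffer and
/// remembers which sub-range of it this value refers to.
#[derive(Clone)]
struct RcSlice {
    data: Rc<[u8]>,
    range: Range<usize>,
}

impl RcSlice {
    fn new(data: Rc<[u8]>) -> Self {
        let len = data.len();
        RcSlice { data, range: 0..len }
    }

    /// Returns a sub-slice relative to this view. Only the reference count
    /// is bumped; the bytes themselves are not copied.
    fn slice(&self, range: Range<usize>) -> Self {
        assert!(
            range.start <= range.end && range.end <= self.range.len(),
            "range out of bounds"
        );
        RcSlice {
            data: Rc::clone(&self.data),
            range: self.range.start + range.start..self.range.start + range.end,
        }
    }
}

impl Deref for RcSlice {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // Range is not Copy, hence the clone (it's just two integers).
        &self.data[self.range.clone()]
    }
}

fn main() {
    let whole = RcSlice::new(Rc::from(&b"header:payload"[..]));
    let header = whole.slice(0..6);
    let payload = whole.slice(7..14);
    assert_eq!(&*header, b"header");
    assert_eq!(&*payload, b"payload");
    // `whole`, `header` and `payload` all share one allocation.
    assert_eq!(Rc::strong_count(&whole.data), 3);
}
```

Note that the one Rc::from at the start still copies once; after that, every part is a cheap view.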

Good idea, but how do I convert the full &[u8] to an Rc<[u8]>? BTW, the &[u8] comes from an Mmap.

The bytes crate may be able to help, but it may be a bit tricky for a newbie. All data needs an owner, and Rc types allow shared (read-only) ownership. But that also means an Rc<[u8]> owns its data, so making an Rc<[u8]> generally requires copying the data into the Rc at least once. A Bytes value is like a mix of an Arc<[u8]> and a slice, so you can make a Bytes and then make a new one that holds a subslice of it but shares the same allocation.
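To illustrate the ownership point with only the standard library (this is not the bytes API): building an Rc<[u8]> from a borrowed slice copies the bytes once into a new allocation, while cloning the resulting Rc shares them. The helper to_rc is hypothetical.

```rust
use std::rc::Rc;

/// Hypothetical helper: copies `bytes` into a fresh reference-counted allocation.
fn to_rc(bytes: &[u8]) -> Rc<[u8]> {
    // `Rc::from(&[u8])` allocates a new buffer (with room for the reference
    // counts) and copies the bytes into it; it cannot reuse borrowed memory.
    Rc::from(bytes)
}

fn main() {
    let original: Vec<u8> = b"some file contents".to_vec();
    let owned = to_rc(&original);
    // Different address: the data was copied once.
    assert_ne!(owned.as_ptr(), original.as_ptr());

    // Cloning the `Rc` afterwards only bumps the count; no further copies.
    let shared = Rc::clone(&owned);
    assert_eq!(shared.as_ptr(), owned.as_ptr());
    println!("copied once, now shared {} ways", Rc::strong_count(&owned));
}
```

With the bytes crate, Bytes::slice plays the role of that cheap clone while also narrowing the visible range.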

You could have your wrapper hold Rc<Mmap> and a range instead.
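A sketch of that variant, assuming the mapping type derefs to [u8] (the Mmap types in the memmap/memmap2 crates do). To keep it runnable without extra crates, a Vec<u8> stands in for the mapping; Part is a hypothetical name. Nothing is copied at all here, because the Rc wraps the mapping itself.

```rust
use std::ops::{Deref, Range};
use std::rc::Rc;

/// Hypothetical wrapper: shares ownership of the *whole* mapping `T` and
/// remembers a byte range into it. `T` would be `Mmap` in real code.
struct Part<T: Deref<Target = [u8]>> {
    source: Rc<T>,
    range: Range<usize>,
}

impl<T: Deref<Target = [u8]>> Part<T> {
    fn new(source: Rc<T>, range: Range<usize>) -> Self {
        // Check bounds once up front so `deref` can't panic later.
        let bytes: &[u8] = &source;
        assert!(range.start <= range.end && range.end <= bytes.len(), "range out of bounds");
        Part { source, range }
    }
}

impl<T: Deref<Target = [u8]>> Deref for Part<T> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // Rc<T> -> T -> [u8]: a couple of pointer reads, no copying.
        let bytes: &[u8] = &self.source;
        &bytes[self.range.clone()]
    }
}

fn main() {
    // Stand-in for an `Rc<Mmap>`; any `T: Deref<Target = [u8]>` works.
    let mapping: Rc<Vec<u8>> = Rc::new(b"HDR1body-bytes".to_vec());
    let header = Part::new(Rc::clone(&mapping), 0..4);
    let body = Part::new(Rc::clone(&mapping), 4..14);
    assert_eq!(&*header, b"HDR1");
    assert_eq!(&*body, b"body-bytes");
    drop(mapping); // the parts keep the mapping alive on their own
    assert_eq!(body.len(), 10);
}
```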

Do you really need your "parts" to share ownership, though? If you can put a lifetime like 'mmap in their type signatures, then you can just borrow &'mmap [u8] slices directly.
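A sketch of the borrowing approach. The key detail is that read_part returns &'a [u8], tied to the input's lifetime rather than to the borrow of self, so parsed parts don't keep the reader itself borrowed. Reader, Record and the 4+3-byte record layout are made up for illustration.

```rust
/// Hypothetical parser that borrows the whole input (for example the `&[u8]`
/// obtained from an mmap) for the lifetime `'a`.
struct Reader<'a> {
    data: &'a [u8],
    pos: usize,
}

/// A parsed record borrowing from the input itself, not from the `Reader`.
struct Record<'a> {
    tag: &'a [u8],
    body: &'a [u8],
}

impl<'a> Reader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Reader { data, pos: 0 }
    }

    /// Returns the next `size` bytes, or `None` if not enough remain.
    /// The result is `&'a [u8]`, not tied to the `&mut self` borrow.
    fn read_part(&mut self, size: usize) -> Option<&'a [u8]> {
        let data: &'a [u8] = self.data; // copy the reference out of `self`
        let end = self.pos.checked_add(size)?;
        let part = data.get(self.pos..end)?;
        self.pos = end;
        Some(part)
    }

    /// Made-up layout for illustration: a 4-byte tag followed by a 3-byte body.
    fn read_record(&mut self) -> Option<Record<'a>> {
        let tag = self.read_part(4)?;
        let body = self.read_part(3)?;
        Some(Record { tag, body })
    }
}

fn main() {
    let input: Vec<u8> = b"TAG1abcTAG2xyz".to_vec(); // stand-in for the mapped bytes
    let mut reader = Reader::new(&input);
    let first = reader.read_record().unwrap();
    let second = reader.read_record().unwrap(); // `first` is still usable here
    assert_eq!(first.tag, b"TAG1");
    assert_eq!(second.body, b"xyz");
}
```

Only the top-level owner (here `input`, in practice the Mmap) has to outlive everything; the parsed structures just carry the one 'a parameter.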

Since you're using an Mmap and Rc (so only one thread), cuviper's suggestion of just borrowing from the base mmap is probably best.

Yes, I'm doing that now, but it's a little tricky because the structures built on it also need the same lifetime... Now my lifetimes are a mess and I want to get out of this as soon as possible.

But the wrapper derefs Mmap as a slice every time a range is borrowed. Doesn't that cause a lot of extra overhead?

If you're looking for a way to have zero-copy references to slices of the data in an Mmap without having to learn to manage lifetimes in all your data structures, and you don't plan on closing the file before exiting, you could just box and leak the Mmap. That gives you a &'static Mmap reference, which gives you &'static [u8] slices, which are easy to deal with.

This approach is limited, but may achieve what you're looking for.
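A sketch of the leaking approach, assuming the data really should live until the process exits. A Vec<u8> stands in for the mapping; with a real Mmap you would leak the Mmap value itself. leak_bytes, Header and parse_header are hypothetical names.

```rust
/// Hypothetical helper: leaks the buffer so it lives until the process exits.
/// With a real mapping you would leak the `Mmap` itself, e.g.
/// `let map: &'static Mmap = Box::leak(Box::new(mmap));`
fn leak_bytes(bytes: Vec<u8>) -> &'static [u8] {
    // `Box::leak` gives up ownership, so this memory is never freed.
    Box::leak(bytes.into_boxed_slice())
}

/// Parsed structures can hold `&'static [u8]` with no lifetime parameter.
struct Header {
    magic: &'static [u8],
    rest: &'static [u8],
}

fn parse_header(data: &'static [u8]) -> Option<Header> {
    if data.len() < 4 {
        return None;
    }
    // Both halves borrow the leaked data, so both are `'static` too.
    let (magic, rest) = data.split_at(4);
    Some(Header { magic, rest })
}

fn main() {
    let data = leak_bytes(b"ELF!rest of file".to_vec());
    let header = parse_header(data).expect("input too short");
    assert_eq!(header.magic, b"ELF!");
    assert_eq!(header.rest.len(), 12);
}
```

The limitation is the one mentioned: the memory (and, for an mmap, the mapping) is never released while the process runs.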

I wouldn't think there would be any measurable overhead. You're just putting a pointer and length into a struct together, in effect, if I'm reading the Mmap deref implementation correctly.

Taking a substring of a longer string by bumping up the reference count is zero-copy, but it has its own danger as well: a tiny substring can keep a much longer string from being dropped.

If this keeps happening, memory usage keeps going up, even though everything is apparently being done to avoid allocating new memory.
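A small demonstration of that retention problem, using a hypothetical SubStr view over an Rc<str>: a Weak handle shows that the roughly 500 KB buffer stays allocated for as long as a 1-byte view of it exists.

```rust
use std::ops::Range;
use std::rc::{Rc, Weak};

/// Hypothetical zero-copy substring: shares the whole buffer via `Rc`.
struct SubStr {
    buf: Rc<str>,
    range: Range<usize>,
}

impl SubStr {
    fn as_str(&self) -> &str {
        &self.buf[self.range.clone()]
    }
}

/// Returns a tiny view plus a `Weak` handle for observing whether the
/// large buffer behind it has been freed.
fn tiny_part_of_big_string() -> (SubStr, Weak<str>) {
    let big: Rc<str> = Rc::from("id=7;".repeat(100_000).as_str()); // 500,000 bytes
    let weak = Rc::downgrade(&big);
    let part = SubStr { buf: big, range: 3..4 }; // just "7"
    (part, weak)
}

fn main() {
    let (part, weak) = tiny_part_of_big_string();
    assert_eq!(part.as_str(), "7");
    // The big buffer is still alive, only because of a 1-byte view.
    assert!(weak.upgrade().is_some());
    drop(part);
    // The last strong reference is gone, so the buffer has been freed.
    assert!(weak.upgrade().is_none());
}
```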

Yes, it's there: it just calls slice::from_raw_parts. I'm not sure whether that function has any overhead, but if it doesn't, this should be a good solution for me.
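To see that no data is copied, the sketch below rebuilds a slice from its raw parts and checks that the result points at the same memory. In release builds this is typically just pairing a pointer with a length; debug builds may add precondition checks. rebuild is a hypothetical helper.

```rust
/// Hypothetical helper: rebuilds a slice from its pointer and length, roughly
/// what a `Deref` to `[u8]` in a memory-map wrapper does. Nothing is copied.
fn rebuild(bytes: &[u8]) -> &[u8] {
    let ptr = bytes.as_ptr();
    let len = bytes.len();
    // SAFETY: `ptr` and `len` come from a valid, live `&[u8]`, and the
    // returned slice borrows from `bytes`, so it cannot outlive the data.
    unsafe { std::slice::from_raw_parts(ptr, len) }
}

fn main() {
    let data = vec![1u8, 2, 3, 4];
    let view = rebuild(&data);
    assert_eq!(view.as_ptr(), data.as_ptr()); // same memory, nothing copied
    assert_eq!(view, &data[..]);
}
```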

Is this supposed to be a memory leak? Perhaps I could use some tool (like valgrind) to check memory usage and prevent this from happening?

This kind of memory leak is more common than you might think in languages with GC. The solution is a simple programming policy: forbid storing string subslices permanently. At the point where you want to keep one for good, send it to a string interner to obtain dedicated storage for it.
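A minimal interner sketch along those lines, using only the standard library: each distinct substring is copied once into its own small Rc<str>, so the big buffer it came from can be dropped, and repeated strings share one allocation. Interner and intern are hypothetical names; real interners differ in details.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical minimal interner: one small allocation per distinct string.
#[derive(Default)]
struct Interner {
    strings: HashSet<Rc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Rc<str> {
        // `Rc<str>: Borrow<str>`, so we can look up by `&str` directly.
        if let Some(existing) = self.strings.get(s) {
            return Rc::clone(existing);
        }
        // Copy only the (small) substring, not the big buffer it came from.
        let owned: Rc<str> = Rc::from(s);
        self.strings.insert(Rc::clone(&owned));
        owned
    }
}

fn main() {
    let big = String::from("name=alice;name=bob;name=alice;");
    let mut interner = Interner::default();
    // Pretend these are zero-copy subslices produced while parsing `big`.
    let a1 = interner.intern(&big[5..10]);
    let b = interner.intern(&big[16..19]);
    let a2 = interner.intern(&big[25..30]);
    drop(big); // the interned strings don't keep `big` alive
    assert_eq!(&*a1, "alice");
    assert_eq!(&*b, "bob");
    assert!(Rc::ptr_eq(&a1, &a2)); // duplicates share one allocation
}
```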

I came up with the following. It's probably not idiomatic, and so far it only supports a few operations (which is why a crate may be the better choice). But perhaps it helps to understand the possibilities in Rust a bit better.

use std::ops::Deref;
use std::ops::Range;
use std::rc::Rc;

struct RcStr {
    data: Rc<String>,
    range: Range<usize>,
}

impl Deref for RcStr {
    type Target = str;
    fn deref(&self) -> &Self::Target {
        // NOTE: `Range` is not `Copy`, hence we need the `.clone()`
        &self.data[self.range.clone()]
    }
}

impl RcStr {
    fn new(string: String) -> Self {
        let len = string.len();
        RcStr {
            data: Rc::new(string),
            range: 0..len,
        }
    }
    fn rc_index(&self, mut range: Range<usize>) -> Self {
        range.start += self.range.start;
        range.end += self.range.start;
        RcStr {
            data: self.data.clone(),
            range,
        }
    }
}

fn parse(rc_s: RcStr) {
    let hello: RcStr = rc_s.rc_index(0..5);
    let world: RcStr = rc_s.rc_index(6..rc_s.len()); // NOTE: `6..` won't work yet
    let trimmed: RcStr = world.rc_index(0..world.len()-1);
    // With the `println!` macro, coercion into `&str` is needed
    // as we didn't implement `std::fmt::Display` for `RcStr`
    println!("hello = {}", &hello as &str);
    println!("world = {}", &world as &str);
    println!("trimmed = {}", &trimmed as &str);
}

fn main() {
    let s: String = "Hello World!".to_string(); // `String` gets allocated
    let rc_s: RcStr = RcStr::new(s); // `String` is moved into an `Rc`, not cloned
    parse(rc_s);
}

Output:

hello = Hello
world = World!
trimmed = World

P.S.: The code addresses only the original question. The String in data: Rc<String> could possibly be replaced with any other owned value holding the actual data (as long as it's possible to borrow a &str from it).
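A variation on the code above (not the original poster's code) that lifts the `6..` limitation by accepting any RangeBounds<usize>, and implements Display so the `as &str` casts aren't needed. It is a sketch: rc_index checks the range eagerly and panics on out-of-bounds or non-char-boundary ranges.

```rust
use std::fmt;
use std::ops::{Bound, Deref, Range, RangeBounds};
use std::rc::Rc;

struct RcStr {
    data: Rc<String>,
    range: Range<usize>,
}

impl Deref for RcStr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.data[self.range.clone()]
    }
}

// Lets `println!("{}", x)` work directly, without `&x as &str`.
impl fmt::Display for RcStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&**self, f)
    }
}

impl RcStr {
    fn new(string: String) -> Self {
        let len = string.len();
        RcStr { data: Rc::new(string), range: 0..len }
    }

    /// Sub-slices relative to this view; accepts `a..b`, `a..`, `..b`, `..=b`, `..`.
    fn rc_index<R: RangeBounds<usize>>(&self, range: R) -> Self {
        let start = match range.start_bound() {
            Bound::Included(&s) => s,
            Bound::Excluded(&s) => s + 1,
            Bound::Unbounded => 0,
        };
        let end = match range.end_bound() {
            Bound::Included(&e) => e + 1,
            Bound::Excluded(&e) => e,
            Bound::Unbounded => self.range.len(),
        };
        // Fail early (bounds and UTF-8 char boundaries) instead of on first use.
        assert!(self.get(start..end).is_some(), "invalid range {}..{}", start, end);
        RcStr {
            data: Rc::clone(&self.data),
            range: self.range.start + start..self.range.start + end,
        }
    }
}

fn main() {
    let rc_s = RcStr::new("Hello World!".to_string());
    let world = rc_s.rc_index(6..); // open-ended ranges now work
    let trimmed = world.rc_index(..world.len() - 1);
    println!("hello = {}", rc_s.rc_index(..5));
    println!("world = {}", world);
    println!("trimmed = {}", trimmed);
}
```

It prints the same three lines as the original version.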