Weird behaviour of `Vec::from_raw_parts`

I've created a minimal snippet to reproduce my confusion:

fn main() {
    let bytes: Vec<u8> = vec![0, 0, 0, 0];
    let bytes = bytes.leak();
    let (bytes, _) = bytes.split_at(2);  // this works.
    // let (_, bytes) = bytes.split_at(2);  // this does not, throws 'free(): invalid pointer' at the end.
    let results = unsafe {
        Vec::from_raw_parts(bytes.as_ptr() as *mut u16, 1, 1)
    };
    println!("results: {:?}", results);  // this can always be printed.
}

I've briefly looked at the internal implementation and I'm out of ideas: everything seems symmetric, yet the results differ. :cry:

I'm pretty new to Rust and this is one of my few attempts at unsafe code, with no luck so far.

Any help will be appreciated, thanks in advance!

Here's a link to the playground if needed!

The safety requirement of Vec::from_raw_parts() which you are violating is:

  • ptr must have been allocated using the global allocator, such as via the alloc::alloc function.

The pointer you get from the second output of slice::split_at() is not a pointer that was produced by the global allocator: it points two bytes into the allocation rather than at its start.

More broadly, you cannot split a heap allocation into two parts such that you can free() only one of the parts; allocators do not support that. You might be interested in Vec::split_off(), though — that will give you a Vec for the second portion, but it has to actually allocate that.
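
A minimal sketch of what Vec::split_off() does, assuming a plain Vec<u8>; the helper name `split_into_owned` is made up for illustration. The tail ends up in its own allocation, so each half can be dropped independently:

```rust
/// Splits `v` at `mid` into two independently owned vectors.
/// (`split_into_owned` is a hypothetical helper, not a std function.)
fn split_into_owned(mut v: Vec<u8>, mid: usize) -> (Vec<u8>, Vec<u8>) {
    // `split_off` keeps `[0, mid)` in `v` (capacity unchanged) and returns
    // a newly allocated Vec holding a copy of `[mid, len)`.
    let tail = v.split_off(mid);
    (v, tail)
}
```

Calling `split_into_owned(vec![1, 2, 3, 4], 2)` gives `[1, 2]` and `[3, 4]`, and the two vectors report different `as_ptr()` addresses because the tail was copied.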

Thanks for your detailed explanations!!

In fact I had suspected the global allocator constraint, but I ended up digging into the split_at function instead. Now my understanding is that the later operations are just pointer operations and will not change the allocator properties, which are only defined at the very beginning (e.g., when I created the vec). Correct me if I'm wrong!

As for split_off, I've looked at it before, but I thought it was the same as split_at. Can I now simply understand it as a split_at with allocations (so it's also zero-copy)? Thanks!

A pointer doesn't have “allocator properties”. Rather, allocators are designed to work with pointers that have the exact addresses they know about, and are not designed to work with other pointers.

Program: I need 64 bytes of memory.
Allocator: Okay, here's some memory you can use at address 0x100000.
Program: I don't need 0x100000 any more.
Allocator: Okay, I'll update my records.
Program: I need 64 bytes of memory.
Allocator: Okay, here's some memory you can use at 0x100100.
Program: I don't need 0x100120 any more.
Allocator: What? I don't know that address. I don't know what to do with it.
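
The same conversation can be sketched with the raw global allocator API from std::alloc. This is only an illustration under the assumption of a 64-byte, alignment-1 request; the function name `alloc_roundtrip` is made up:

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocates 64 bytes, stores `value` in the first byte, reads it back,
/// and frees the block. Returns `None` if the allocation fails.
fn alloc_roundtrip(value: u8) -> Option<u8> {
    // The layout records the size and alignment the allocator was asked for.
    let layout = Layout::from_size_align(64, 1).ok()?;
    // SAFETY: `layout` has a non-zero size, as `alloc` requires.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        return None;
    }
    // SAFETY: `ptr` is non-null and valid for 64 bytes of reads and writes.
    let read_back = unsafe {
        ptr.write(value);
        ptr.read()
    };
    // SAFETY: `ptr` is exactly the address `alloc` returned, and `layout`
    // is the layout it was allocated with. Passing `ptr.add(32)` here
    // instead would be undefined behaviour.
    unsafe { dealloc(ptr, layout) };
    Some(read_back)
}
```

The only pointer that may be handed back to `dealloc` is the one `alloc` returned, together with the same layout.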

I'm not sure what you mean, but split_off must copy the elements in the second half of the split into a new allocation. That's the only way for the second Vec to have an allocation that it owns.

Please notice that even the code that works has undefined behaviour! You are violating the second requirement:

T needs to have the same alignment as what ptr was allocated with.

u16 has an alignment of 2, but the allocation was made for a Vec<u8>, whose elements have an alignment of 1. You can check this yourself by running the code under Miri, which is available on the playground in the TOOLS menu at the top right.

Playing around is a valuable way to learn, but be careful: unsafe code is likely to bite you here. It might work while you experiment with it but fail spuriously later on. Miri is a very useful tool for checking that you uphold all the invariants you are responsible for.
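
To make the alignment point concrete, here is a small sketch that checks whether an address happens to satisfy a type's alignment. The helper `is_aligned_for` is made up for illustration, and the alignment of 2 for u16 is what mainstream targets use:

```rust
use std::mem::align_of;

/// Returns true if `ptr`'s address is a multiple of `T`'s alignment.
/// (`is_aligned_for` is a hypothetical helper.)
fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}
```

Note that an address that happens to be even does not make the original code sound: from_raw_parts requires T to have the same alignment the allocation was requested with, and that was 1 for the Vec<u8>.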

It's not only the alignment constraint: OP is also creating a Vec<u16> with a capacity of 2 bytes, while the original allocation had a capacity of 4 bytes (or possibly more).
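
For contrast, a sketch of a round trip through from_raw_parts that does satisfy the requirements mentioned so far: same element type, the pointer the Vec itself reported, and the original length and capacity. The function name `round_trip` is made up for illustration:

```rust
use std::mem::ManuallyDrop;

/// Takes a Vec apart into raw parts and rebuilds it without copying.
fn round_trip(v: Vec<u8>) -> Vec<u8> {
    // Prevent `v` from freeing its buffer when it goes out of scope.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: `ptr`, `len` and `cap` come straight from a live Vec<u8>,
    // the element type is unchanged, and the old Vec is never dropped,
    // so the buffer has exactly one owner afterwards.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

The rebuilt vector has the same contents and the same `as_ptr()` address as the input.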

SUPER intuitive, thanks!!! :smile::pray:

The concern about zero-copy actually originates from my wish to implement customized serialization: I can easily dump bytes and concatenate them into one 'compact' bytes vec, but I'm having trouble when trying to restore them. I know copying the data around would work, but I definitely want a zero-copy way to do it. Any advice?

For example, I'm wondering if there exists a magical function that does the following:

  • tell the allocator to 'forget' about the spaces allocated to an address (ptr) after certain length (len).
  • tell the allocator to allocate the memory I want at exactly ptr.add(len) address.

If this function exists, and suppose the alignments are good, maybe it can achieve zero-copy?

Wow, that's a good alert!

I had thought of alignment before, and what I did was simply run it over and over again to see if it works. I even printed out the address of the original bytes, and it turned out to be even every time. :rofl:

Miri is new to me, but it looks like the way to go when dealing with unsafe. I'll dig into it!

By definition, a Vec<u16> has a unique heap allocation, so you can't “zero copy deserialize” a structure containing a Vec<u16> and other things, unless you carefully orchestrate the sizes of your reads so that you read the data for the Vec directly into that Vec allocation, and read none of the other data that goes elsewhere.

The easier general option is to use a Cow<'a, [u16]> instead, because it has the option of borrowing its input, but you'll need to design the serialization format so the data is correctly aligned when read.
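
A minimal sketch of the Cow idea, assuming the payload is stored in native byte order; the function name `view_as_u16` is made up for illustration. It borrows the input when the bytes are suitably aligned and falls back to copying otherwise:

```rust
use std::borrow::Cow;

/// Views `bytes` as u16 values, borrowing when possible.
/// A trailing odd byte, if any, is ignored by the copying fallback.
fn view_as_u16(bytes: &[u8]) -> Cow<'_, [u16]> {
    // SAFETY: every bit pattern is a valid u16, so reinterpreting the
    // correctly aligned middle part as u16 is sound.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u16>() };
    if prefix.is_empty() && suffix.is_empty() {
        // The whole input is aligned and an exact multiple of 2 bytes.
        Cow::Borrowed(words)
    } else {
        // Misaligned or odd-length input: copy into a fresh Vec<u16>.
        Cow::Owned(
            bytes
                .chunks_exact(2)
                .map(|c| u16::from_ne_bytes([c[0], c[1]]))
                .collect(),
        )
    }
}
```

Whether a given call borrows or copies depends on where the input buffer happens to live, which is why the serialization format and the read buffer both need to guarantee alignment if copying must be avoided.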

In general, “zero copy” constrains your data structures, your serialized data format, and the code that uses the data structures. It's not necessarily something you want to actually do for all applications.

These functions don't exist — if they did, they'd be the first thing I told you about instead of just saying that you were passing the wrong pointer to from_raw_parts.

If you want an allocator that supports this, it will be a very special allocator, and you will probably have to write it. Unfortunately, the “substitute an allocator, per data structure” feature of the standard library, allocator_api, isn't stable yet; you can use nightly Rust or (I've heard but not tried myself) allocator-api2. Or write an entire global allocator that can handle all the program’s heap needs too.

Thanks again for the quick and detailed reply!

I'm in fact working with large numpy arrays, so zero copy does concern me. My scenario is also pretty complicated, in that I may serialize different dtypes (which means different alignments) into a single compact bytes vec.

Now it seems that this problem is far more complicated than I imagined, so the last things I'm wondering are:

  • Is copying actually acceptable when I try to deserialize large numpy arrays?
  • Are there any resources that teach how to load compact bytes containing heterogeneous numpy arrays, without copying? (I think asking for details here is a bit too much, so I'm willing to learn some things on my own first. :smile:)

Then if you want “zero-copy”, you have to either cleverly read the input data into appropriately-aligned separate allocations created before you read, or you have to design your data format so it contains padding bytes to create alignment (and read the file into a maximally-aligned buffer too, so that the alignment isn't spoiled by the choice of allocation it was read in to).
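
The padding approach can be sketched as an offset calculation. This assumes each array is described by a (byte length, alignment) pair with power-of-two alignments; the names `align_up` and `layout_offsets` are made up for illustration:

```rust
/// Rounds `offset` up to the next multiple of `align`.
/// `align` must be a power of two.
fn align_up(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Given (byte length, alignment) for each array, returns the start
/// offset of every array plus the total size of the packed buffer.
fn layout_offsets(arrays: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(arrays.len());
    let mut cursor = 0;
    for &(len, align) in arrays {
        // Insert padding bytes so this array starts on its alignment.
        cursor = align_up(cursor, align);
        offsets.push(cursor);
        cursor += len;
    }
    (offsets, cursor)
}
```

For example, arrays of (3 bytes, align 1), (4 bytes, align 2) and (8 bytes, align 8) start at offsets 0, 4 and 8, for a total of 16 bytes. These offsets only translate into aligned addresses if the buffer itself starts at an address aligned to the largest alignment used.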

  • Is copying actually acceptable when I try to deserialize large numpy arrays?

That's entirely up to your goals and the particular hardware you're trying to achieve them on (and the size of the file). In general, there's often room for making programs faster by pulling unhinged shenanigans, and it's unwise to conclude that therefore your program must contain maximum shenanigans, or that the particular shenanigan you’ve currently thought of is the one to pull first.

And you must always benchmark and profile. Benchmark to determine how well your program is doing and the effect of changing it; profile to determine which parts of the program are worth spending effort on. These will give you surprising and counterintuitive information; you cannot dispense with measurement and just write the fastest program entirely from theory.

So much valuable advice again! :smile: :pray:

I can understand this. It looks like the most practical solution when balancing performance and maintenance, and I think it will be my final choice. But purely for learning purposes:

I can in fact try to design the format and inject padding bytes. My question is: even after I make sure everything is fine, are there methods other than the first approach (i.e., creating separate allocations before reading) to read the carefully designed bytes vec in a zero-copy way?

Benchmarking is already on my list and I already have some tools for it; I asked just to see if there are general guidelines. Thanks for pointing it out anyway!

The other approach is the basic one which I mentioned here: write data structures that borrow the input.
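
As an illustration of a data structure that borrows its input, here is a sketch using a made-up record format (one length byte followed by that many payload bytes). The `Record` type and `parse_record` function are hypothetical:

```rust
/// A parsed record whose payload points into the input buffer.
struct Record<'a> {
    payload: &'a [u8],
}

/// Parses one record from the front of `input` without copying.
/// Returns the record and the unparsed remainder, or `None` if the
/// input is too short.
fn parse_record(input: &[u8]) -> Option<(Record<'_>, &[u8])> {
    // The first byte is the payload length in this made-up format.
    let (&len, rest) = input.split_first()?;
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    // Both halves are sub-slices of `input`; nothing is allocated.
    let (payload, remaining) = rest.split_at(len);
    Some((Record { payload }, remaining))
}
```

The lifetime `'a` ties each `Record` to the buffer it was parsed from, so the buffer must outlive every record that refers to it.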

Got it, I'll look into Cow then! :heart: