Pass only part of Vec but still consume it

I have a function that verifies and decrypts some data. Currently it takes a Vec and returns another:

use block_modes::BlockMode;
use block_modes::Cbc;
use aes_soft::Aes256;
use block_modes::block_padding::NoPadding;

fn decrypt(mut encrypted: Vec<u8>) -> Vec<u8> {
    if encrypted.len() < 8 + 32 { panic!("encrypted data too short") } ;

    let (_header, rest) = encrypted.split_at_mut(8);
    let (_mac, rest) = rest.split_at_mut(32);
    let mut ciphertext = rest;

    let key = [0; 32]; // Example value
    let iv = [0; 16]; // Example value
    let cipher = Cbc::<Aes256, NoPadding>::new_var(&key, &iv).unwrap();
    cipher.decrypt(&mut ciphertext).unwrap();

    encrypted.drain(0..8 + 32);

    encrypted
}

It's called like this:

fn main() {
    let v: Vec<u8> = [0; 64+8+32].to_vec(); // Example value
    let _decrypted = decrypt(v);
}

Sometimes the input data has an additional header, here represented by 10 more bytes in the array:

fn main() {
    let v2 = [0; 10+64+8+32].to_vec();
    // ???
    let _decrypted = decrypt(v2);
}

My question is, can I somehow pass v2 to decrypt() only starting from the 11th byte?

We can change the signature of decrypt() but (I think?) it should still consume the Vec, because the Vec gets mutated inside decrypt() making it useless outside of it. Also, the Vec is already copied in memory once because of the drain(), so I'd like to avoid copying it a second time.

If you allocate a Vec, the allocator only tracks a single pointer to its head. There's no way to split it into two Vecs without reallocating, because then you'll end up freeing something that wasn't allocated. If reallocating is fine, you can just use split_at and to_vec.
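For what it's worth, here is a minimal sketch of the split_at plus to_vec route mentioned above. The helper name `tail_copy` is made up for illustration; the point is only that the tail ends up in a second, separately allocated buffer.

```rust
/// Hypothetical helper: copies everything after the first `n` bytes into a
/// freshly allocated Vec. The caller's buffer is left untouched.
fn tail_copy(v: &[u8], n: usize) -> Vec<u8> {
    // split_at panics if n > v.len()
    let (_skipped, tail) = v.split_at(n);
    // to_vec allocates a new buffer and copies the tail into it
    tail.to_vec()
}
```

Calling `tail_copy(&v2, 10)` would give a Vec that could be passed to the existing decrypt(), at the cost of one extra allocation and copy.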

We can change the signature of decrypt() but (I think?) it should still consume the Vec, because the Vec gets mutated inside decrypt() making it useless outside of it.

You could try to take in the whole Vec and just return the header portion back to the user.

to_vec() copies the data, doesn't it?

I don't need the header at all, I just need to ignore it somehow. Sure, I could change fn decrypt(mut encrypted: Vec<u8>) -> Vec<u8> to decrypt(mut encrypted: Vec<u8>, ignore_n_bytes: u32) -> Vec<u8> but cutting the header off isn't really the responsibility of decrypt().

There really is no way to do this without reallocating the data. A Vec keeps a pointer to the start of its buffer, so at the low level you only have two options: move the data or move the pointer (the address that tells the Vec where its data lives in memory). If you move the pointer past the header, the Vec no longer manages the part of the buffer before the new offset, or fails to manage the buffer at all, and cannot deallocate the memory correctly when it is dropped. If you move the data, that is basically the same operation as copying all of it. So in any language you can only copy the data here (although some languages hide that from you).

EDIT: Oh, there is a way: you would have to set up a guard (a custom structure you write yourself) that restores the original pointer position and deallocates the buffer once you no longer need the data. I think that would cost more than simply copying the data, though. Copying a buffer is hardware-optimized, after all.

If you're draining it anyway, why not just do that first?

fn decrypt(mut encrypted: Vec<u8>) -> Vec<u8> {
    if encrypted.len() < 8 + 32 { panic!("encrypted data too short") } ;

    encrypted.drain(0..8 + 32);

    let key = [0; 32]; // Example value
    let iv = [0; 16]; // Example value
    let cipher = Cbc::<Aes256, NoPadding>::new_var(&key, &iv).unwrap();
    cipher.decrypt(&mut encrypted).unwrap();

    encrypted
}

If you did need to use the header information, you could use split_off instead of drain to keep it around.
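A rough sketch of the split_off variant, in case it helps. The function name `split_prefix` and the `PREFIX` constant are invented for this example; note that split_off returns the tail as a newly allocated Vec, so this is not free either.

```rust
/// Hypothetical helper: splits the input into (header + MAC, ciphertext).
fn split_prefix(mut encrypted: Vec<u8>) -> (Vec<u8>, Vec<u8>) {
    const PREFIX: usize = 8 + 32; // 8-byte header followed by 32-byte MAC
    assert!(encrypted.len() >= PREFIX, "encrypted data too short");
    // split_off moves [PREFIX, len) into a newly allocated Vec and
    // leaves [0, PREFIX) behind in `encrypted`.
    let ciphertext = encrypted.split_off(PREFIX);
    (encrypted, ciphertext)
}
```

The first element of the returned tuple still holds the header and MAC for inspection, and the second can be decrypted in place.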

I'd say not allowing this is part of the tradeoff you make by giving consumers a Vec<u8> -> Vec<u8> interface rather than a &mut [u8] -> &[u8] or &mut [u8] one. It's slightly more efficient if you need to store the resulting vec, but really not as ergonomic or versatile.

Unless this is a real hot-path in the application, I'd argue for changing the signature to something like:

/// Note: the contents of `encrypted` are explicitly undefined after this function executes. 
fn decrypt(encrypted: &mut [u8]) -> &[u8] {

If you need an owned Vec<u8> after that, then just to_owned() the &[u8]. The only extra cost will be from allocating a new buffer (since with encrypted.drain(0..8 + 32), you're already needing to memcpy the entire buffer from the end to the beginning of the array). If the consumer doesn't need to store the resulting data, this new function would even be more efficient, since instead of draining you can just return the end of the array?
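To make that concrete, here is a sketch of how the slice-based version might look. The real cipher is replaced by a toy XOR (`toy_decrypt_in_place`, purely an assumption so the example runs without any crates); everything else follows the original function.

```rust
/// Stand-in for the real cipher (illustration only): XORs each byte with a
/// constant. Replace with the actual in-place decryption call.
fn toy_decrypt_in_place(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b ^= 0xAA;
    }
}

/// Decrypts in place and returns the plaintext part of the buffer.
/// Note: the ciphertext bytes in `encrypted` are overwritten by this call.
fn decrypt(encrypted: &mut [u8]) -> &[u8] {
    assert!(encrypted.len() >= 8 + 32, "encrypted data too short");
    let (_header, rest) = encrypted.split_at_mut(8);
    let (_mac, ciphertext) = rest.split_at_mut(32);
    toy_decrypt_in_place(ciphertext);
    // No drain needed: just hand back the tail of the caller's buffer.
    ciphertext
}
```

With this signature the extra 10-byte header is the caller's business: `decrypt(&mut v2[10..])` skips it without copying anything, and `.to_vec()` on the result gives an owned Vec if one is needed.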

I commend the effort to document consuming and messing with the data by actually consuming the Vec<u8>, but I don't think it's worth keeping a worse interface for - a note in the documentation (and the fact that encrypted needs to be a mutable borrow) should give enough warning that the data after the call shouldn't be relied upon.

If you know that the code of people calling this function will need a Vec<u8> with the result, and this is part of a hot path in the program which needs the optimization, then adding a new argument ignore_n_bytes could definitely be justified. If you're going this route, another option might be taking in a range: Range<usize> for the bytes to decrypt.
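A sketch of the Range idea, under the assumption that the range covers exactly the ciphertext (so the caller accounts for all headers and the MAC). `decrypt_range` and the toy XOR cipher are made up for illustration.

```rust
use std::ops::Range;

/// Stand-in for the real cipher (illustration only).
fn toy_decrypt_in_place(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b ^= 0xAA;
    }
}

/// Consumes `encrypted`, decrypts only `range`, and returns just those bytes.
/// Panics if `range` is out of bounds.
fn decrypt_range(mut encrypted: Vec<u8>, range: Range<usize>) -> Vec<u8> {
    toy_decrypt_in_place(&mut encrypted[range.clone()]);
    // Drop anything after the range, then shift the range to the front.
    // The drain is the one move of the data that the original code also does.
    encrypted.truncate(range.end);
    encrypted.drain(..range.start);
    encrypted
}
```

For the v2 example this would be called as `let end = v2.len(); decrypt_range(v2, 10 + 8 + 32..end)`. The Vec is still consumed, and the returned Vec keeps using the original buffer rather than a fresh one.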

Shouldn't this be "unspecified" (i.e. "you don't get UB if using it, but you also don't get any pre-known byte pattern")?

You can't.

The Vec type explicitly starts at index 0, and has no ability to do otherwise. Vec is not an abstract container, but specifically this exact implementation that is just {data, capacity, length}, and there's no room for a start offset.

data must be valid to pass to free(), so it has to point exactly to the start of the allocated memory area, and nowhere else.

That's why it's a best practice for methods to take slices instead of Vec. You can slice any contiguous container anywhere you want.
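A small illustration of that last point. The `checksum` function is a hypothetical stand-in for any function that only needs to look at bytes; the same function accepts a sub-slice of a Vec, of an array, or of a VecDeque once it has been made contiguous.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for any function that only needs to read bytes.
fn checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

fn slice_demo() -> (u32, u32, u32) {
    let v: Vec<u8> = vec![1; 20];
    let a: [u8; 8] = [2; 8];
    let mut d: VecDeque<u8> = VecDeque::from(vec![3; 6]);
    (
        checksum(&v[10..]),            // skip a 10-byte header, no copy
        checksum(&a[2..]),             // same function, stack array
        checksum(d.make_contiguous()), // ring buffer, made contiguous first
    )
}
```

None of these calls would work with a function that takes Vec<u8> by value without first copying into a new Vec.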

The result of v.into_iter() is an alloc::vec::IntoIter. If I could change decrypt() to take in such an IntoIter I could pass it a partially consumed iterator, and then call .as_slice() on it?

However, it seems I can't use this IntoIter directly?

error[E0433]: failed to resolve: use of undeclared type or module `alloc`
 --> src\bin\foo.rs:5:5
  |
5 | use alloc::vec::IntoIter;
  |     ^^^^^ use of undeclared type or module `alloc`

Replace alloc with std.

This seems to work:

use block_modes::BlockMode;
use block_modes::Cbc;
use aes_soft::Aes256;
use block_modes::block_padding::NoPadding;
use std::vec::IntoIter;

fn main() {
    let v: Vec<u8> = [0; 10+64+8+32].to_vec(); // Example value
    let mut i = v.into_iter();
    i.nth(9);
    let _decrypted = decrypt(i);
}

fn decrypt(mut encrypted: IntoIter<u8>) -> Vec<u8> {
    let encrypted = encrypted.as_mut_slice();

    if encrypted.len() < 8 + 32 { panic!("encrypted data too short") } ;

    let (_header, rest) = encrypted.split_at_mut(8);
    let (_mac, rest) = rest.split_at_mut(32);
    let mut ciphertext = rest;

    let key = [0; 32]; // Example value
    let iv = [0; 16]; // Example value
    let cipher = Cbc::<Aes256, NoPadding>::new_var(&key, &iv).unwrap();
    cipher.decrypt(&mut ciphertext).unwrap();

    ciphertext.to_vec()
}

Any downsides, apart from the slightly weird interface of decrypt()?

It's still just one copy (ciphertext.to_vec()), right?

This doesn't make sense. If you're calling as_mut_slice on it, just take &mut [u8] instead. Slices are a strictly more general-purpose type.

You're currently using the Vec-only concrete IntoIter type. It's not supposed to be used like that. Usually you'd take impl std::iter::IntoIterator (not std::vec::IntoIter), which is generic over all iterable containers. In your case it still only works with Vec and nothing else.

BTW, I see there's decrypt_vec which could be used instead of decrypt and to_vec. And in that case you could take &[u8] which is the best type for this.

For example, if I had data in CustomVec<u8> (e.g. arrayvec or cvec), I would not be able to pass Vec or vec::IntoIter without copying the data and making a Vec first. But I could make &[u8] from any Vec-like container.

But that doesn't consume the Vec, leaving it to the caller in its invalid state.

It seems to me you're going to great lengths to do the decryption in-place, without copying the data, and then you're copying it all anyway. Why bother?

Just copy the ciphertext into a new Vec<u8> first and then decrypt it. That frees you up to take &[u8] instead of &mut [u8], and there's no way you can invalidate the input slice because you can't mutate it at all. (Edit: I just realized this is what decrypt_vec does, so if you go this way you should probably use that)
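Roughly what I mean, as a sketch. The toy XOR cipher is an assumption standing in for the real decryption call, so that the example runs on its own.

```rust
/// Stand-in for the real cipher (illustration only).
fn toy_decrypt_in_place(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b ^= 0xAA;
    }
}

/// Takes the input by shared reference, so the caller's data is never modified.
fn decrypt(encrypted: &[u8]) -> Vec<u8> {
    assert!(encrypted.len() >= 8 + 32, "encrypted data too short");
    // The single copy: only the ciphertext goes into the new Vec.
    let mut plaintext = encrypted[8 + 32..].to_vec();
    toy_decrypt_in_place(&mut plaintext);
    plaintext
}
```

The caller can then write `decrypt(&v2[10..])`, and v2 is still intact afterwards.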

vec::IntoIter is just an awkward way of passing a Vec and a starting index, except that you can't reuse the allocation, so if this is about exposing the most efficient possible API I would go for Vec<u8> + usize.

Those are excellent points.