What is the undefined behaviour?

I try to parse each field and return their mutable slice of the value.
The key-value pair is a pair of C-style string and an big-endian u32.

I know that borrow checker does not allow having multiple mutable borrow at the same time even they does not overlap with each other. So I try to use std::mem::transmute to stop it from compaining. Like the following:

use std::{collections::HashMap, io::Write};

fn parse(data: &mut [u8]) -> Result<HashMap<&[u8], &mut [u8; 4]>, &'static str> {
    let mut map = HashMap::new();
    let mut cursor = 0;
    while cursor < data.len() {
        let mut key_len = 0;
        while data[cursor + key_len] != 0 && cursor + key_len < data.len() {
            key_len += 1;
        }
        if cursor + key_len + 5 > data.len() {
            return Err("Not enough data for value");
        }
        unsafe {
            let key = std::mem::transmute(&data[cursor..cursor + key_len]);
            let value = std::mem::transmute(
                TryInto::<&mut [u8; 4]>::try_into(
                    &mut data[cursor + key_len + 1..cursor + key_len + 5],
                )
                .unwrap(),
            );
            map.insert(key, value);
        }
        cursor += key_len + 5;
    }
    Ok(map)
}

fn main() {
    let mut data = vec![];
    data.write(b"key1").unwrap();
    data.write(&[0]).unwrap();
    data.write(&24_u32.to_be_bytes()).unwrap();
    data.write(b"key2").unwrap();
    data.write(&[0]).unwrap();
    data.write(&42_u32.to_be_bytes()).unwrap();
    let parsed = parse(&mut data).unwrap();
    println!("Value1: {}", u32::from_be_bytes(*parsed["key1".as_bytes()]));
    println!("Value2: {}", u32::from_be_bytes(*parsed["key2".as_bytes()]));
}

But cargo +nightly miri run compains. How to fix that?

   Compiling play v0.1.0 (/home/billy/Projects/play)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.05s
     Running `/home/billy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/play`
error: Undefined Behavior: trying to retag from <3015> for SharedReadOnly permission at alloc1100[0x0], but that tag does not exist in the borrow stack for this location
  --> src/main.rs:22:24
   |
22 |             map.insert(key, value);
   |                        ^^^
   |                        |
   |                        trying to retag from <3015> for SharedReadOnly permission at alloc1100[0x0], but that tag does not exist in the borrow stack for this location
   |                        this error occurs as part of retag at alloc1100[0x0..0x4]
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3015> was created by a SharedReadOnly retag at offsets [0x0..0x4]
  --> src/main.rs:15:23
   |
15 |             let key = std::mem::transmute(&data[cursor..cursor + key_len]);
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: <3015> was later invalidated at offsets [0x0..0x12] by a Unique retag
  --> src/main.rs:18:26
   |
18 |                     &mut data[cursor + key_len + 1..cursor + key_len + 5],
   |                          ^^^^
   = note: BACKTRACE (of the first span):
   = note: inside `parse` at src/main.rs:22:24: 22:27
note: inside `main`
  --> src/main.rs:37:18
   |
37 |     let parsed = parse(&mut data).unwrap();
   |                  ^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

you can use slice::split_at_mut().

3 Likes

Sadly, that's not how it works. unsafe doesn't mean you are allowed to do whatever you want, it specifically means "I understand the compiler cannot check the usual guarantees about memory safety, and I very carefully checked them myself and I promise I played by the rules".

Don't worry, we've all been beginners at some point and probably made the same mistake at least once.

In this forum, we have a famous quote about this topic:

It has been 0 days since someone tried and failed to use unsafe to circumvent the lifetime system.

7 Likes

For example:

fn take_key<'a, 'b>(inout: &'b mut &'a mut [u8]) -> Result<&'a [u8], &'static str> {
    match inout.iter().position(|&x| x == 0) {
        Some(i) => {
            if let (key, [0, rest @ ..]) = mem::take(inout).split_at_mut(i) {
                *inout = rest;
                Ok(&*key)
            } else {
                unreachable!()
            }
        }
        None => Err("Unexpected EOF while reading key"),
    }
}

fn take_val<'a, 'b>(inout: &'b mut &'a mut [u8]) -> Result<&'a mut [u8; 4], &'static str> {
    let (val, rest) = mem::take(inout).split_at_mut(4);
    *inout = rest;
    Ok(val
        .try_into()
        .map_err(|_| "Unexpected EOF while reading value")?)
}

fn parse(mut data: &mut [u8]) -> Result<HashMap<&[u8], &mut [u8; 4]>, &'static str> {
    let mut map = HashMap::new();
    while data.len() > 0 {
        let key = take_key(&mut data)?;
        let val = take_val(&mut data)?;
        map.insert(key, val);
    }
    Ok(map)
}
2 Likes

fyi, there’s also a Default implementation allowing mem::take(inout)

1 Like

Thanks. I've updated my code.

It works. Thanks.

Before that, I tried just use raw pointer as in C. Like:

use std::{collections::HashMap, hash::Hash, io::Write};

struct Key(* const [u8]);
impl Hash for Key {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        unsafe {
            (*self.0).hash(state);
        }
    }
}
impl PartialEq for Key {
    fn eq(&self, other: &Self) -> bool {
        unsafe {
            *self.0 == *other.0
        }
    }    
}
impl Eq for Key {}

struct Val(* mut [u8; 4]);


fn parse(data: &mut [u8]) -> Result<HashMap<Key, Val>, &'static str> {
    let mut map = HashMap::<Key, Val>::new();
    let mut cursor = 0;
    while !data.is_empty() {
        let mut key_len = 0;
        while data[cursor + key_len] != 0 && cursor + key_len < data.len() {
            key_len += 1;
        }
        if cursor + key_len + 5 > data.len() {
            return Err("Not enough data for value");
        }
        let key = Key(&data[cursor..cursor + key_len]);
        let val = Val(&mut (&mut data[cursor + key_len + 1..cursor + key_len + 5]).try_into().unwrap());
        map.insert(key, val);
        cursor += key_len + 5;
    }
    Ok(map)
}

fn main() {
    let mut data = vec![];
    data.write(b"key1").unwrap();
    data.write(&[0]).unwrap();
    data.write(&24_u32.to_be_bytes()).unwrap();
    data.write(b"key2").unwrap();
    data.write(&[0]).unwrap();
    data.write(&42_u32.to_be_bytes()).unwrap();
    let _ = parse(&mut data).unwrap();
    // println!("Value1: {}", u32::from_be_bytes(*parsed["key1".as_bytes()]));
    // println!("Value2: {}", u32::from_be_bytes(*parsed["key2".as_bytes()]));
}

And miri compains when I dereference the raw pointer.

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
     Running `/home/billy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/play`
warning: field `0` is never read
  --> src/main.rs:20:12
   |
20 | struct Val(* mut [u8; 4]);
   |        --- ^^^^^^^^^^^^^
   |        |
   |        field in this struct
   |
   = note: `#[warn(dead_code)]` on by default
help: consider changing the field to be of unit type to suppress this warning while preserving the field numbering, or remove the field
   |
20 | struct Val(());
   |            ~~

error: Undefined Behavior: trying to retag from <3017> for SharedReadOnly permission at alloc1089[0x0], but that tag does not exist in the borrow stack for this location
  --> src/main.rs:7:13
   |
7  |             (*self.0).hash(state);
   |             ^^^^^^^^^
   |             |
   |             trying to retag from <3017> for SharedReadOnly permission at alloc1089[0x0], but that tag does not exist in the borrow stack for this location
   |             this error occurs as part of retag at alloc1089[0x0..0x4]
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3017> was created by a SharedReadOnly retag at offsets [0x0..0x4]
  --> src/main.rs:34:23
   |
34 |         let key = Key(&data[cursor..cursor + key_len]);
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: <3017> was later invalidated at offsets [0x0..0x12] by a Unique retag
  --> src/main.rs:35:34
   |
35 |         let val = Val(&mut (&mut data[cursor + key_len + 1..cursor + key_len + 5]).try_into().unwrap());
   |                                  ^^^^
   = note: BACKTRACE (of the first span):
   = note: inside `<Key as std::hash::Hash>::hash::<std::hash::DefaultHasher>` at src/main.rs:7:13: 7:22
   = note: inside `core::hash::impls::<impl std::hash::Hash for &Key>::hash::<std::hash::DefaultHasher>` at /home/billy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/hash/mod.rs:941:13: 941:33
   = note: inside `<std::hash::RandomState as std::hash::BuildHasher>::hash_one::<&Key>` at /home/billy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/hash/mod.rs:701:9: 701:28
   = note: inside `hashbrown::map::make_hash::<Key, std::hash::RandomState>` at /home/billy/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-2eab394af869c8a2/hashbrown-0.14.3/src/map.rs:262:5: 262:31
   = note: inside `hashbrown::map::HashMap::<Key, Val, std::hash::RandomState>::insert` at /home/billy/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-2eab394af869c8a2/hashbrown-0.14.3/src/map.rs:1752:20: 1752:61
   = note: inside `std::collections::HashMap::<Key, Val>::insert` at /home/billy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/collections/hash/map.rs:1105:9: 1105:31
note: inside `parse`
  --> src/main.rs:36:9
   |
36 |         map.insert(key, val);
   |         ^^^^^^^^^^^^^^^^^^^^
note: inside `main`
  --> src/main.rs:50:13
   |
50 |     let _ = parse(&mut data).unwrap();
   |             ^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error; 1 warning emitted

I wonder what is the root cause of this error. Isn't it true that raw pointers has no no-aliasing optimization?

Also looking at

fn take_val<'a, 'b>(inout: &'b mut &'a mut [u8]) -> Result<&'a mut [u8; 4], &'static str> {
    let (val, rest) = mem::take(inout).split_at_mut(4);
    *inout = rest;
    Ok(val
        .try_into()
        .map_err(|_| "Unexpected EOF while reading value")?)
}

the error seems wrong. split_at_mut will panic on unexpected EOF, and your map_err case is essentially unreachable, because the slice always has the right length.

Incidentally, stable split_first_chunk_mut is already in beta, so that’s much nicer, anyways:

fn take_val<'a, 'b>(inout: &'b mut &'a mut [u8]) -> Result<&'a mut [u8; 4], &'static str> {
    let (val, rest) = mem::take(inout)
        .split_first_chunk_mut()
        .ok_or("Unexpected EOF while reading value")?;
    *inout = rest;
    Ok(val)
}

Unrelatedly, one could also write this using destructuring assignment (and one lifetime can be elided)

fn take_val<'a>(inout: &mut &'a mut [u8]) -> Result<&'a mut [u8; 4], &'static str> {
    let val;
    (val, *inout) = mem::take(inout)
        .split_first_chunk_mut()
        .ok_or("Unexpected EOF while reading value")?;
    Ok(val)
}
2 Likes

It… well I don’t know about the current state of optimization but that’s irrelevant for semantics. Importantly, raw pointers in Rust have aliasing restrictions just like references. The exact models (named “stacked borrows” and a newer somewhat less restrictive one “tree borrows”) of what semantics are at play aren’t quite final yet, and miri currently conservatively enforces “stacked borrows” (you also see it mentioned in the error message).

Well it’s not quite “like references” of course, multiple mutable pointers can point to the same location (or overlapping sections thereof), for example. However, it’s perhaps best & easiest to think of copies or offsets of a pointer as “the same pointer”, and then it’s more similar to references once again.

A different useful mental model is that raw pointers in Rust, when created by dereferencing references, inherit the access permissions of the original reference. Or rather of a re-borrow of that reference. And when that re-borrow is over (for example, when it was a mutable re-borrow of a mutable reference and you use the original reference again) then the pointer (together with all references and pointers derived from it as well) is also invalidated. And pointers from immutable references can only have immutable access.


I haven’t yet looked at your raw pointers code to check which (if any) of these principles is violated.

Edit: There it is. The relevant code section is already pointed out my miri

help: <3044> was created by a SharedReadOnly retag at offsets [0x0..0x4]
  --> src/main.rs:34:23
   |
34 |         let key = Key(&data[cursor..cursor + key_len]);
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: <3044> was later invalidated at offsets [0x0..0x12] by a Unique retag
  --> src/main.rs:35:34
   |
35 |         let val = Val(&mut (&mut data[cursor + key_len + 1..cursor + key_len + 5]).try_into().unwrap());
   |                                  ^^^^

Something like &mut data[cursor + key_len + 1..cursor + key_len + 5] counts as accessing all of data because it goes through the desugaring into

&mut *IndexMut::index_mut(&mut *data, cursor + key_len + 1..cursor + key_len + 5)

and that involves access to all of data mutably to create the mutable re-borrow passed to the index_mut method.

As it accesses all of data mutably, this invalidates the pointer in the previously created Key, by the principles I’ve outlined above.

Unfortunately, the raw-pointer alternatives that wouldn’t do that are still unstable. So you’d need to do manual offset calculations, and decompose and re-compose slice pointers from raw (data & len) parts, in order to get the code to pass miri.

2 Likes

This code works without any unsafe or pointers.

fn parse(data: &mut [u8]) -> Result<HashMap<&[u8], &mut [u8; 4]>, &'static str> {
    let mut rest = data;
    let mut map = HashMap::new();
    let mut cursor = 0;
    while cursor < rest.len() {
        let mut key_len = 0;
        while rest[cursor + key_len] != 0 && cursor + key_len < rest.len() {
            key_len += 1;
        }
        let (chunk, new_rest) = rest.split_at_mut(cursor + key_len + 5);
        dbg!(&chunk, &new_rest);
        rest = new_rest;
        let (key, value) = chunk.split_at_mut(cursor + key_len);
        let value = &mut value[1..];
        dbg!(&key, &value);
        let value = TryInto::<&mut [u8; 4]>::try_into(value).unwrap();
        map.insert(&*key, value);
        cursor = 0;
    }
    Ok(map)
}
2 Likes