Debug crashes, release works

The code has been working fine for a few years, but after a certain rust update (probably within 1-2 months), the debug compilation will crash, but the release will still be fine.

Is this a rust version bug?

Of course not. Your code is wildly unsound. It's a lot more likely that you got the unsafe wrong than the compiler having a bug.

The panic message tells everything:

thread 'main' panicked at 'misaligned pointer dereference: address must be a multiple of 0x4 but is 0x556c865bbb3a', src/main.rs:21:30

You can't just cast pointers willy-nilly. If you want to serialize a data structure to an efficient binary format, use one of the existing crates (eg. bincode, cbor, or bson).

While you are not sufficiently experienced with the language, you basically shouldn't use unsafe for anything other than FFI, because you will get it wrong.

8 Likes

You have unsafe, so your first guess if you see a crash should be that your unsafe code has a bug (that was already present and merely revealed now), not that the compiler does.

Also, it is good practice to minimize the unsafe usage down to the minimum regions necessary, and use deny(unsafe_op_in_unsafe_fn) so that even unsafe fns make it clear which parts are performing unsafe operations. (Notably, map2bin was marked unsafe fn but contains no unsafe operations.)

Here is a first pass at making those adjustments:

#![deny(unsafe_op_in_unsafe_fn)]

use std::collections::HashMap;

fn main() {
    let mut hm2 = HashMap::new();
    hm2.insert(1.to_string(), "A".to_string());
    hm2.insert(2.to_string(), "B".to_string());
    let buf = map2bin(&hm2);
    println!("{:?}", unsafe { bin2map(&buf) });
}

unsafe fn bin2map(bin: &Vec<u8>) -> HashMap<String, String> {
    let mut th = HashMap::default();
    let mut pr = bin.as_ptr();
    let end = unsafe { pr.add(bin.len()) };

    while pr < end {
        unsafe {
            let i = u32::from_le(*(pr as *const u32)) as usize;
            let n = u32::from_le(*(pr.add(4) as *const u32)) as usize;
            let j = std::slice::from_raw_parts(pr.add(8), i);
            let k = std::slice::from_raw_parts(pr.add(8 + i), n);
            let sj = String::from_utf8_unchecked(Vec::from(j));
            let sk = String::from_utf8_unchecked(Vec::from(k));
            th.insert(sj, sk);
            pr = pr.add(8 + i + n);
        }
    }

    th
}

fn map2bin(m: &HashMap<String, String>) -> Vec<u8> {
    let mut sv = Vec::with_capacity(m.len() * 120);

    for (i, n) in m {
        let j = i.as_bytes();
        let k = n.as_bytes();

        let j_len = j.len();
        let k_len = k.len();

        sv.reserve(8 + j_len + k_len);

        sv.extend_from_slice(&(j_len as u32).to_le_bytes());
        sv.extend_from_slice(&(k_len as u32).to_le_bytes());
        sv.extend_from_slice(j);
        sv.extend_from_slice(k);
    }

    sv
}

That doesn't fix anything about the code, but it is now a little bit easier to comprehend, which is always something to work on when you're asking for help. You know what you're doing but other people are starting from no information except what you provide.

4 Likes

The alignment checks are new in 1.70, only with debug assertions enabled, but it has always been undefined behavior. The fact that it seemed to work before was "luck", of a sort.

5 Likes

While this code is somewhat irritating, I want to challenge the "wildly" part of your statement. There is only one piece of UB here.
If you fix let i = u32::from_le(pr.cast::<u32>().read_unaligned()) as usize; for both "sizes", miri does not complain any more.
fixed playground

Is it still unsound (i.e. can cause UB from safe code)? depends on the contract for bin2map. Certainly, if the casts as u32 in the "serialization" overflow, the "deserialization" causes UB. Other than that? Data is only read and new data is created. No references seem to dangle...

2 Likes

Dangling references are not nearly the only way to cause UB. Basically all such "clever" circumventions of the type system end up causing UB.

"It's only data" is also a fallacy – Rust enforces invariants depending on types. One trivial example is the use of from_utf8_unchecked. If you invoke the deserializer function with invalid (non-UTF-8) contents, that will cause instant UB.

Sure, the utf8 thing is unsound, I missed that (thats the reason I dont write unsafe code). But, not all "circumventions" are UB. The u32 <-> [u8;4] conversion is explicitly supported. The creation of &[u8] from *const u8 is explicitly supported (given aliasing and initialization rules are upheld, probably also some other rules).

The sad thing here is: This code does not need to be unsafe!. The "serialization" is safe as-is. The "deserialization" could work as let sj = str::from_utf8(&bin[ix+8..ix+8+n]).unwrap().to_owned();. The "lengths" can use u32::from_le_bytes

3 Likes

No, technically not of course. But it is empirically the case that someone who doesn't even understand the concept of alignment will get nontrivial uses of unsafe code wrong, including multiple issues in this very example. I don't know how much worse it would have to be. "It's not always wrong" is not a great standard.

2 Likes

Of course. Given the actual behavior, there is indeed little room to do worse UB-wise.
Maybe this boils down to expectation: After reading the code for one minute, I expected use-after-frees, dangling references, aliasing-rule violations, .... In particular, I expected some zero-copy attempts of deserialisation by using something similar to String::from_raw_parts on input data. I was surprised to find this was not the case.

2 Likes

Haha, thanks for all the enthusiastic help. I'm really not a professional programmer, and I didn't even continue to learn rust in depth after I wrote the function that worked properly.
Your replies have helped me a lot and I have modified the function to work properly. Thanks again. :grinning:

unsafe fn bin2map(bin: &Vec<u8>) -> HashMap<String, String> {
	let mut th = HashMap::default();
	let mut pr = bin.as_ptr();
	let end = pr.add(bin.len());

	while pr < end
	{
		let i = u32::from_le(std::ptr::read_unaligned(pr as *const u32)) as usize;
		let n = u32::from_le(std::ptr::read_unaligned(pr.add(4) as *const u32)) as usize;
		th.insert(
		          String::from_utf8_unchecked(Vec::from(from_raw_parts(pr.add(8), i))),
		          String::from_utf8_unchecked(Vec::from(from_raw_parts(pr.add(8 + i), n))),
		);
		pr = pr.add(8 + i + n);
	}

	th
}

Please. Don't do this! <insert no god please no meme> Don't do it to yourself, or to us, or to any future users of this code!

This thread already contains several pieces of advice:

  • The utf8 issue is still there
  • With little effort, this function can be implemented safely with minimal performance impact.
  • This code does not follow coding guidelines for unsafe code. Also: where are the // SAFETY comments?
1 Like

Thank you for the reminder. I did modify a new version, but it is twice as slow as the old one.

If the bin2map() input is all from map2bin(), then I believe it is safe even with from_utf8_unchecked() (if I understand it correctly).

unsafe fn bin2map(bin: &Vec<u8>) -> HashMap<String, String> {
	let mut map = HashMap::default();
	let mut sz = 0;
	let len = bin.len();
	let buf = from_raw_parts(bin.as_ptr(), len);

	while sz < len
	{
		let i = u32::from_le_bytes(buf[sz..sz + 4].try_into().unwrap()) as usize;

		let n = u32::from_le_bytes(buf[sz + 4..sz + 8].try_into().unwrap()) as usize;
		let k = String::from_utf8_unchecked(Vec::from(&buf[sz + 8..sz + 8 + i]));
		let v = String::from_utf8_unchecked(Vec::from(&buf[sz + 8 + i..sz + 8 + i + n]));
		sz += 8 + i + n;
		map.insert(k, v);
	}
	map
}

The code is slower because it has proper bounds checks.

If i or n are too large, the new code will panic. But the old code will go ahead and read whatever random data is nearby. That nearby data could be passwords, credit card details...

That assumption should be written as a comment in the code.


Stepping back, what are you trying to achieve with this code? Stripping away all safety checks is generally not something we do with production code. Especially code that handles untrusted input, like servers or web browsers, which Rust is designed for.

If your goal is to learn Rust as it's designed, then I recommend following the advice of the other people in this thread, and rewrite the code without using any unsafe.

2 Likes

BTW, here's how I would write the deserialization function:

use std::collections::HashMap;
use std::str;

fn read_size(buf: &mut &[u8]) -> usize {
    let (head, tail) = buf.split_at(4);
    let result = u32::from_le_bytes(head.try_into().unwrap());
    *buf = tail;
    result as usize
}

fn read_str<'a>(buf: &mut &'a [u8], len: usize) -> &'a str {
    let (head, tail) = buf.split_at(len);
    let result = str::from_utf8(head).unwrap();
    *buf = tail;
    result
}

fn bin2map(mut buf: &[u8]) -> HashMap<&str, &str> {
	let mut map = HashMap::new();
	while !buf.is_empty() {
	    let i = read_size(&mut buf);
        let n = read_size(&mut buf);
        let k = read_str(&mut buf, i);
        let v = read_str(&mut buf, n);
		map.insert(k, v);
	}
	map
}

If this gets more complex, I'd consider using something like nom too, to get better error handling.

2 Likes

These two sequence/desequence functions are used to communicate with the scripting language so that it has consistent map objects. All data is trusted and the data is ensured to be complete before serialization.
Yes, I should take everyone's advice and keep modifying it. I am interested in rust, but seem to lack sufficient motivation.
Thank you for the example. However, it still uses more than twice as much time. I will continue to optimize on its basis, thanks.

Given that, is its performance really more important than its correctness?

The so-called "correctness", is there two explanations, 1, there is a chance of error. 2, absolutely no error, but wearing a "unsafe" hat.
If the latter, performance is more important than it is.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.