How to avoid write almost the same code due to borrow checking

I met a situation where I don't know how to avoid using unsafe and not write almost the same code twice.

Here we have an enum named Tree:

use std::collections::HashMap;
enum Tree {
    KV(HashMap<String, Tree>),
    Value(String),
}

Now we want to define a function which could find Tree by path splitted with ".".

impl Tree {
    fn find_element(&self, path: &str) -> Option<&Tree> { todo!() }
    fn find_element_mut(&mut self, path: &str) -> Option<&mut Tree> { todo!() }
}

For example:

let ele: &Tree = tree.find_element("find.element.by.path").unwrap();

I wrote find_element as follow:

impl Tree {
    fn find_element(&self, path: &str) -> Option<&Tree> {
        let mut prv = self;
        assert!(path.is_ascii());
        for name in path.split(".") {
            match prv {
                Tree::KV(map) => match map.get(name) {
                    Some(tree) => prv = tree,
                    None => return None,
                },
                Tree::Value(s) => return None,
            }
        }
        Some(prv)
    }
}

Now here comes the problem: How to write find_element_mut?

I have two solutions. The first one is to copy the code and edit it, so we got this:

impl Tree {
    fn find_element(&self, path: &str) -> Option<&Tree> {
        let mut prv = self;
        assert!(path.is_ascii());
        for name in path.split(".") {
            match prv {
                Tree::KV(map) => match map.get(name) {
                    Some(tree) => prv = tree,
                    None => return None,
                },
                Tree::Value(s) => return None,
            }
        }
        Some(prv)
    }

    fn find_element_mut(&mut self, path: &str) -> Option<&mut Tree> {
        let mut prv = self;
        assert!(path.is_ascii());
        for name in path.split(".") {
            match prv {
                Tree::KV(map) => match map.get_mut(name) {
                    Some(tree) => prv = tree,
                    None => return None,
                },
                Tree::Value(s) => return None,
            }
        }
        Some(prv)
    }
}

The second solution is to use unsafe to avoid writing duplicate code:

impl Tree {
    unsafe fn _find_element(slf: *const Tree, path: &str) -> Option<*const Tree> {
        let mut prv = &*slf;
        assert!(path.is_ascii());
        for name in path.split(".") {
            match prv {
                Tree::KV(map) => match map.get(name) {
                    Some(tree) => prv = tree,
                    None => return None,
                },
                Tree::Value(s) => return None,
            }
        }
        Some(prv)
    }

    fn find_element(&self, path: &str) -> Option<&Tree> {
        unsafe { Self::_find_element(self, path).map(|e| &*e) }
    }

    fn find_element_mut(&mut self, path: &str) -> Option<&mut Tree> {
        unsafe { Self::_find_element(self, path).map(|e| &mut *(e as *mut Tree)) }
    }
}

The above two solutions both seem to be not satisfying. Does anybody have any good ideas?

The unsafe option you list also appears unsound to me (i.e. using it causes UB). You’re converting prv: &Tree into *const Tree, then into &mut Tree.

So the only reasonable option (of the two you list) is to use code duplication. Another alternative (not necessary all that satisfying either though) would be to use a macro, e.g.

macro_rules! find_element_implementation {
    ($find_element:ident $get:ident $($mut:ident)?) => {
        fn $find_element(&$($mut)? self, path: &str) -> Option<&$($mut)? Tree> {
            let mut prv = self;
            assert!(path.is_ascii());
            for name in path.split(".") {
                match prv {
                    Tree::KV(map) => match map.$get(name) {
                        Some(tree) => prv = tree,
                        None => return None,
                    },
                    Tree::Value(s) => return None,
                }
            }
            Some(prv)
        }
    }
}

impl Tree {
    find_element_implementation!(find_element get);
    find_element_implementation!(find_element_mut get_mut mut);
}
2 Likes

Sorry I don't get this... Why are the conversions UB?
I think it may be better if Rust could have a syntax to solve this problem?

Transmutes - The Rustonomicon

  • Transmuting an & to &mut is UB.
    • Transmuting an & to &mut is always UB.
    • No you can't do it.
    • No you're not special.

(this rule applies to any form of converting a &T into &mut T (with the same target (and I’m not 100% sure about ZSTs)), not just literal calls to mem::transmute)


If you want more convincing “arguments”, try running an invocation of the unsafely (and AFAICT unsoundly) implemented find_element_mut method through miri (this might possibly need MIRIFLAGS="-Zmiri-track-raw-pointers")

1 Like

I read the link you gave but I still don't know why converting &T to &mut T is UB. Will the compiler generate different assembly code for find_element and find_element_mut? (I'm not familiar with assembly)

It is a fundamental assumption in Rust that you can't mutate the value behind a &T reference without some form of "interior mutability" to manage synchronisation and aliasing (e.g. a Mutex). Think of it like me asking "what is the result of 0/0?" in mathematics or a C programmer passing 3 arguments to a function that expects 4.

The Rust language provides the guarantee that data will be shared XOR mutable, which allows programmers and compilers to create arbitrarily complex programs with that assumption in mind. If your code uses unsafe to violate that guarantee then anything building on your code will be fundamentally broken. This is the same for all languages, although Rust puts a much greater emphasis on soundness/UB than most.

Undefined behaviour is not about how code gets optimised or what the hardware does.

2 Likes

I guess you may not understand me. In the above solution that using unsafe, I guaranteed the uniqueness of mutable reference manually.
In other words, is this function also UB?

fn foo<T>(t:&mut T)->&mut T{
    unsafe{
        let r=&*t;
        &mut *(r as *const T as *mut T)
    }
}

Yes

fn foo<T>(t:&mut T)->&mut T{
    unsafe{
        let r=&*t;
        &mut *(r as *const T as *mut T)
    }
}

fn main() {
    foo(&mut 0);
}
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.19s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: trying to reborrow for Unique at alloc993, but parent tag <untagged> does not have an appropriate item in the borrow stack
 --> src/main.rs:4:9
  |
4 |         &mut *(r as *const T as *mut T)
  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ trying to reborrow for Unique at alloc993, but parent tag <untagged> does not have an appropriate item in the borrow stack
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
          
  = note: inside `foo::<i32>` at src/main.rs:4:9
note: inside `main` at src/main.rs:9:5
 --> src/main.rs:9:5
  |
9 |     foo(&mut 0);
  |     ^^^^^^^^^^^
  = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
  = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:122:18
  = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:145:18
  = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
  = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:492:40
  = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:456:19
  = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:137:14
  = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:48
  = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:492:40
  = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:456:19
  = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:137:14
  = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:20
  = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:144:17

error: aborting due to previous error
1 Like

For context, the error message shown above was generated by Miri, a Rust interpreter which can detect a lot (but not all) cases of UB.

You can run Miri on the playground via the Tools dropdown:

image

2 Likes

It doesn't matter that you use &mut to ensure uniqueness at the top level because Rust requires each individual operation to be sound, regardless of the wider context. In this case you have a line that uses pointer casting to turn a &T into a &mut T, which is UB.

Please understand that the issue at hand here is soundness, not completeness. Soundness only requires that if a program is wrong, it doesn't compile, but correct programs may also not compile. Programming in this sense is like a conversation with the compiler, who can only be convinced by a certain logic, regardless if you can prove your program to be correct by some different logic.

Despite being a very intuitive ask, generalizing over the multiplicity of a type (value, &mut, and &) is currently an open research question. Please take recommendation of current workarounds as pragmatic suggestions that can solve your problem right now, not as our complacency with them. You are welcome to track that progress at the issue rfcs#414. (It's a problem hard enough that the issue was created way back then but still without a draft RFC.)

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.