Unsafe code review request

I am trying to create a graph structure (for a type system). Specifically, I have a structure that uses arena-based indices to represent the graph and I want to transform that into one that uses direct references (so that I don't need an additional arena argument everywhere to resolve those indices to the actual objects).

I am using unsafe in one place, and it all seems to work fine, but I am not sure if it is safe. Miri does give an undefined behavior error. I see that the Nomicon says "Transmuting an & to &mut is UB." (in the context of mem::transmute, but probably applies to casts as I have done here).

I believe that the way I am using external immutability, it wouldn't be possible to encounter UB. Miri errors make me nervous, but I am not sure what is the failure scenario in my code? Is my unsafe code really unsafe? Is there a usage example, where this would lead to an actual memory or other error?

The full code along with tests is in playground.

Here is the code that does uses unsafe:

pub fn transform(arena_system: ArenaSystem) -> System<'static> {
    // First create shallow types (i.e. type with no fields), since we may not
    // have created a type that a field refers to.
    let (resolved_types, arena_types_fields): (Vec<_>, Vec<_>) = arena_system
        .types
        .data
        .into_iter()
        .map(|arena_type| {
            (
                Type {
                    name: arena_type.name,
                    fields: Vec::new(),
                },
                arena_type.fields,
            )
        })
        .unzip();

    // Now fill in the fields
    for (resolved_type, arena_type_fields) in
        resolved_types.iter().zip(arena_types_fields.into_iter())
    {
        let resolved_fields: Vec<_> = arena_type_fields
            .into_iter()
            .map(|field| {
                let field_type = resolved_types.get(field.typ).unwrap();
                Field {
                    name: field.name,
                    typ: field_type,
                }
            })
            .collect();

        // SAFETY: All the resolved types are placed in the `resolved_types`
        // vector and will not move ever (we don't need to resize
        // `resolved_types` ever). Furthermore, we place `resolved_types` in the
        // `System` struct, which is immutable (since the `types` field is
        // private), so nothing will get added to `types` in it.
        #[allow(clippy::cast_ref_to_mut)]
        let resolved_type: &mut Type = unsafe { &mut *(resolved_type as *const Type as *mut Type) };
        resolved_type.fields = resolved_fields;
    }

    System {
        types: resolved_types,
    }
}

The following structures use my own IdArena to make it work in playground (the real code uses one from id_arena for type safety):

pub struct ArenaSystem {
    pub types: Arena<ArenaType>,
}

pub struct ArenaType {
    pub name: String,
    pub fields: Vec<ArenaField>,
}

pub struct ArenaField {
    pub name: String,
    pub typ: Id<ArenaType>,
}

And here are structures that use direct references:

pub struct System<'system> {
    types: Vec<Type<'system>>,
}

#[derive(Debug)]
pub struct Type<'system> {
    pub name: String,
    pub fields: Vec<Field<'system>>,
}

pub struct Field<'system> {
    pub name: String,
    pub typ: &'system Type<'system>,
}

impl<'system> Debug for Field<'system> {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}: {}", self.name, self.typ.name)
    }
}

I’m on mobile so can’t do a thorough review, but translating & to &mut is UB no matter how you do it. In particular, the optimizer is allowed to assume that the referent of a shared reference doesn’t change within the reference’s lifetime and reorder/merge accesses accordingly.

The only exception to this is UnsafeCell, which you’ll need to use if you want to make changes behind a shared reference.

1 Like

From that same link:

  • Transmuting an & to &mut is UB.
    • Transmuting an & to &mut is always UB.
    • No you can't do it.
    • No you're not special.

If Miri says the code is incorrect, the code is near-certainly incorrect. It doesn't really have false-positives


&T to &mut T is defined to be undefined, no matter what actually happens if you do it. Undefined behavior is kinda like dividing by zero; if you do it, anything can happen, even impossible things, so it's just not allowed.

Thanks. I didn't consider the optimization angle. I will give UnsafeCell a try.

I understand all this as a general concept, but was trying to get a specific failure scenario (@2e71828's reply provided one).

@2e71828 I have made a PR Add an explanation shared to exclusive transmute by ramnivas · Pull Request #344 · rust-lang/nomicon · GitHub based on your reply (don't know you Github handle to give you credit there, but added this thread as context).

2 Likes

Incidentally, note that it is about more than optimization, it is also about semantics. &mut more accurately indicates unique access, and one can rely on this, even in unsafe, to avoid data races [1], even in the absence of any optimization whatsoever.

It's a core property of the language.


  1. (for example; or any other UB, or logical invariants) ↩︎

1 Like

In absence of optimization, in my particular example (locally controlled mutability along with external immutable access), I am unable to imagine a scenario where the program will exhibit an error. The transform function consumes arena_system and returns a newly created System, with no externally observable mutability. Furthermore, both ArenaSystem and System both own all its content, so no external references to worry about.

This is different from cases such as this example that takes an external shared reference with an unknown origin and mutates it. So a crash is easy to understand.

I am just trying to understand this more fully. While the optimizer assumption reason is already good enough to avoid such usage, if there are more demonstrable issues with my example, it will be good to know.

Well optimizations are the main way where this kind of thing can go wrong.

3 Likes

I have no demonstration specific to your example, I just wanted to clarify, given your documentation PR. Mutex::get_mut not locking is an example unrelated to yours.

More generally, UB is a language-level definition, and if your program has UB (like casting a & to a &mut in Rust), it may not compile how you intended. Turning it around, a program compiling how you intended is not an indication that there is no UB. Maybe there's no UB, or maybe you got lucky.

That's just another clarification as your OP seems to use "encouter UB" to mean "get a compilation I didn't intend." There tends to be an increased emphasis on avoiding UB entirely in Rust (even in cases that happen to compile as intended today).


Separate from any of that, you'll want to be sure that you don't hand out handles to Type<'_> or Field<'_> without capping their lifetimes to something tied to a borrow of the System<'static>, e.g.

impl System<'static> {
    // Lifetime of revealed innards is constrained by the borrow to &self
    pub fn types(&self) -> &[Type<'_>] {
        &self.types
    }
}

Or else a user of the crate could hold on to a &'static and then drop the System, say.

I do suspect you know this, given the design and privacy I see in the structs. But you didn't call it out either, so I'm mentioning it.

2 Likes

Thanks @quinedot for the explanation.