Unsafe code review request with UnsafeCell

ramnivas · December 12, 2022, 10:46pm

This is a follow up from Unsafe code review request - #8 by ramnivas.

The only difference is the use of UnsafeCell to make changes behind a shared reference following the suggestion by @2e71828). As far as understand, the use of UnsafeCell indicates to the optimizer to not assume that a shared reference is immutable (which was the issue with my original code).

From the UnsafeCell docs:

If you have a reference &T , then normally in Rust the compiler performs optimizations based on the knowledge that &T points to immutable data. ... UnsafeCell<T> opts-out of the immutability guarantee for &T : a shared reference &UnsafeCell<T> may point to data that is being mutated.

(There is an additional change to introduce types() following @quinedot 's suggestion, but that is probably orthogonal to the core issue).

However, running cargo +nightly miri test still gives me the same error. Although, it hedges (with the original and new code): "but the Stacked Borrows rules it violated are still experimental", this error gives me a pause that there is still a UB lurking there.

So is my code safe?

Here it the transform function (the rest of the code is in playground:

pub fn transform(arena_system: ArenaSystem) -> System<'static> {
    // First create shallow types (i.e. type with no fields), since we may not
    // have created a type that a field refers to.
    let (resolved_types, arena_types_fields): (Vec<_>, Vec<_>) = arena_system
        .types
        .data
        .into_iter()
        .map(|arena_type| {
            (
                UnsafeCell::new(Type {
                    name: arena_type.name,
                    fields: Vec::new(),
                }),
                arena_type.fields,
            )
        })
        .unzip();

    // Now fill in the fields
    for (resolved_type, arena_type_fields) in
        resolved_types.iter().zip(arena_types_fields.into_iter())
    {
        let resolved_fields: Vec<_> = arena_type_fields
            .into_iter()
            .map(|field| {
                // SAFETY: Getting an immutable reference to the content inside UnsafeCell
                let field_type = unsafe { &*resolved_types.get(field.typ).unwrap().get() };
                Field {
                    name: field.name,
                    typ: field_type,
                }
            })
            .collect();

        // SAFETY: All the resolved types are placed in the `resolved_types`
        // vector and will not move ever (we don't need to resize
        // `resolved_types` ever). Furthermore, we place `resolved_types` in the
        // `System` struct, which is immutable (since the `types` field is
        // private), so nothing will get added to `types` in it.
        let resolved_type: &mut Type = unsafe { &mut *resolved_type.get() };
        resolved_type.fields = resolved_fields;
    }

    System {
        types: resolved_types.into_iter().map(|t| t.into_inner()).collect(),
    }
}

CAD97 · December 13, 2022, 12:15am

(playground miri compatible version)

You're not using UnsafeCell correctly. Once you create &T, the T is frozen and can only be used by-ref. It doesn't matter if it came from &UnsafeCell<T> initially; the &T being live^[1] means that you have to play by the &T rules.

UnsafeCell<T> is, very roughly speaking, an unchecked RefCell<T>. While &T is live, you cannot have a live &mut T. When one &mut T is live, you cannot have another live &mut T.

When you create field_type and store it in Field.typ, you're promising that the Type will remain unmodified while the Field.typ reference is still usable. When you modify the Type.fields field, you're breaking that promise, thus the UB that miri is diagnosing.

It is fundamentally not possible to construct the self-referential structure you're trying to make with the shape of

struct Container<'this> {
    pub items: Vec<Item<'this>>,
}

struct Item<'this> {
    pub container: &'this Container<'this>,
}

where the Item has a plain reference to the Container which owns it. This is because it is impossible to add an Item to the Container's list once an Item exists and holds a lock promising the Container to not be modified any longer.

You're going to have to compromise on the public API and change something, because somewhere inside Type you need to introduce an UnsafeCell. Try creating a version using RefCell (or RwLock; replace &UnsafeCell<T>⇔&RefCell<T>, &'_ T⇔Ref<'_, T>, and &'_ mut T⇔RefMut<'_, T>); if that version runs successfully, then the same run with UnsafeCell would not contain any UB.

(Caveat: it might be valid to construct this in a two-step process, where during construction Item.container: &'this UnsafeCell<Container<'this>>, and then after the structure is created you transmute to a layout-compatible version which removes the UnsafeCell layers. I'm not confident saying that's necessarily true, though.)

I think the structure you're creating is theoretically sound, since once constructed everything is behind &'static, just not the way you're constructing it. The smallest diff is to store &UnsafeCell<Type<'_>> in Field instead, and expose a function getter which resolves this to &Type<'_>, with the soundness reasoning that at that point the Type is frozen and will no longer be modified.

HOWEVER, you still have an issue to resolve during construction. When you resolved_types.into_iter(), you're moving the Types in order to remove the UnsafeCell wrapper, which again invalidates any references to their old location. The Vec.into_iter().map().collect() happens to end up reusing the same allocation so the Types don't get relocated in memory, but there's no such guarantee, and they could be collected to a completely unrelated memory location, clearly invalidating the references to the previous location.

You can sidestep this issue by just storing Vec<UnsafeCell<Type>> in System instead of Vec<Type>. Once you've taken the unsafe cudgel to the remaining lifetime errors, your tests pass Miri. HOWEVER, this still doesn't mean the code is actually sound. Despite Box having a "stable deref" and moving the Box not moving the pointee, as currently implemented by both Rustc and Miri, moving a Box will invalidate any pointers to its pointee. This doesn't currently occur for Vec, but it theoretically could, so properly future-proof code should be written assuming that moving Vec invalidates any pointers into the Vec.

If you're constructing a graph-like structure, you really will run into fewer issues if you just use raw pointers, because then you don't have extra requirements being asserted whenever you use the safe types. You're on the hook for writing C-style code managing the raw pointers, and still on the hook for the reference rules when you do create references, but it's much easier to track when this occurs than when mixing unsafely extended reference lifetimes with safe abstractions.

Alternatively, a potentially better design is "just" to remove the parent references. Instead of trying to store the references, track them when accessing the field, e.g. a completely safe design along the lines of

struct System {
    types: Vec<TypeInfo>,
}

struct TypeInfo {
    name: String,
    fields: Vec<FieldInfo>,
}

struct FieldInfo {
    name: String,
}

#[derive(Copy, Clone)]
struct Type<'a> {
    info: &'a TypeInfo,
}

impl<'a> Type<'a> {
    fn name(self) -> &'a str { &self.info.name }
    fn types(self) -> impl 'a + Iterator<Item=Field<'a>> {
        self.info.fields.iter().map(move |info| Field { ty: self, info })
    }
}

#[derive(Copy, Clone)]
struct Field<'a> {
    ty: Type<'a>,
    info: &'a FieldInfo,
}

impl<'a> Field<'a> {
    fn ty(self) -> Type<'a> { self.ty }
    fn name(self) -> &'a str { &self.info.name }
}

Given what you're doing seems to be similar, it might be worth perusing how bevy_reflect structures its RTTI structures.

What it means for a reference to be "live" is the most undecided part of the semantics. At a minimum, references are "live" from the point in time they're created to the point in time they're last dereferenced. ↩︎

ramnivas · December 13, 2022, 2:22am

@CAD97 Thank you so much for such a detailed and well-reasoned response. I now understand UnsafeCell a bit better and see the issue with my implementation.

I will explore the possibilities you mentioned. I suspect that I will end up taking the pointer-based approach. In a sense, what I am looking for is a way to mark some references to be left alone by optimizer and using pointers may be that way.

For a bit of context: Currently, I am using ArenaSystem directly. But every part of code that uses any of the ArenaX types, needs access to the ArenaSystem instance and leads to somewhat awkward API. In effect, I am using "pointers" (disguised as indices into arenas), so may be using real pointer will be okay.

I will also examine how bevy_reflect deals with this issue.

VorfeedCanal · December 13, 2022, 12:13pm

The rule is: don't lie to the compiler. Ever. Don't try to do “simple white lie” behind its back.

It would find a way to expose these lies and punish you sooner or later.

Switching to pointers and accepting the fact that you are now in charge of resolving the lifetime puzzles is perfectly valid (if dangerous and painful) solution, though.

ramnivas · December 13, 2022, 6:24pm

The issue is what constitutes a "lie". In effect, every use of unsafe is a potential lie and that is when you summon "miri" to "fact check". Of course, miri has limitations. First, it only checks what you test for and like any testing it will not help with scenarios that are possible but not yet tested. Second, it hedges with "but the Stacked Borrows rules it violated are still experimental" which makes you think may be this usage is okay. So then you ends up asking here and tap into others expertise (related: people here are so nice and patient. Nice to be a part of this community). But this, of course, doesn't scale.

In this case, it feels like a possible future Rust improvement. If only it allowed me to indicate to avoid certain optimizations, it might have been enough. Of course, as you can tell, I am new to unsafe Rust, so there is very likely several holes in my understanding. But what if Rust allowed, say:

pub fn transform(arena_system: ArenaSystem) -> System<'static> {
   unsafe(suspend-reference-immutability-optimization) {
   
      ... code from my original attempt or the new
 
   }
}

or, even narrower to certain references:

pub fn transform(arena_system: ArenaSystem) -> System<'static> {
   ...

   unsafe(suspend-reference-immutability-optimization(Type.fields)) {
      // Now fill in the fields
      for (resolved_type, arena_type_fields) in
        resolved_types.iter().zip(arena_types_fields.into_iter()) {
        ...
      }
   }

   ...
}

Then I could opt-in C pointer-like behavior of a narrower portion of code and express explicitly what kind of unsafe behavior I am opting into.

VorfeedCanal · December 13, 2022, 6:45pm

No, that doesn't work. There are just too many possible optimizations and they are changing regularly.

Blacklist wouldn't work, that would only make problem worse. You have to describe what invariant your code may be violating instead.

That one is already exist. It's called UnsafeCell.

By “C pointer-like behavior” you mean PNVI-ae-udi, and TBAA right? Right? Do you even understand what the phrases provenance-not-via-integer exposed-address user-disambiguation or type-based aliasing analysis even mean?

I strongly suspect that what you long after are “pointers-as-just-addresses” and these are not an option even in C.

That's precisely why this:

Is a problem. We don't know what are the exact rules which we have to follow in the unsafe code. Simply because it's not a question with known answer in C/C++-land (and all existing Rust compilers are using C/C++ compilers as backends).

Thus yes, it's hard. But that doesn't mean you may just go and ignore the rules. Rather you have to try to uphold them to the best of one's abilities.

alice · December 13, 2022, 7:06pm

When it comes to the topic of unsafe code involving pointers that are stored long term, here are some tips: (I will use "pointer" to refer to both "raw pointer" and "reference")

If you have two pointers A and B where B is derived from A, then nothing you can do with B can possibly invalidate A. (other than freeing the memory)
If you have two raw pointers A and B such that one is derived from the other using only raw pointer operations (i.e. no going through references), then they are both considered the same pointer for the purposes of rule 1.

This has some interesting implications. For example, if your raw pointer is the original raw pointer you got from the memory allocator (or derived from it via only raw ptr ops), then all other pointers into the allocation are derived from it. Thus, the "original" raw pointer can never be invalidated by the actions of any other pointer. (Other than freeing the memory of course.)

This reveals a strategy for writing any kind of struct that stores raw pointers: Make sure that any raw pointer you're storing is directly derived from the original raw pointer. If you do that, the raw pointer is definitely valid until you free the memory.

ramnivas · December 13, 2022, 7:39pm

That's seems very useful framework to think about. Thank you!

afetisov · December 14, 2022, 1:59am

Imho that's a good thing and generally OK. Arena allocations are a tricky thing, and it's useful to see explicitly the places which use them. But if you really use the same arena everywhere, you can consider turning it into a global. Use a thread_local! macro for a thread-local arena, or make it a static if the arena is thread-safe.

I would also suggest you to search for existing arena allocator crates. Writing one correctly is really tricky business, and you may also find useful more powerful primitives, perhaps even a GC. It's true that asking questions on URLO doesn't scale, but using a safe existing crate does!

I would urge you to avoid writing unsafe code until you're really comfortable with Rust and its rules. At minimum, you should read the Rustonomicon.

They are experimental, there were many historic issues where the rules ended up being changed, and there are still some dark corners. But at this point most of the rules are well-tested and pretty stable, so the default assumption should be "miri is right".

ramnivas · December 14, 2022, 2:54am

In my actual system, I use typed_generational_arena.

I did at some point, considered using a global/thread-local, but from my experience, such an usage can become quite limiting (essentially precludes possibilities such as using multiple of those, running tests in parallel, etc). So for now, arena even with some awkward code, feels like a better approach.

All this exploration is, in part, to improve my understanding of unsafe Rust and one way is by actually trying different approaches. I don't intent to use this kind of code in my real system--until I am well satisfied that it doesn't have UBs (so definitely, at least, miri should be happy with it).

parasyte · December 14, 2022, 3:33am

I think it is also worth discussing unsafe in its role as an escape hatch for things that otherwise are not possible. You wouldn't want to open an emergency exit that sets off an alarm in a building just because it's interesting, would you?

In other words, while there is some merit to exploring unsafe Rust for educational purposes, you're probably not going to need it. And it's way too easy to get wrong, so that should be some motivation to avoid it until you actually need to escape.

VorfeedCanal · December 14, 2022, 3:50am

You absolutely do need it simply because much of standard library is not possible to write in safe Rust.

Just please don't mix unsafe code with business-requested code: businesses are known to change their requirements often thus it's much better to have unsafe-based abstractions in a separate crate (or at least separate module) with clear and sound interface.

Then you can afford to write unsafe code once and use it forever (or at least for a few years) without change.

parasyte · December 14, 2022, 3:56am

I'm not talking about a transitive need; I'm talking about actually using the keyword directly in whatever code you end up writing. It is almost always better to use an existing crate that encapsulates unsafe (the idea being that it might actually be well tested).

system · March 14, 2023, 3:56am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
UnsafeCell: what exactly does it do, and when do I need it? help	5	3834	January 12, 2023
UnsafeCell behavior details help	6	910	January 12, 2023
Want to be clear, an UnsafeCell will disable optimizations on the whole composite tree?e help	7	611	October 5, 2021
Unsafe code review request	11	628	June 16, 2022
Is this a safe use of UnsafeCell? help	14	1538	January 12, 2023

Unsafe code review request with UnsafeCell

Related Topics