Is it sound to keep a pointer to the contents of an `Arc`? Is it guaranteed that they won't move?

If I retain a NonNull<T> to some value deep within the bowels of the data pointed to by an Arc, is it guaranteed that that pointer will always be valid for as long as the Arc is alive? That is, if I keep a reference to the Arc in the same struct, will it always be safe to dereference the pointer?

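use std::ptr::NonNull;
use std::sync::Arc;

struct Element(i64); // pretend this is actual useful data
struct FooType; // placeholder for the real search arguments
struct Collection {
    // in the actual code, for reasons I don't think are relevant, these cannot be merged
    data1: Box<[Element]>,
    data2: Box<[Element]>,
}

impl Collection {
    pub fn perform_some_search(&self, some_args: FooType) -> impl Iterator<Item=&Element> {
        // returns a combination of elements from data1 and data2;
        // placeholder body that just chains both slices:
        let _ = some_args;
        self.data1.iter().chain(self.data2.iter())
    }
}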

// for reasons I won't go into, I need this struct to be 'static
struct SearchResults {
    collection: Arc<Collection>,
    items: Vec<NonNull<Element>>,
}

impl SearchResults {
    pub fn memorize_search_results(collection: Arc<Collection>, some_args: FooType) -> Self {
        let items = collection.perform_some_search(some_args).map(NonNull::from).collect::<Vec<_>>();
        Self {collection, items}
    }
    // note that the lifetime returned here is the lifetime of the SearchResults which is (at least
    // in theory) strictly shorter than the lifetime of the Collection, thanks to the Arc.
    // my question is, is this sufficient to guarantee soundness?
    pub fn iterate_search_results(&self) -> impl Iterator<Item=&Element> {
        self.items.iter().map(|ptr| unsafe {ptr.as_ref()})
    }
}

Is the above code sound? Since, modulo any UnsafeCell inside, the contents of an Arc cannot be mutated, is it also true that they cannot move, thus guaranteeing pointer validity for as long as the Arc is alive? Can I safely rely on that? Would wrapping the Arc in a Pin, i.e. Pin<Arc<Collection>>, do anything at all?

Can I get away with this without dreaming up some harebrained scheme like

enum WhichData {Data1, Data2}
struct Index(WhichData, usize);

impl Collection {
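    pub fn perform_some_search(&self, some_args: FooType) -> impl Iterator<Item=Index> {
        // returns a combination of elements from data1 and data2;
        // placeholder body that yields every index of both slices:
        let _ = some_args;
        (0..self.data1.len()).map(|i| Index(WhichData::Data1, i))
            .chain((0..self.data2.len()).map(|i| Index(WhichData::Data2, i)))
    }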
    pub fn get(&self, index: Index) -> &Element {
        match index.0 {
            WhichData::Data1 => &self.data1[index.1],
            WhichData::Data2 => &self.data2[index.1],
        }
    }
}

and doing a buttload of work to rework all of my existing code to take the Index struct instead of a reference (or pointer)?


The general idea is sound, yes. The contents of an Arc will not be moved or mutated (unless the Arc is not shared and the mutation happens through that Arc’s API, e.g. Arc::get_mut). Pin doesn’t help here, as it’s really just a transparent wrapper type which limits mutable access with a complex API contract; there ain’t no magic in Pin, and it won’t make your unsafe code any safer here, unless you also expose some particularly general (or implementation-detail-leaking) safe API (but then that’s API you haven’t shown us yet).
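To make the "unless the Arc is not shared" exception concrete, here is a small sketch using Arc::get_mut, which only hands out a &mut T when no other Arc (or Weak) shares the allocation. The helper names can_mutate and get_mut_lifecycle are made up for illustration. The practical upshot for SearchResults is that it should never hand out &mut Arc<Collection>: if its clone happens to be the last one, get_mut would succeed and the boxed slices could be replaced under the stored pointers.

```rust
use std::sync::Arc;

/// Hypothetical helper: can safe code currently get `&mut T` out of this Arc?
/// `Arc::get_mut` succeeds only if no other `Arc` or `Weak` shares the allocation.
fn can_mutate<T>(arc: &mut Arc<T>) -> bool {
    Arc::get_mut(arc).is_some()
}

/// Walks one allocation through unique -> shared -> unique again.
/// Returns (mutable at start, mutable while shared, mutable after the clone is dropped).
fn get_mut_lifecycle() -> (bool, bool, bool) {
    let mut a = Arc::new(vec![1_i64, 2, 3]);
    let before = can_mutate(&mut a);

    // A second handle exists: safe code can no longer obtain `&mut Vec<i64>`.
    let b = Arc::clone(&a);
    let while_shared = can_mutate(&mut a);

    // Dropping the other handle makes `a` unique again, so mutation is possible.
    drop(b);
    let after = can_mutate(&mut a);

    (before, while_shared, after)
}
```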


The idea of combining e.g. an Arc together with some structure containing references into data owned by it is not an uncommon demand.

To help you avoid writing the unsafe code yourself you can … well … the benefit here is effectively the ability to blame someone else if there turns out to be a way to make it unsound anyway … so you could look into crates such as yoke.

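use std::sync::Arc;
use yoke::{Yoke, Yokeable}; // #[derive(Yokeable)] needs yoke's "derive" feature

struct Element(i64); // pretend this is actual useful data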

#[derive(Yokeable)]
struct SearchResultsItems<'a>(Vec<&'a Element>);

struct Collection {
    // in the actual code, for reasons I don't think are relevant, these cannot be merged
    data1: Box<[Element]>,
    data2: Box<[Element]>,
}

struct FooType;

impl Collection {
    pub fn perform_some_search(&self, some_args: FooType) -> impl Iterator<Item = &Element> {
        [].into_iter() // placeholder: a search that finds nothing
    }
}

struct SearchResults {
    collection_and_items: Yoke<SearchResultsItems<'static>, Arc<Collection>>,
}

impl SearchResults {
    pub fn memorize_search_results(collection: Arc<Collection>, some_args: FooType) -> Self {
        let collection_and_items = Yoke::attach_to_cart(collection, |collection| {
            SearchResultsItems(
                collection
                    .perform_some_search(some_args)
                    .collect::<Vec<_>>(),
            )
        });
        Self {
            collection_and_items,
        }
    }

    // private helper, because it’s not really directly a field anymore
    // (would also compile as -> &Vec<&Element>)
    fn items(&self) -> &[&Element] {
        &self.collection_and_items.get().0
    }

    // demo how to access the Arc
    fn collection(&self) -> &Arc<Collection> {
        self.collection_and_items.backing_cart()
    }

    // note that the lifetime returned here is the lifetime of the SearchResults which is (at least
    // in theory) strictly shorter than the lifetime of the Collection, thanks to the Arc.
    pub fn iterate_search_results(&self) -> impl Iterator<Item = &Element> {
        self.items().iter().copied()
    }
}

Something like this general approach can be made to work as an architecture, yeah. The data that an Arc directly points to won't change memory address when the Arc itself is moved or cloned; it stays put until the last Arc pointing to it is dropped. It won't be, like, reallocated or something.
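A quick way to convince yourself of this is to record Arc::as_ptr and the address of one element, then move and clone the Arc handle and compare. This is only a sketch with a stand-in Collection holding i64 values, and pass_through and addresses_survive_moves are hypothetical names. A passing check in one run illustrates the behaviour rather than proving it; the actual guarantee comes from the Arc being a pointer to a heap allocation that is freed only when the last strong reference goes away, while the boxed slices inside it are separate heap allocations that nothing can replace through a shared &Collection.

```rust
use std::sync::Arc;

// Stand-in for the question's `Collection`, with i64 in place of `Element`.
struct Collection {
    data1: Box<[i64]>,
    data2: Box<[i64]>,
}

// Forces a move of `x` through a function boundary.
fn pass_through<T>(x: T) -> T {
    x
}

/// Returns true if the Arc's allocation and an element inside `data2`
/// keep their addresses while the Arc handle is moved and cloned.
fn addresses_survive_moves() -> bool {
    let arc = Arc::new(Collection {
        data1: vec![10, 20, 30].into_boxed_slice(),
        data2: vec![40, 50].into_boxed_slice(),
    });
    let arc_addr = Arc::as_ptr(&arc);
    let elem_addr: *const i64 = &arc.data2[1];

    // Moving the handle only copies a pointer; the Collection stays where it is.
    let moved = pass_through(arc);
    let boxed = Box::new(moved); // move the handle again, this time onto the heap
    let clone = Arc::clone(&*boxed);

    Arc::as_ptr(&clone) == arc_addr
        && std::ptr::eq(&clone.data2[1], elem_addr)
        && clone.data1.len() == 3
}
```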

In terms of soundness though, that's a different question. Soundness is a concept that's relative to a particular API boundary: an API is "sound" if there is no way to use it, from code that compiles, that leads to undefined behavior, unless the user writes their own unsafe blocks.

It's hard, then, to tell whether the API your program illustrates is "sound", because it's unclear what you intend to be the public-facing API versus the private, fixed internals. Some points on that subject:

  • If Element is hard-coded and avoids interior mutability, then it may be sound, yet become unsound if you make it generic over some type parameter E for the element, as that could let users choose an E that, say, includes interior mutability, or is non-'static and holds references to something else.
  • You say "modulo any UnsafeCell inside". I'm not sure how familiar you are with this, but I hope you're aware that there are safe forms of interior mutability such as Mutex and more. There are even some interior mutability constructs that aren't based on UnsafeCell internally, such as ones that use an AtomicUsize which holds a heap pointer.
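As an illustration of soundness being relative to an API boundary, here is a sketch of the question's pointer-based design wrapped in a module so the fields stay private. The even-value filter standing in for the search, and the method names new, iter and collection, are hypothetical. The point is in the comments: the same unsafe block is fine behind private fields, but would be unsound if collection were pub, because safe code could replace the Arc and drop the data the pointers refer to.

```rust
use std::sync::Arc;

mod search {
    use std::ptr::NonNull;
    use std::sync::Arc;

    pub struct Element(pub i64);

    pub struct Collection {
        pub data1: Box<[Element]>,
        pub data2: Box<[Element]>,
    }

    pub struct SearchResults {
        // Private on purpose: if this were `pub`, safe code outside the module could
        // write `results.collection = Arc::new(other)`, drop the last handle to the
        // original Collection, and leave `items` dangling. That would make the API
        // unsound even though the caller never wrote `unsafe`.
        collection: Arc<Collection>,
        // Private too: callers must not be able to push arbitrary pointers.
        items: Vec<NonNull<Element>>,
    }

    impl SearchResults {
        // Hypothetical "search": every element with an even value.
        pub fn new(collection: Arc<Collection>) -> Self {
            let items = collection
                .data1
                .iter()
                .chain(collection.data2.iter())
                .filter(|e| e.0 % 2 == 0)
                .map(NonNull::from)
                .collect();
            SearchResults { collection, items }
        }

        pub fn iter(&self) -> impl Iterator<Item = &Element> + '_ {
            // SAFETY: every pointer targets a slice owned by `self.collection`,
            // which this struct keeps alive and never lets anyone mutate or replace.
            self.items.iter().map(|p| unsafe { p.as_ref() })
        }

        // Shared access only; handing out `&mut Arc<Collection>` would break the invariant.
        pub fn collection(&self) -> &Arc<Collection> {
            &self.collection
        }
    }
}
```

Callers can still clone the Arc from collection(), but that only raises the strong count, so Arc::get_mut keeps failing and the pointed-to slices cannot change.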

Atomics are built on UnsafeCell.


Oh, nice. I wasn't able to tell that just by clicking the "src" link in the docs.


The yoke crate looks interesting, but it seems to be based on the stable_deref_trait crate, which has a few soundness issues. And the fact that it doesn't provide a Git link on crates.io (leaving me no way to report or browse issues) does not exactly fill me with confidence. That said, I can't find any immediately obvious way to use it to cause undefined behavior.


I think this is necessary if you have a requirement to guarantee no UB and you have a self-referential struct, since Rust doesn't natively support self-referential structs. One alternative (sorry if this is obvious to you) is to keep the Collection and the search results separate, passing them as references to wherever they're needed.
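A minimal sketch of that borrow-based alternative, reusing the question's names but with a hypothetical i64 threshold (min) standing in for FooType. No unsafe is needed because the compiler ties the results' lifetime to the Collection; the catch is that SearchResults<'a> is no longer 'static, so this only fits if that requirement from the original post can be relaxed.

```rust
pub struct Element(pub i64);

pub struct Collection {
    pub data1: Box<[Element]>,
    pub data2: Box<[Element]>,
}

impl Collection {
    // Hypothetical search: every element whose value exceeds `min`, data1 first.
    pub fn perform_some_search(&self, min: i64) -> impl Iterator<Item = &Element> {
        self.data1.iter().chain(self.data2.iter()).filter(move |e| e.0 > min)
    }
}

// Borrows the collection instead of owning it, so validity is checked by the compiler.
pub struct SearchResults<'a> {
    items: Vec<&'a Element>,
}

impl<'a> SearchResults<'a> {
    pub fn memorize_search_results(collection: &'a Collection, min: i64) -> Self {
        SearchResults {
            items: collection.perform_some_search(min).collect(),
        }
    }

    pub fn iterate_search_results(&self) -> impl Iterator<Item = &'a Element> + '_ {
        self.items.iter().copied()
    }
}
```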

If you choose to have a self-referential struct, could you use self_cell? Your Collection could be in the self_cell "owner" role, assuming it is immutable while the search results are being used. Your items would be in the self_cell "dependent" role and would be created using self_cell's new_xxx builder APIs by doing the search in the FnOnce callback. You can take back the Collection using the into_owner method if you need to mutate it between searches.

self_cell seems to me to be one of the safer of the self-referential approaches, because it is so constrained.

I may misunderstand your use case of course.


I do see a repository link on crates.io, though: it points to the unicode-org/icu4x repository on GitHub ("Solving i18n for client-side and resource-constrained environments").

...which has absolutely nothing to do with the project that links to it. It seems as though the crate author put something random in because crates.io wouldn't let them leave the field blank. Using a crate that is inherently unsafe and which could theoretically become unsound in a future version of rustc makes me nervous enough, but using such a crate where the author seems to have gone out of their way not to let themselves be contacted...

It's not random - that's the GitHub repo yoke lives in, and therefore the repo you should use for issues with the yoke crate.


There's no need to be so sceptical of the crate authors here, when the crate's owners on crates.io include the current team leader of one of the Rust teams, as well as a team (named the same as the linked repository, by the way) that's part of the official Unicode Consortium organization on GitHub.


My bad. I saw the icu4x repository, which looked unrelated, attached to a crate that did not obviously have anything to do with Unicode, and jumped to conclusions. My apologies.


No worries. It’s right to be skeptical about unsafe-involving infrastructure crates.


FYI, stable_deref_trait isn’t too bad; for example, yoke takes a best-effort approach to working around the aliasing issues by wrapping the owner in a countermeasure (currently a MaybeUninit), while tracking efforts to bring more proper workarounds into Rust.

And the fact that stable_deref_trait was originally created for crates like owning_ref (which has soundness holes, though most of them exist only because it’s unmaintained, so they were never fixed) or rental (which is officially deprecated, with a warning that it is unmaintained in the crate's repo and on its crates.io page) doesn’t actually imply any problems with stable_deref_trait itself.

There is also ouroboros.

It has had a history of soundness issues, although they do get fixed. This really is something that the language should provide proper support for.
