I was talking to a friend today about using Arc<[u8]> instead of Arc<Vec<u8>> and we went down the rabbit hole of how this works under the hood.
It's clear that this conversion will always allocate new memory for the data, copy the Vec's data into it, and deallocate the old memory. We were trying to figure out whether there's a reasonable way to reuse the Vec's data pointer and eliminate the copy (at this point this is just a thought experiment -- not something actually killing perf).
Note, I haven't dug deep into the inner workings of Arc, but I believe the root of the issue is that the layout of ArcInner is obviously not the same as the layout of Vec's data. At the time of writing, ArcInner is:
```rust
#[repr(C, align(2))]
struct ArcInner<T: ?Sized> {
    strong: Atomic<usize>,
    // the value usize::MAX acts as a sentinel for temporarily "locking" the
    // ability to upgrade weak pointers or downgrade strong ones; this is used
    // to avoid races in `make_mut` and `get_mut`.
    weak: Atomic<usize>,
    data: T,
}
```
So even if the Vec had enough unused capacity to accommodate the strong/weak counts, a copy would still be needed to shift the data forward. The fields can't be reordered because of the ?Sized constraint (the unsized field must come last), and the pointer arithmetic needed to reach the strong/weak counts would slow things down.
I suppose a pointer to the data could be stored instead, but then you're chasing pointers, which outside of this niche case might hurt perf more overall.
This is all a long way of asking: are there any practical means of eliminating this copy specifically for From&lt;Vec&lt;T&gt;&gt;, or at least the allocation? Or does this just unnecessarily complicate the code for one particular optimization path, which may be negated by the side effects of re-architecting for it (pointer chasing)?
Yeah, this is a classic tradeoff. C++'s shared_ptr<T> stores two pointers so that the counts and the item can be in different allocations, whereas Arc stores one pointer and thus requires a single allocation. Whether the increased sizeof(shared_ptr) is a net win is unclear -- it'd be handy for this case, but wasteful in many others.
I think the best answer might be the oft-discussed "attempt to reallocate in place" API: it could try to just extend the allocation (and possibly re-align it to a higher alignment), allowing the data to simply be memmoved into position, and fall back to allocate-and-copy otherwise. That just doesn't exist (yet).
…I think with an Allocator wrapping Global that always allocates a chunk of memory max(2 * mem::size_of::&lt;usize&gt;(), mem::align_of::&lt;T&gt;()) bytes larger than requested, sufficiently aligned for usize, and returns a pointer past that extra memory (call it "RcAllocator"), you could turn a Vec&lt;T, RcAllocator&gt; into an Arc&lt;[T], Global&gt; without copying, because the extra space required for the ref counts is already there.
If that is the shape of your problem, have you considered using Bytes instead? It's a purpose-built structure that uses refcounting to cheaply share buffers. In fact, it can even share subslices of the same buffer, without lifetimes!
I don't have enough details about his situation to say if it's worthwhile for him. I know that he's trying to keep the dependency tree fairly trim though.
Come to think of it, thin_vec stores its length and capacity inside the allocation, so if the allocation also included the counts, it could always offer an almost-free UniqueArc&lt;[T]&gt; -&gt; ThinVec&lt;T&gt;, as well as a non-reallocating ThinVec&lt;T&gt; -&gt; Arc&lt;[T]&gt; if the capacity matches.
(With them in separate crates I don't know that either would want to offer enough stable layout guarantees to let the other implement it.)
I've wanted a UniqueArcVec&lt;T&gt; that's basically UniqueArc&lt;[T]&gt; plus spare capacity: it'd have essentially all the methods of Vec&lt;T&gt;, plus a zero-cost conversion to UniqueArc&lt;[T]&gt; when there's no spare capacity -- just like Box&lt;[T]&gt;: From&lt;Vec&lt;T&gt;&gt;.
It'd be possible to store the counters at the end, together with a negative offset, adding some padding to ensure alignment.
But that'd conflict with another desirable optimization: having the Arc's internal pointer point directly at the data.