Unsafe Rust is... tough?
While I was practicing rust, I stumbled upon a task to implement circular
buffer. It turned out to be really easy with VecDeque, since it somewhat
already a circular buffer. So, for educational purposes, I decided to implement
it in an unsafe way by manually managing memory. I stick to stable channel. I've
never used any unsafe features of the language before, but already familiar with
this concept from C++ so I thought it couldn't be harder in user-friendly
language, comparing to brutal corner-case hell of C++. Oh God how wrong I was
naively thinking that.
First of all I need to say I haven't read the Rustonomicon (probably I should)
before proceeding to solving this task since I have minimal understanding of
unsafe basics from other source, like reddit posts/questions, blogs dedicated
to rust, reading rust code, etc. But when I started looking for information on
how to implement such structure turned out from rust perspective I lack
knowledge so I tried to collect this information piece by piece from some online
sources.
Initial attempt was to go along with the chapter from Rustonomicon,
implementing Vec. Layout section suggests:
- You must make sure your data structure maintains covariance (by using
PhantomData<T>when appropriate). - Should drop any manually managed values.
- Implement
Send/Syncby hand if we are holing a pointer (* mut|const _)
since it inhibited by fields of such type. - Optionally, maintain null-pointer-optimization for enums with 2 variants
(e.g.size_of::<MyVec>() == size_of::<Option<MyVec>>()).
Also, about last clause book says that you can't do it in stable rust! How come
rust advertise itself as capable to be low level as C++, but such easy thing is
unavailable? After some digging I've found NonNull type from standard library,
which is a pointer wrapper which purpose is solely to enable this concrete
optimization, should I assume that Rustonomicon is out of date with current rust
state?
By reading further chapter suggest you to never allocate above isize::MAX
items since GEP instruction from LLVM IR (primary backend of rustc) accepts
signed integer as offset. As a consequence method
offset from pointer type accepts an isize, few methods from Vec
documentations also constrain you to capacity <= isize::MAX. This is a really
deep detail of implementation exposed and carved into standard library API, I
think it was a compromise of some kind to do so. On the other hand, allocator
interface completely lacks knowledge of this detail and allows you to allocate
usize bytes. It looks like you can cast pointers to unsigned integer (usize)
directly and work your way out with integer arithmetic taking in consideration
data available in Layout with a price of some valuable
optimizations. size_of<T> is already padded to alignment and ready to be used
as offset between array elements, according to reference.
Turning back to Vec implementation chapter, it suggests to use Unique<T> which
is internal detail of standard library and available only (by the time of
writing this) in nightly by enabling a feature gate. This is disappointing since
my goal was to write implementation for stable rust. So I abandoned this chapter
and proceed to searching further.
Closest to C/C++ thing was std::alloc::alloc, so
straightforward way appeared to me was to just use it. Group of free functions
from std::alloc module allows you to work with global allocator,
chosen by the user of your library with #[global_allocator] attribute, same
allocator used by Box, Vec, etc. Same suggestions from Rustonomicon chapter
applied here (covariance, manual drop, Send/Sync,
null-pointer-optimization). Covariance and NPO you can get from NonNull, just
need to make sure you pass non-zero allocation size (use
NonNull::dangling() as a placeholder for well-behaved
non-zero unknown value). Continuous elements layout can be build with
Layout::array(usize).
I noticed that Vec documentation suggests that you can use
vector as a base for your custom data structure, but information on this topic
is obscure and scattered through documentation. Doing so would be nice and can
lift some burden discussed above. My attempt was to maintain a Vec inside my
data structure. You can construct a Vec<T>
with custom capacity and if size_of<T> != 0 produced
vector has exact capacity. My guess is that I can just obtain a pointer to the
internal memory by as_mut_ptr and manage it by hands. But
some parts of documentation hints that you must use
from_raw_parts, so as consequence you need to
deconstruct the vector into ptr-len-capacity triplet? Also, vector has no
guarantees about maintaining it's uninitialized memory. The only promise it
has is that you can initialize memory beyond len() and then adjust length
accordingly and it will be sound to do so. But can you just use available memory
without modifying the vector by obtaining access through as_mut_ptr method?
Unfortunately, signature suggest it may modify internal state and allocated
memory. So looks like salvaging the vector is the most sane thing to do. But we
achieve not much with such approach. GEP problem still persist since we tied to
use offset with isize value. Neither there is any info if value returned
from as_mut_ptr is never null. You can be sure value held by a vector is never
null, but does this method perform any transformation from internal
representation?
The last step is to manually drop initialized values. It turned out to be
non-trivial too. First of all, if you stare carefully on Vec documentation it
says no particular order of dropping guaranteed. Other collections just
completely silent about it.
Another problem are panics during dropping. From my experience, it is
unreasonable and bad to panic during destruction of a value. However,
drop does NOT prohibit such behavior. The only caveat is if a
value has panicked mid-drop, it is considered dropped, so you must not drop more
than once.
So if you drop something generic, it may panic, and you should be aware of it.
I've peeked into VecDeque source "how do they do it":
struct Dropper<'a, T>(&'a mut [T]);
impl<'a, T> Drop for Dropper<'a, T> {
fn drop(&mut self) {
unsafe {
ptr::drop_in_place(self.0);
}
}
}
let (front, back) = self.as_mut_slices();
unsafe {
let _back_dropper = Dropper(back);
// use drop for [T]
ptr::drop_in_place(front);
}
Dropping performed by using drop_in_place. Since VecDeque
consist of two allocation units, it performs it twice. But here is the tricky
part: Dropper's purpose is to make sure other part was dropped if there was a
panic during dropping first part. But what happens if multiple elements panic
during drop? Even with a single allocated array what are consequences? Sadly,
drop_in_place docs says nothing about panics (except 1 mention in example).
So I've written a sample code to test it myself. Turns out after
first panic it attempts during unwinding drop the rest of elements. But if
another drop panics as well then this causes panic during panic situation we
discussed before and program terminates.
Another issue I haven't found any information if it sound to cast allocated
memory to slice and drop it in place. Also, I think, you can't use code above as UPDATE: as @alice and @Yandros figured out, you can have invalid values, but can't "produce" them, as described in the reference.
a language user since it exhibits undefined behavior: it is unsound to have a
reference to invalid value. Passing back to Dropper doesn't consume it. I'm
not even sure about passing front to drop_in_place. Reference here coerces
into fat pointer, but continues to live after the call until the end of function
block. Am I right or wrong?
Anyway, here is my attempt to apply gathered knowledge for solving
exercism task called "Circular Buffer" using unsafe means:
https://gist.github.com/DaTa-/9e701ff720dcb2839bcf41314b911e34 It won't surprise
me if it's implemented with undefined behavior since it's really hard to write
sound unsafe code with rust (it's not so easy even in C++).
If you found anything incorrect from my understanding displayed here - please,
correct me. Any links to good "unsafe" articles would be nice too. Thank you for
your time, sorry for bad English since I'm not a native speaker.