Blog post: Sizedness in Rust

I wrote Sizedness in Rust because... well, to borrow the intro from the article itself:

Sizedness is lowkey one of the most important concepts to understand in Rust. It intersects a bunch of other language features in often subtle ways and only rears its ugly head in the form of "x doesn't have size known at compile time" error messages which every Rustacean is all too familiar with. In this article we'll explore all flavors of sizedness from sized types, to unsized types, to zero-sized types while examining their use-cases, benefits, pain points, and workarounds.

Please let me know if you find anything confusing, unclear, or inaccurate! Your feedback is very important and helps me improve the article. Thanks!


Nice writeup. Here are three areas that you may wish to revise:

a) One statement early in your writeup could use some elaboration, or at least an asterisk to indicate that it is not completely correct. The following statement does not take into account field alignment constraints, which may trigger inter-field padding and thus increase the size of the struct beyond the sum of the sizes of the struct's constituent elements.
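A minimal sketch of the padding point, assuming a typical platform where u32 is 4-byte aligned. The struct and field names here are hypothetical, not taken from the article.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct for illustration only.
// repr(C) keeps the declared field order, so the padding is predictable.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,  // 1 byte, followed by padding so that `b` is aligned
    b: u32, // 4 bytes
    c: u8,  // 1 byte, followed by trailing padding
}

// Same fields with the default representation; the compiler may reorder them,
// but the size must still be a multiple of the alignment.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

// Sum of the sizes of the individual fields.
fn field_sum() -> usize {
    size_of::<u8>() + size_of::<u32>() + size_of::<u8>()
}

fn main() {
    println!("sum of field sizes: {}", field_sum());
    println!("size of Padded:     {}", size_of::<Padded>());
    println!("size of Reordered:  {}", size_of::<Reordered>());
    println!("align of Padded:    {}", align_of::<Padded>());
}
```

On such a platform both structs come out larger than the sum of their fields, which is the caveat raised above.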

b) The following rationale misses the critical implementation constraint that drives the specification:

The reason that the unsized field must be the last field is that it is the only location in the struct that permits the compiler to determine the starting offset of each field at compile time. That same rationale precludes having two unsized fields in the struct, as one of the two fields would not be "last".
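As a rough illustration of that rationale, here is a sketch with a hypothetical `Message` type (the name and fields are assumptions for the example). The sized field sits at a fixed offset, and only the trailing field is allowed to be unsized.

```rust
use std::mem::size_of_val;

// Hypothetical type for illustration: a sized header plus a possibly unsized tail.
struct Message<T: ?Sized> {
    id: u32, // offset is known at compile time
    body: T, // may be unsized, so it has to be the last field
}

// Works on the unsized form; the length comes from the wide pointer.
fn describe(m: &Message<[u8]>) -> (u32, usize) {
    (m.id, m.body.len())
}

fn demo() -> (u32, usize) {
    let sized = Message { id: 7, body: [1u8, 2, 3] };
    // Unsizing coercion: &Message<[u8; 3]> becomes &Message<[u8]>.
    let unsized_ref: &Message<[u8]> = &sized;
    describe(unsized_ref)
}

fn main() {
    println!("{:?}", demo());
    let sized = Message { id: 7, body: [1u8, 2, 3] };
    let unsized_ref: &Message<[u8]> = &sized;
    // The total size is only available at run time for the unsized form.
    println!("{} bytes", size_of_val(unsized_ref));
}
```

Moving `body` above `id` in the struct definition is rejected by the compiler, since the offset of `id` would then depend on a run-time length.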

c) This statement is probably inaccurate:

It is almost certain that the compiler optimizes the instances away completely, resulting in no generated code at all rather than any no-op instructions, each of which would occupy at least one byte of code space.

Again, nice writeup. :clap: :clap:


Saying that () has no size is probably also confusing. There's a difference between the size zero, and not having a size.
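A small sketch of that distinction, using only std::mem: () is a sized type whose size is zero, while str has no size known at compile time and can only be measured through a reference.

```rust
use std::mem::{size_of, size_of_val};

// () is Sized; its size just happens to be zero.
fn unit_size() -> usize {
    size_of::<()>()
}

fn main() {
    println!("size of (): {}", unit_size());

    // str is unsized: `size_of::<str>()` does not compile.
    // The size is only known at run time, through the reference's length.
    let s: &str = "hello";
    println!("size of this str: {}", size_of_val(s));
}
```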


Thank you @TomP and @alice, I've updated the article taking into account your feedback!

Thanks for the article. It's good to spell all this out as it's a confusing area of Rust, in that what's going on behind the scenes is not that obvious.

If you're discussing the difficulties of unsized types, it may be worth mentioning the problem with Option. For example, `struct A<T: ?Sized>(Option<T>);` doesn't compile, yet `struct A<T: ?Sized>((bool, T));` does. There's no workaround in safe Rust that I'm aware of. In unsafe Rust I think you can use ManuallyDrop to implement your own Option.
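A sketch of the two definitions from this post side by side. The helper `is_set` is hypothetical and only there to show the accepted struct in use with a sized type; it does not attempt to build an unsized value.

```rust
// Accepted: the (bool, T) tuple has T in its last position.
struct A<T: ?Sized>((bool, T));

// Rejected: Option<T> requires T: Sized, so this line fails to compile.
// struct B<T: ?Sized>(Option<T>);

// Hypothetical helper: reads the flag without needing to know T's size.
fn is_set<T: ?Sized>(a: &A<T>) -> bool {
    (a.0).0
}

fn main() {
    let present = A((true, 5u32));
    let absent = A((false, 0u32));
    println!("{} {}", is_set(&present), is_set(&absent));
}
```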


Edit: Please disregard the DST-friendly Option in the playground: it's unsound, and there is no safe way to do it right now. See here for an explanation

IGNORE To prove that it's possible to implement a DST-friendly Option, here's an implementation in the playground. IGNORE

Another tip for DST handling is working around the lack of a stable CoerceUnsized. The trick is to get the value into a Box<something>, coerce that into a Box<dyn something> or Box<unsized-something>, and then turn it into a raw pointer to do what you want with it. To drop it, turn the raw pointer back into a Box. That's handy for implementing your own Rc. See a minimal Rc implementation here. I'm probably not using all the right terms, but I'm slowly building an intuition for how to get the compiler to do what I want with DSTs. It's easy to get into a muddle about it all.
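The Box round trip described here might look roughly like the following. This is a sketch only, with Display standing in for "something"; it is not the linked Rc implementation.

```rust
use std::fmt::Display;

// Box<i32> -> Box<dyn Display> -> raw pointer -> back to Box for dropping.
fn round_trip() -> String {
    let sized: Box<i32> = Box::new(42);

    // The unsizing coercion happens while the value is still inside a Box.
    let erased: Box<dyn Display> = sized;

    // Give up ownership and keep only the wide raw pointer.
    let raw: *mut dyn Display = Box::into_raw(erased);

    // SAFETY: `raw` came from Box::into_raw above and has not been freed.
    let text = unsafe { (&*raw).to_string() };

    // SAFETY: the Box is rebuilt exactly once, so the value is dropped once.
    unsafe { drop(Box::from_raw(raw)) };

    text
}

fn main() {
    println!("{}", round_trip());
}
```

A custom smart pointer would store the raw pointer in its own struct and perform the final `Box::from_raw` in its Drop implementation.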


I try to target my posts toward Rust beginners working within safe, stable Rust. So while writing a custom Option DST in user code is a very interesting exercise, it's also a very advanced one that falls outside the scope of the article.

Okay, fine. I found your article interesting all the same. Reviewing the first principles is helpful. Unfortunately the most interesting uses of DSTs require unsafe at the moment. Those kinds of uses of DSTs can become very complex very quickly, and get into corners of the language that are not so well developed, so I guess you have to choose where to draw a line.


I feel like I should mention my crate slice-dst here, which makes it possible (if not exactly trivial) to construct and use user-defined DSTs with slice tails.


You write that a slice is a "double-width pointer to a dynamically sized view into some array". This is not quite correct: A slice is a type [T], whereas &[T] is a reference to a slice. Slices aren't pointers. See this SO question and the reference for more details.

Personally I agree with you that it should be that way, but that's not how TRPL uses the term:

A string slice is a reference to part of a String

This slice has the type &[i32] .

I usually try to say "bare slice" when talking about stuff like [T] and str, and "slice reference" / "boxed slice" / etc. otherwise, to avoid confusion. But there is plenty of precedent for using "slice" to mean a reference.


Paraphrasing the rust language reference linked above.

[T]: a slice
&[T]: a shared slice, often just called a 'slice'
&mut [T]: a 'mutable slice'
Box<[T]>: a 'boxed slice'
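Whatever the naming, the pointer forms in that list are all double-width, while a reference to a fixed-size array is not. A quick check, with i32 chosen arbitrarily as the element type:

```rust
use std::mem::size_of;

const WIDTH: usize = size_of::<usize>();

// Sizes of: &[i32], &mut [i32], Box<[i32]>, &[i32; 3]
fn slice_pointer_widths() -> [usize; 4] {
    [
        size_of::<&[i32]>(),     // shared slice: pointer + length
        size_of::<&mut [i32]>(), // mutable slice: pointer + length
        size_of::<Box<[i32]>>(), // boxed slice: pointer + length
        size_of::<&[i32; 3]>(),  // array reference: length is part of the type
    ]
}

fn main() {
    println!("pointer width: {}", WIDTH);
    println!("{:?}", slice_pointer_widths());
}
```

The bare `[i32]` cannot appear in that list at all, since `size_of::<[i32]>()` does not compile.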


My view is that, because Rust slices exist only as references, [T] is really a conceptual extent (subset) of an array object. One cannot create or manipulate a bare [T], but only describe/delimit it via some form of slice reference such as &[T], &mut [T], or Box<[T]>.


Isn't that fact essentially a specific instance of the greater conceptual limitation of not being able to directly manipulate any DST value?

For me it's not that one "cannot directly manipulate a DST value", but more fundamentally that one cannot describe a DST value without a fat pointer that specifies both starting address and extent. Thus any [T] must be derived from such a DST by adjusting the starting address or extent (or both) of such a fat pointer to delimit a subset of the allocation from which it is derived.
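To make the "starting address plus extent" idea concrete, here is a sketch that derives a sub-slice and reads both halves of its fat pointer back out. The `window` helper and the 1..4 range are arbitrary choices for the example.

```rust
use std::mem::size_of;

// Returns (element offset of the sub-slice's start, sub-slice length).
fn window(data: &[i32]) -> (usize, usize) {
    // Same allocation, different starting address and extent.
    let sub: &[i32] = &data[1..4];

    // How many elements past the start of `data` does `sub` begin?
    let byte_offset = sub.as_ptr() as usize - data.as_ptr() as usize;
    (byte_offset / size_of::<i32>(), sub.len())
}

fn main() {
    let array = [10, 20, 30, 40, 50];
    let (offset, len) = window(&array);
    println!("starts {} elements in, covers {} elements", offset, len);
}
```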


I really enjoyed the article! Thanks!

The best part for me was when you said "To really hammer the point home..." considering your username :innocent:


This bit caught me off-guard:

`assert_eq!(DOUBLE_WIDTH, size_of::<Box<dyn ToString>>()); // trait object`

Why are Boxes double-width pointers? Intuitively, I would think they would be single-width. Digging into the Rust source, it looks like they should be single-width:

pub struct Box<T: ?Sized>(Unique<T>);
pub struct Unique<T: ?Sized> {
    pointer: *const T,
    _marker: PhantomData<T>,
}
pub struct PhantomData<T: ?Sized>;

Nonetheless, I can confirm that this appears to be correct -- the assertion holds when I compile the above.

It’s the *const dyn ToString that’s double width here, because it has to contain both a pointer to the data and a pointer to the vtable of trait methods that operate on that data.
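A quick way to see this is to compare a Box of a sized type with a Box of a trait object. This is a sketch using only std::mem::size_of; String is an arbitrary choice of sized type.

```rust
use std::mem::size_of;

const WIDTH: usize = size_of::<usize>();

// (size of Box<String>, size of Box<dyn ToString>)
fn box_widths() -> (usize, usize) {
    (size_of::<Box<String>>(), size_of::<Box<dyn ToString>>())
}

fn main() {
    let (thin, fat) = box_widths();
    println!("Box<String>:        {} bytes", thin);
    println!("Box<dyn ToString>:  {} bytes", fat);

    // The raw pointer inside the Box is what carries the extra vtable pointer.
    assert_eq!(size_of::<*const String>(), WIDTH);
    assert_eq!(size_of::<*const dyn ToString>(), 2 * WIDTH);
}
```

So Box itself adds nothing; its width follows the width of the pointer it wraps.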


Ah, right! My origins are from C++, where the vtbl pointer(s) are carried in the struct layout itself. This is, of course, a fundamental difference between C++ and Rust -- C++ imbues structs with dynamic properties (virtual methods, etc.) if you "bless" them with a virtual method; Rust keeps the struct and its dynamic behaviors nicely distinct.

Thanks Euler (aka 2e71828)!

