Addresses of ZSTs

  • Are ZSTs guaranteed to have a non-zero address?

  • Are they guaranteed to have the address of the field before them if they are a field in a struct that is ‘repr(C)’?

  • Without setting an explicit struct ABI, is their address at least guaranteed to be inside the bounds of a struct containing the ZST as a field?

  • What does it mean for a ZST reference to be “suitably aligned”? Or does that just mean the reference itself still has an alignment requirement, rather than the ZST object it points to?

Yes.
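To make the "yes" a bit more concrete, here is a minimal sketch, assuming the question is about references (a raw pointer to a ZST can still be null). `Marker` is a made-up type for illustration only.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical zero-sized type, used only for this illustration.
struct Marker;

/// Returns true if a reference to a ZST value is non-null.
fn zst_ref_is_non_null() -> bool {
    let m = Marker;
    let r: &Marker = &m;
    !(r as *const Marker).is_null()
}

/// Returns true if `Option<&Marker>` is pointer-sized, i.e. the null
/// value is free to encode `None` because references are never null.
fn option_ref_uses_null_niche() -> bool {
    size_of::<Option<&Marker>>() == size_of::<&Marker>()
}

fn main() {
    assert_eq!(size_of::<Marker>(), 0);
    assert!(zst_ref_is_non_null());
    assert!(option_ref_uses_null_niche());

    // `NonNull::dangling()` yields a non-null, well-aligned pointer,
    // which is enough to form a reference to a ZST.
    let p: NonNull<Marker> = NonNull::dangling();
    let _r: &Marker = unsafe { p.as_ref() };
}
```
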

No, I think you haven’t quite got that one right. The field after the ZST-typed field may be guaranteed to have the same address. I think you’d need to take alignment into consideration though, there could be padding bytes between the ZST and the subsequent field. You’ll need to calculate whether there’s a need for padding bytes, taking into consideration the alignment of the ZST and the alignment of the type of the subsequent field, as well as the layout of all the fields before the ZST. If there aren’t padding bytes, then the address of the ZST should be guaranteed to be the same as the subsequent field.
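A rough sketch of that padding calculation, assuming two made-up `repr(C)` structs (`NoPadding` and `WithPadding` are not from the question). Offsets are measured with `std::ptr::addr_of!` on a live value, so no extra crate is needed.

```rust
use std::ptr::addr_of;

// Hypothetical example types, for illustration only.
#[repr(C)]
#[derive(Default)]
#[allow(dead_code)]
struct NoPadding {
    a: u8,
    z: (), // ZST with alignment 1
    b: u8, // alignment 1: no padding needed after `z`
}

#[repr(C)]
#[derive(Default)]
#[allow(dead_code)]
struct WithPadding {
    a: u8,
    z: (),  // ZST with alignment 1: placed directly after `a`
    b: u32, // alignment 4: padding bytes sit between `z` and `b`
}

/// Returns (offset of `z`, offset of `b`) in `NoPadding`.
fn no_padding_offsets() -> (usize, usize) {
    let s = NoPadding::default();
    let base = addr_of!(s) as usize;
    (addr_of!(s.z) as usize - base, addr_of!(s.b) as usize - base)
}

/// Returns (offset of `z`, offset of `b`) in `WithPadding`.
fn with_padding_offsets() -> (usize, usize) {
    let s = WithPadding::default();
    let base = addr_of!(s) as usize;
    (addr_of!(s.z) as usize - base, addr_of!(s.b) as usize - base)
}

fn main() {
    // Same address for the ZST and the following field.
    assert_eq!(no_padding_offsets(), (1, 1));
    // Different addresses: the following field is pushed up to offset 4.
    assert_eq!(with_padding_offsets(), (1, 4));
}
```
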

I think the answer is probably yes here. After all, the memoffset::offset_of macro is a part of a popular crate that seems to be considered sound AFAICT; crates like that seem to indicate that addresses of struct fields are in some ways consistent and sensible even for non repr(C) structs, for all types of fields (including ZSTs). So at least my personal feeling is that if a ZST field’s address really could lie outside of the containing struct, that would be too misbehaved for crates like memoffset to reasonably exist.
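For what it's worth, this can be checked empirically. The sketch below assumes a made-up struct `Plain` with the default representation, and only reports what the compiler in use happens to do; it is an observation, not proof of a guarantee.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

// Hypothetical struct with the default (Rust) representation.
#[derive(Default)]
#[allow(dead_code)]
struct Plain {
    a: u8,
    z: [u64; 0], // ZST field with alignment of u64
    b: u16,
}

/// Returns true if the address of the ZST field lies within the struct,
/// counting the one-past-the-end address as "inside", since a ZST may
/// sit at the very end.
fn zst_field_within_bounds() -> bool {
    let s = Plain::default();
    let base = addr_of!(s) as usize;
    let zst = addr_of!(s.z) as usize;
    base <= zst && zst <= base + size_of::<Plain>()
}

fn main() {
    assert!(zst_field_within_bounds());
}
```
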

Yeah, they mean that it’s the same as with non-ZST objects. A good example would be the type [T; 0] which is a ZST but has the same alignment requirements as T has.
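A minimal sketch of the `[T; 0]` case with `T = u64`; the helper function names are made up for the example.

```rust
use std::mem::{align_of, size_of};

/// Returns (size, alignment) of the zero-length array type `[u64; 0]`.
fn zero_len_array_layout() -> (usize, usize) {
    (size_of::<[u64; 0]>(), align_of::<[u64; 0]>())
}

/// Returns true if a reference to an empty `[u64; 0]` value is aligned
/// as a `u64` would have to be.
fn empty_array_ref_is_aligned() -> bool {
    let empty: [u64; 0] = [];
    let addr = &empty as *const [u64; 0] as usize;
    addr % align_of::<u64>() == 0
}

fn main() {
    // Zero size, but the alignment of the element type.
    assert_eq!(zero_len_array_layout(), (0, align_of::<u64>()));
    assert!(empty_array_ref_is_aligned());
}
```
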

Thanks! That sounds mostly like I would intuitively expect things to work. You're right about mixing up before/after.

I think your answer still has the ambiguity I'm trying to get at. Are you saying that if I have a u8 followed by a [u64; 0] followed by another u8, there are going to be seven padding bytes in between, just so that if I take the address of the array I get a u64-aligned address? Because that seems really weird. My other interpretation is that it's just saying that align_of::<*const MyZST>() == align_of::<*const u64>() (a restriction on the alignment of the pointer value itself rather than of what it points to).

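Yes, exactly: there are seven padding bytes after the first u8, so that the address of the array is u64-aligned. It may seem weird, but it is what it is. And no, I’m not talking about the alignment of the pointers themselves.

This holds in a #[repr(C)] struct only, of course; otherwise the compiler could reorder the fields to need less padding.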

#[repr(C)]
struct S {
    a: u8,
    b: [u64; 0],
    c: u8,
}

use memoffset::offset_of;
use std::mem;
fn main() {
    dbg!(offset_of!(S, a));
    dbg!(offset_of!(S, b));
    dbg!(offset_of!(S, c));
    dbg!(mem::size_of::<S>());
    dbg!(mem::align_of::<S>());
}
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.64s
     Running `target/debug/playground`
[src/main.rs:11] offset_of!(S, a) = 0
[src/main.rs:12] offset_of!(S, b) = 8
[src/main.rs:13] offset_of!(S, c) = 8
[src/main.rs:14] mem::size_of::<S>() = 16
[src/main.rs:15] mem::align_of::<S>() = 8

Note that C does not allow ZSTs at all, so putting one in a repr(C) struct is at least highly questionable, if not actually wrong.

I don’t think so. Using repr(C) to get a predictable struct layout for use within Rust only is not uncommon, either. Interoperability with C is by no means its only use case. AFAIK, repr(C) has predictable and well-defined behavior for ZST fields as well, even though those aren’t a thing in the C language itself.

Here’s a quote from the relevant section in the Reference:

The C Representation

The C representation is designed for dual purposes. One purpose is for creating types that are interoperable with the C Language. The second purpose is to create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type.

Because of this dual purpose, it is possible to create types that are not useful for interfacing with the C programming language.

Sure. I stand by "highly questionable" though. Most people should not need to resort to such shenanigans unless they are doing FFI.

I don't see how "doing FFI" is any sort of argument against using ZSTs. FFI isn't C. FFI is binary compatibility and nothing more. Hell, FFI can even be between two things written in Rust! (I have done this, using repr(C) structs to store data in shared memory.)

Prior to #[repr(align)], zero-sized arrays were the best way to force extra alignment on something, permitting a representation of C's _Alignas. A ZST at the end can be convenient for representing dynamically-sized arrays. PhantomData can be used to connect generic type arguments between the arguments and return type of an FFI call.
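A small sketch of the first two uses, under the assumption of two made-up types: `Aligned8` borrows the alignment of `u64` through a zero-length array, and `Header` ends in a zero-length array that marks where trailing elements would start (similar in spirit to a C flexible array member).

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

// Hypothetical type: the zero-length array adds no bytes but raises
// the alignment of the whole struct to that of `u64`.
#[repr(C)]
#[allow(dead_code)]
struct Aligned8 {
    _align: [u64; 0],
    bytes: [u8; 4],
}

// Hypothetical header for a dynamically sized buffer: `data` marks
// the offset at which trailing `u32` elements would begin.
#[repr(C)]
#[allow(dead_code)]
struct Header {
    len: u32,
    data: [u32; 0],
}

/// Returns the byte offset of `Header::data` from the start of the struct.
fn header_data_offset() -> usize {
    let h = Header { len: 0, data: [] };
    addr_of!(h.data) as usize - addr_of!(h) as usize
}

fn main() {
    // The struct is as aligned as `u64`, although it only stores bytes.
    assert_eq!(align_of::<Aligned8>(), align_of::<u64>());
    assert_eq!(size_of::<Aligned8>() % align_of::<u64>(), 0);

    // The trailing ZST sits right at the end of the header.
    assert_eq!(header_data_offset(), size_of::<Header>());
}
```
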

Backing up a bit because apparently multiple people have mistaken my intent, which means I communicated poorly.

Since @jgarvin didn't tell us what they're intending to do, I thought it would be appropriate to point out that, for what, to me, seems like the most common application of repr(C) (that is, for FFI with C), using a ZST is probably a mistake because C itself does not support ZSTs.

Yes, there are other uses of repr(C). But if you're doing Rust-to-Rust FFI (which has many more concerns besides field reordering), or designing your own Rc for whatever reason, I figure you probably already know that. If you just want to know how repr(C) works, my default assumption is that you want to do C FFI. And the most likely answer to that is "C doesn't have ZSTs".

This is not my experience. ZST inside repr(C) is ubiquitous in properly done C FFI. For example a binding to something like folly::Range (folly/Range.h at master · facebook/folly · GitHub) looks like:

use std::marker::PhantomData;
use std::os::raw::{c_char, c_uchar};

#[repr(C)]
#[derive(Copy, Clone)]
pub struct Range<'a, Iter: RangePtr> {
    b_: Iter,
    e_: Iter,
    contents: PhantomData<&'a [Iter::ValueType]>,
}

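// `Sealed` was not shown in the original post. This private-module
// definition is an assumed, typical sealed-trait setup so that the
// snippet compiles on its own.
mod sealed {
    pub trait Sealed {}
    impl<T> Sealed for *const T {}
    impl<T> Sealed for *mut T {}
}
use sealed::Sealed;

pub trait RangePtr: Sealed { type ValueType; }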
impl<T> RangePtr for *const T { type ValueType = T; }
impl<T> RangePtr for *mut T { type ValueType = T; }

unsafe impl<'a, I: RangePtr> Send for Range<'a, I> where I::ValueType: Sync {}
unsafe impl<'a, I: RangePtr> Sync for Range<'a, I> where I::ValueType: Sync {}

pub type StringPiece<'a> = Range<'a, *const c_char>;
pub type MutableStringPiece<'a> = Range<'a, *mut c_char>;
pub type ByteRange<'a> = Range<'a, *const c_uchar>;
pub type MutableByteRange<'a> = Range<'a, *mut c_uchar>;
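
The reason a binding like this works is that the PhantomData field takes up no space, so the struct still looks like two pointers to the C++ side. A simplified sketch of that check, assuming two made-up types (`RawRange` and `BorrowedRange` are stand-ins, not part of the binding above):

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};
use std::os::raw::c_char;

// Hypothetical stand-in for what the foreign side sees: two pointers.
#[repr(C)]
#[allow(dead_code)]
struct RawRange {
    b: *const c_char,
    e: *const c_char,
}

// The same two pointers plus a zero-sized marker carrying the lifetime.
#[repr(C)]
#[allow(dead_code)]
struct BorrowedRange<'a> {
    b: *const c_char,
    e: *const c_char,
    contents: PhantomData<&'a [c_char]>,
}

/// Returns true if adding the `PhantomData` field changed neither the
/// size nor the alignment of the struct.
fn same_layout() -> bool {
    size_of::<BorrowedRange<'static>>() == size_of::<RawRange>()
        && align_of::<BorrowedRange<'static>>() == align_of::<RawRange>()
}

fn main() {
    assert_eq!(size_of::<PhantomData<&'static [c_char]>>(), 0);
    assert!(same_layout());
}
```
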
