Thin Pointers, Fat Pointers, Alignment, Oh My! 🦁

So I'm a little bit un-savvy with lower-level manipulation of memory, pointers and such, but I think I just had an epiphany about what fat pointers and thin pointers are, and I wanted to make sure I was right.

So a thin pointer is a pointer that is essentially a single usize that points to a value of a known type. So *const u8 is a pointer to a u8 that takes up one usize. *mut MyStruct is the same situation ( just mutable, which is an aspect of Rust's type system not an aspect of the hardware or the actual pointer data ).

A fat pointer is a pointer that takes up two usizes, one for the position in memory, and one for the length. So a pointer or reference to a slice is a fat pointer.

From stuff I've read elsewhere, I believe there are other kinds of fat pointers, so maybe a fat pointer is any pointer that takes up more than one usize?


Now I'm a little confused still about alignment. I found out here on the forum that pointers to different types need to / should be aligned, depending on the platform, but I'm a little confused at how that works because the type information doesn't actually exist for the pointer right? The pointer is just a usize as far as hardware is concerned.

Does it have to do with dereferencing pointers to primitives? Does the CPU care which primitive you are dereferencing a pointer to?

I've been making a little playground to mess with this stuff:

use std::ptr::slice_from_raw_parts;

// Use the C memory representation
#[repr(C)]
#[derive(Debug, PartialEq)]
struct MyType {
    // One byte
    a: u8,

    // One byte padding, because u16 has an alignment of 2 and it can't fit the
    // next u16 next to the previous u8

    // Two bytes. This necessarily makes `MyType` aligned to 2 bytes.
    b: u16,

    // An extra byte
    c: u8,
    // An extra padding byte because the type size must be rounded to its
    // alignment
} // That leaves the total size of this type at 6 bytes

fn main() {
    // Assert the type's size
    assert_eq!(std::mem::size_of::<MyType>(), 6);

    let var = MyType { a: 1, b: 2, c: 3 };

    // Aligned pointer to `var`
    let ptr = &var as *const MyType;

    // Thin pointer, one `usize`
    assert_eq!(
        std::mem::size_of::<*const MyType>(),
        std::mem::size_of::<usize>()
    );

    // I can create a slice pointing to the bytes in my struct from the pointer and the size of the
    // struct.
    let slice = slice_from_raw_parts(ptr as *const u8, std::mem::size_of::<MyType>());

    dbg!(unsafe { &*slice });

    // Fat pointer, two `usize`
    assert_eq!(
        std::mem::size_of::<*const [u8]>(),
        std::mem::size_of::<usize>() * 2,
    );

    // I can dereference this pointer to `MyType`
    assert_eq!(unsafe { &*ptr }, &MyType { a: 1, b: 2, c: 3 });

    // I can also dereference the fields because, thanks to `repr(C)` in this case,
    // I know the memory layout.

    // Dereference the pointer to field `a`
    assert_eq!(unsafe { *(ptr as *const u8) }, 1);

    // Dereference the pointer to field `b`
    assert_eq!(
        unsafe {
            *((ptr as *const u8).add(2 /* account for padding */) as *const u16)
        },
        2
    );

    // Dereference the pointer to field `c`
    assert_eq!(unsafe { *((ptr as *const u8).add(4)) }, 3);

    // Now a pointer to any byte in my struct would be properly aligned, because a `u8`'s alignment
    // is 1 byte which fits anywhere. For instance, this pointer will point to the padding byte in
    // my struct ( which BTW appears to be able to be any random value )
    dbg!(unsafe { &*(ptr as *const u8).add(1) });

    // Now *this* pointer would be unaligned, because I'm trying to dereference to a `u16` which has
    // an alignment of 2. It *seems* to work, but I'm assuming this is undefined behavior.
    dbg!(unsafe { &*(ptr as *const u16).add(1) });

    // But how is that different than just getting the two bytes and then operating on them
    // individually? Are these primitive types actually special to the CPU, not just a concept
    // present in the Rust language?
}

All the way at the bottom I dereference what I think is an unaligned pointer; that's UB, right? But what's the difference between that and, say, dereferencing those two bytes individually and manually operating on them "logically" like a u16? Is it that the primitive types are special to the CPU? Are primitive types actually a concept in the CPU and not just an abstraction over the raw bytes provided by Rust?
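
For comparison, here is a minimal sketch of two ways to read a u16 from an arbitrary (possibly odd) offset without relying on an aligned dereference: assembling it from individual bytes, and `std::ptr::read_unaligned`. The helper names `read_u16_bytewise` and `read_u16_unaligned` are made up for this example.

```rust
use std::ptr;

/// Builds a native-endian `u16` from two individually read bytes.
/// Each access is a `u8` access, so alignment never comes into play.
fn read_u16_bytewise(bytes: &[u8], offset: usize) -> u16 {
    u16::from_ne_bytes([bytes[offset], bytes[offset + 1]])
}

/// Reads the same two bytes through a `*const u16`, using
/// `read_unaligned` so the pointer does not have to be 2-byte aligned.
fn read_u16_unaligned(bytes: &[u8], offset: usize) -> u16 {
    assert!(offset + 2 <= bytes.len());
    // SAFETY: the range was bounds-checked above, and `read_unaligned`
    // places no alignment requirement on the pointer.
    unsafe { ptr::read_unaligned(bytes.as_ptr().add(offset) as *const u16) }
}

fn main() {
    let bytes = [0x11u8, 0x22, 0x33, 0x44];

    // Offsets 0, 1 and 2: at least one of these addresses is odd,
    // so a plain `*ptr` dereference as `u16` would not be allowed there.
    for offset in 0..3 {
        assert_eq!(
            read_u16_bytewise(&bytes, offset),
            read_u16_unaligned(&bytes, offset)
        );
    }
}
```

Both helpers return the same value; the difference from a plain `*ptr` is only that the compiler is never told it may assume the address is a multiple of 2.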

Also, one more question: if I can create a type in memory with a certain alignment, and I have a pointer that I know points to that type, that pointer can't possibly be unaligned ( assuming it still points to the type I think it does ) because I can't create an unaligned type? Or probably not, because I guess there's nothing saying I can't create an unaligned type. Yeah, I think that's wrong.

See What is a "fat pointer" in Rust? Note that the layout of each class of Rust-defined fat pointer is deliberately unspecified; the order of the constituent elements can change with each compiler release.
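
A quick way to see the different kinds of fat pointer is to measure them. This is only a sketch: the sizes asserted below are what current compilers produce, not a layout guarantee, and `words` is a helper invented for the example.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Size of a pointer-like type, measured in `usize` words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Thin: the pointee is `Sized`, so an address is all that is needed.
    assert_eq!(words::<*const u8>(), 1);
    assert_eq!(words::<&u64>(), 1);
    assert_eq!(words::<Box<[u8; 16]>>(), 1);

    // Fat, slice flavour: an address plus a length.
    assert_eq!(words::<*const [u8]>(), 2);
    assert_eq!(words::<&str>(), 2);

    // Fat, trait-object flavour: an address plus a vtable pointer.
    assert_eq!(words::<&dyn Debug>(), 2);
    assert_eq!(words::<Box<dyn Debug>>(), 2);

    // The length half of a slice reference can be read back safely.
    let bytes = [1u8, 2, 3];
    let fat: &[u8] = &bytes;
    assert_eq!(fat.len(), 3);
}
```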

Alignment:

  • Each thin and fat pointer must be usize-aligned. It's possible to define fat pointers with greater alignment requirements.

  • The object pointed to by a thin or fat pointer must be aligned to the most stringent alignment requirement of any of its constituent elements. Note that objects may be declared with alignment requirements greater than the intrinsic alignment requirements of any of their constituent elements.
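
The two points above can be checked with `align_of` and `size_of`. A rough sketch, assuming a typical target where `u16` has an alignment of 2; `Padded` and `is_aligned_for` are names made up for the example, and `MyType` mirrors the struct from the question.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct MyType {
    a: u8,
    b: u16,
    c: u8,
}

// Hypothetical type declared with a stricter alignment than its only field needs.
#[allow(dead_code)]
#[repr(C, align(16))]
struct Padded {
    a: u8,
}

/// True when the address in `ptr` is a multiple of `T`'s alignment.
fn is_aligned_for<T>(ptr: *const T) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

fn main() {
    // The struct inherits the strictest alignment among its fields (the `u16`).
    assert_eq!(align_of::<MyType>(), 2);
    assert_eq!(size_of::<MyType>(), 6);

    // `align(16)` raises the requirement, and the size is rounded up to match.
    assert_eq!(align_of::<Padded>(), 16);
    assert_eq!(size_of::<Padded>(), 16);

    // A pointer obtained from a live value of the type is aligned for it.
    let v = MyType { a: 1, b: 2, c: 3 };
    assert!(is_aligned_for(&v as *const MyType));

    // The pointer values themselves, thin or fat, are `usize`-aligned here.
    assert_eq!(align_of::<*const MyType>(), align_of::<usize>());
    assert_eq!(align_of::<*const [u8]>(), align_of::<usize>());
}
```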


Pointers are not a usize. At compile time, with *mut T, we know what T is, so we know its size, its alignment, and all the other layout information T has. Pointers can be converted to and from usize without loss of information, except in some cases such as platforms with segmented memory, but the conversion itself is not guaranteed to be a no-op.
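
As a small illustration of the conversion, here is a sketch that assumes an ordinary flat-memory target; `round_trip` is a name invented for the example.

```rust
/// Casts a thin pointer to its address and back again.
fn round_trip(p: *const u32) -> *const u32 {
    let addr = p as usize; // only the address, no type information
    addr as *const u32 // the type is supplied again by the cast
}

fn main() {
    let x = 42u32;
    let p = &x as *const u32;

    // On a flat-memory target the round trip yields an equal pointer.
    let q = round_trip(p);
    assert_eq!(p, q);
    assert_eq!(unsafe { *q }, 42);

    // A fat pointer is cast to a thin pointer first, which drops the
    // length, before its address can be taken as a `usize`.
    let s: &[u8] = &[1, 2, 3];
    let fat = s as *const [u8];
    let thin = fat as *const u8;
    assert_eq!(thin as usize, s.as_ptr() as usize);
}
```

The `usize` only carries the address; everything the compiler knows about `T` lives in the pointer's type, not in its bits.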

Check this blog post for a more in-depth understanding of what a pointer actually is: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
