How to safely write a String to a union field

According to the reference, "writes to union fields are safe, since they just overwrite arbitrary data, but cannot cause undefined behavior". However, I can't get the following to work without unsafe:

use std::mem::ManuallyDrop;

struct Data {
	first: String,
	second: u8,
}

union U {
	data: ManuallyDrop<Data>,
	num: u64,
}

fn main() {
	let mut a = U { num: 3 };

	a.data.first = String::new(); // error
}

Is there any way to solve this?

you forgot ManuallyDrop

a.s = ManuallyDrop::new(String::new());

@nerditation sorry, that code was a simplified example and it turned out to be fairly trivial. I'm actually interested in cases where there's an extra layer of nesting like this:

use std::mem::ManuallyDrop;

struct Data {
	first: String,
	second: u8,
}

union U {
	data: ManuallyDrop<Data>,
	num: u64,
}

fn main() {
	let mut a = U { num: 3 };

	a.data.first = ManuallyDrop::new(String::new()); // error
}

EDIT: I'm just going to replace my original post with this to hopefully avoid confusion...

TLDR: this code is not a "write", but a "read", so it needs unsafe

note: here, "field" refers to the field of the union, not of the struct. here's an equivalent piece of code, which hopefully demonstrates why it is a "read", not a "write", and why it is unsafe:

struct Data {
    first: String,
    second: u8,
}
impl Data {
    fn set_first(&mut self, s: String) {
        self.first = s;
    }
}
union U {
    data: ManuallyDrop<Data>,
    num: u64,
}

let mut a: U = todo!();
// safety: `a` must be the `data` variant,  not the `num` variant.
let data: &mut Data = unsafe { &mut a.data };
data.set_first(String::new());

EDIT:

so what does a real "write" operation look like?

let mut a: U = todo!();
a.data = ManuallyDrop::new(Data { first: String::new(), second: 42 });

as you can see, this is totally safe.


Okay, I think I understand the problem. Writing a.data = ... is a special case that's allowed but doing absolutely anything else with a.data requires us to unsafely assert that a.data is a valid instance of its type. But the limitation is that (I think) there's no way to express in Rust the concept of a write-only &mut.
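To make that distinction concrete, here is a minimal sketch reusing the types from the question. The function name `whole_field_write` is made up for illustration. The assignment to `a.data` compiles without unsafe, while getting the value back out needs an unsafe block.

```rust
use std::mem::ManuallyDrop;

struct Data {
    first: String,
    second: u8,
}

union U {
    data: ManuallyDrop<Data>,
    num: u64,
}

// Hypothetical helper: overwrite the whole `data` field, then read it back.
fn whole_field_write() -> (String, u8) {
    let mut a = U { num: 3 };

    // Safe: assigning a whole union field neither inspects nor drops the old bytes.
    a.data = ManuallyDrop::new(Data {
        first: String::from("hello"),
        second: 42,
    });

    // Unsafe: reading asserts that `data` is the field currently holding a valid value.
    // SAFETY: `data` was fully initialized by the assignment just above.
    let data: Data = unsafe { ManuallyDrop::take(&mut a.data) };
    (data.first, data.second)
}
```

Calling `whole_field_write()` returns the `String` and the `u8` that were stored, and the only unsafe block is on the read side.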

it's not really about reading or writing the "memory".

the term "read" for unions should not be interpreted in the sense of "reading from the memory", but in a more abstract sense, roughly "to assert" or "to observe" the runtime type of the union's variant (or "field"). the unsafety really comes from the fact that the type information is unknown to the compiler.

back to the example: even though you didn't really "read" the memory of the struct, if its runtime type is incorrect, the "write" operation is using a nonsense address, because the address is derived as if it were the first field of the struct Data, while its type is not Data.

please use an enum and avoid unions, unless you absolutely need to control the layout of every byte in your data type and it cannot be done in a safe way, such as for some ffi data types.
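for comparison, a rough sketch of the enum version, assuming type punning is not needed. the names `Value` and `set_first` are made up for illustration. the compiler tracks which variant is live, so updating a single field needs no unsafe at all:

```rust
struct Data {
    first: String,
    second: u8,
}

// Hypothetical enum replacing the union; the discriminant records the live variant.
enum Value {
    Data(Data),
    Num(u64),
}

// Sets `first` if `v` currently holds `Data`; returns whether the write happened.
fn set_first(v: &mut Value, s: String) -> bool {
    match v {
        Value::Data(d) => {
            // The old `String` is dropped normally because it is known to be valid.
            d.first = s;
            true
        }
        Value::Num(_) => false,
    }
}
```

the cost is the extra discriminant and the loss of control over the exact layout.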

No, it's not a nonsense address, it's guaranteed to be in bounds and well-aligned and the same as what you might calculate with offset_of!()...

From thinking about it more I believe it should always be sound to write a String (or anything) to the union if you do it via MaybeUninit. Although again I think there's no way to express that in the language.

Well, enums are quite different since they don't support type punning.

from the type system's perspective, it is not a valid safe pointer, although the address may be within "bounds" and well-aligned.

even assuming the address is good, you should only use raw pointers to avoid UB, and that's why offset_of!() and/or addr_of!() exist [1]. you should NOT access it via the "normal" field access syntax, which is a place expression. to quote the reference:

Implicit borrows may be taken in the following expressions:

  • ...
  • Left operand in field expressions.
  • ...
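a rough sketch of the raw-pointer route, using the types from the question. the function name `write_fields_in_place` is made up, and the sketch relies on `ManuallyDrop<T>` being `repr(transparent)` so that the pointer cast is layout-compatible. the old bytes are never read or dropped, because `write` only stores the new value:

```rust
use std::mem::ManuallyDrop;
use std::ptr::addr_of_mut;

struct Data {
    first: String,
    second: u8,
}

union U {
    data: ManuallyDrop<Data>,
    num: u64,
}

// Hypothetical helper: initialize `data` one field at a time through raw pointers.
fn write_fields_in_place() -> (usize, u8) {
    let mut a = U { num: 3 };

    // SAFETY: all pointers stay inside `a`, and both fields of `Data` are written
    // before any reference to `Data` is created or the value is dropped.
    unsafe {
        // Raw pointer to the union field, without creating a reference to it.
        let p: *mut Data = addr_of_mut!(a.data).cast::<Data>();

        // `write` overwrites without reading or dropping whatever was there before.
        addr_of_mut!((*p).first).write(String::from("hi"));
        addr_of_mut!((*p).second).write(7);

        // Every field is now initialized, so a shared reference is fine.
        let d: &Data = &*p;
        let out = (d.first.len(), d.second);

        // Run the destructor explicitly so the `String` is freed.
        ManuallyDrop::drop(&mut a.data);
        out
    }
}
```

this still needs unsafe, but the only thing being asserted is that the writes are in bounds and aligned, not that the old contents were a valid `Data`.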

I don't know what you mean exactly, but MaybeUninit can eliminate uninitialized memory related UB when used correctly (well, it more or less forces you to use raw pointers all over the place).

if you mean something like

union U {
    s: MaybeUninit<String>,
}
// or even
union U {
    s: MaybeUninit<ManuallyDrop<String>>,
}

then you are right, rust does not support this, rust requires all the variants of a union to be either Copy, or ManuallyDrop, or references.
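outside of a union, the raw-pointer style that MaybeUninit pushes you toward looks roughly like this. it is a minimal sketch along the lines of the field-by-field initialization pattern in the standard library docs, and the function name `init_field_by_field` is made up:

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct Data {
    first: String,
    second: u8,
}

// Hypothetical helper: build a `Data` one field at a time.
fn init_field_by_field() -> Data {
    let mut slot = MaybeUninit::<Data>::uninit();
    let p: *mut Data = slot.as_mut_ptr();

    // SAFETY: `p` points to properly aligned storage for `Data`, each field is
    // written exactly once, and `assume_init` runs only after both writes.
    unsafe {
        addr_of_mut!((*p).first).write(String::from("abc"));
        addr_of_mut!((*p).second).write(9);
        slot.assume_init()
    }
}
```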

well, if you are doing type punning, then you are essentially cheating the type checker, so you cannot get away without unsafe. all the type punning methods in rust need unsafe, in one way or another. e.g. transmute() is an unsafe function, and raw pointers also need unsafe when you access the pointee, ffi functions are always unsafe, etc.

just a reminder: type punning for types that are neither #[repr(C)] nor #[repr(transparent)] (and not primitive scalar types, obviously) can all too easily be unsound. remember, structs (and unions!!!) don't have a specified memory layout unless annotated with a #[repr(...)] attribute. it's really easy to (accidentally) access padding bytes in types.
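a small sketch of punning that stays on the well-defined side. the names `Bits` and `bytes_of` are made up. the union is `#[repr(C)]`, and both fields are padding-free integer types of the same size for which every bit pattern is valid:

```rust
// Hypothetical example type: both fields start at offset 0 because of `repr(C)`.
#[repr(C)]
union Bits {
    word: u32,
    bytes: [u8; 4],
}

// Reinterprets a `u32` as its four bytes in native byte order.
fn bytes_of(word: u32) -> [u8; 4] {
    let b = Bits { word };
    // SAFETY: `u32` and `[u8; 4]` have the same size, no padding,
    // and every bit pattern is a valid value of both.
    unsafe { b.bytes }
}
```

the result depends on the target's endianness, so it should match `u32::to_ne_bytes`, which is the safe way to get the same bytes.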


  1. the raw reference operator (&raw const|mut expr) is now stable, so its macro form addr_of!() is no longer necessary, but offset_of!() is still a compiler builtin and cannot be implemented soundly in the library