Is it possible to access a struct's private field outside the module?

Indeed, but I thought that's an orthogonal issue. Private fields aren't accessible without the newtype wrapper, either.

Serialize/Deserialize came up because of this link posted above, explaining how they can be abused to access otherwise private fields in Rust types.

Apologies all/both, if what I had shared was misleading.

Thanks for all the replies! I appreciate all the efforts to help!
I've decided to duplicate the code snippets from std into my own code. Tedious, but it works :blush:

You could transmute it to a struct with the same layout, or use pointer arithmetic. But that is very unsafe. At the very least it could break if the std implementation ever changed, and could potentially have undefined behavior.

Unless the struct has a specific #[repr] on it, there is no struct with the same layout, even if you copy-paste std's declaration. It is explicitly stated that no guarantees are made about data layout, so the compiler can freely reorder fields based on declaration position, usage patterns, or even pseudo-randomly, to prevent people from relying on unguaranteed implementation details.
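
To make the distinction concrete, here is a minimal sketch. `Foreign`, `Mirror` and `RustLayout` are hypothetical types invented for illustration, not anything from std. With `#[repr(C)]` on both sides the field offsets follow the declaration order, so the transmute is sound; for the default representation no such claim can be made, which is why the sketch only prints that size instead of relying on it.

```rust
use std::mem::{offset_of, size_of};

// Hypothetical stand-in for a foreign type with private fields.
#[allow(dead_code)]
#[repr(C)]
pub struct Foreign {
    a: u8,
    b: u32,
    c: u16,
}

// Our own mirror: same field types, same order, also repr(C).
#[repr(C)]
pub struct Mirror {
    pub a: u8,
    pub b: u32,
    pub c: u16,
}

// Same declaration with the default representation: layout is unspecified.
#[allow(dead_code)]
pub struct RustLayout {
    a: u8,
    b: u32,
    c: u16,
}

pub fn mirror_of(f: Foreign) -> Mirror {
    // SAFETY: both types are repr(C) with identical field types in identical
    // order, so size, alignment and field offsets match.
    unsafe { std::mem::transmute::<Foreign, Mirror>(f) }
}

fn main() {
    println!(
        "repr(C) offsets: a={}, b={}, c={}",
        offset_of!(Foreign, a),
        offset_of!(Foreign, b),
        offset_of!(Foreign, c)
    );
    // Whatever this prints is an implementation detail of the current compiler.
    println!("default repr size: {}", size_of::<RustLayout>());

    let m = mirror_of(Foreign { a: 1, b: 2, c: 3 });
    assert_eq!((m.a, m.b, m.c), (1, 2, 3));
}
```

None of this helps with std types, which do not carry `#[repr(C)]`; the sketch only shows what an explicit repr buys you.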

It could be a solution in the case where I transmute something first to gain interoperability, and then contribute code to the other crate.

Shocking! You mean the compiler will randomize the field order on purpose? Will the compiler generate a different binary for the same code?

There is discussion about adding a flag that makes it do this, to help catch incorrect transmutes, but it won't be enabled by default.

Reproducible builds are also a goal, so a different binary for the same code with the same compiler version (and same flags and same PGO input and and and...) is probably a bug. (But there's still no guarantee that two repr(Rust) structs with the same fields have the same layout within the same compilation, say.)

I don't see how reproducible builds are incompatible with randomized layout.
Just take the source, run SHA-512 on it, and use that to seed the random-number generator. Add a "salt" command-line option so that different builders can produce different binaries.
And yes, it would be nice to have it enabled by default.
Not only would this make sure people don't try dirty tricks but, more importantly, it's quite a nice security measure.
The Linux kernel does that despite using C, so it sounds logical that Rust, being concerned about security, should do it too.
Although I'm not sure how highly this should be prioritized. There are lots of other things which are more important.
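
A rough sketch of the idea, not of anything the compiler actually does: derive a seed from the source text plus a salt, then shuffle field indices with it. To stay dependency-free I have swapped SHA-512 for FNV-1a and used SplitMix64 as the generator; `field_order` and its parameters are made-up names for illustration.

```rust
/// FNV-1a (64-bit): a small, stable hash standing in for SHA-512 here.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// One SplitMix64 step: advances the state and returns a pseudo-random value.
fn next(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

/// Returns a permutation of 0..n_fields that depends only on the inputs.
fn field_order(source: &str, salt: &str, n_fields: usize) -> Vec<usize> {
    // The seed mixes the source hash with the salt hash.
    let mut state = fnv1a(source.as_bytes()) ^ fnv1a(salt.as_bytes()).rotate_left(32);
    let mut order: Vec<usize> = (0..n_fields).collect();
    // Fisher-Yates shuffle driven by the seeded generator
    // (the modulo introduces a slight bias, which is fine for a sketch).
    for i in (1..n_fields).rev() {
        let j = (next(&mut state) % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }
    order
}

fn main() {
    let src = "struct S { a: u8, b: u32, c: u16, d: u64 }";
    let first = field_order(src, "builder-1", 4);
    let second = field_order(src, "builder-1", 4);
    // Same source and same salt: the "random" order is reproducible.
    assert_eq!(first, second);
    println!("{:?}", first);
}
```

The point is only that "random" and "reproducible" are compatible once the seed is a pure function of the inputs.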

Some ISAs encode instructions more compactly when field offsets are small (e.g., 0, 1..3F, etc.). An optimizing compiler could make a static or PGO-based assessment of which fields are accessed most frequently and order them in the struct in a way that minimizes i-cache fetches. In such a case it is completely reasonable for the compiler to lay out two unrelated structs in the same program differently, even though they happen to have the same nominal alignment/size ordering of fields.

Hence my statement that it could be undefined behavior. You might be able to get it to work with a specific version of Rust, but you can't rely on it.

Thank you for all these brilliant and professional replies!

I believe it explains why it is impossible to access a private field outside the module (or at least why one should not).

Thanks again, and wish you all a nice day :revolving_hearts:

What about the layout with unsized slices? If I understand correctly, they are always placed at the end of the layout for technical reasons, so their position is inherently guaranteed. Then, to allow unsized coercion directly without copying, if the last field of a struct is an array, it should also be laid out at the end of the struct. In my experience this works, and if it is not guaranteed, it would be beneficial to make it a guarantee. That would, for example, make it possible to point to an unsized struct TheStruct<[T]> using a slim pointer &TheStruct<[T;0]> when the size of the slice is known by some means other than the metadata field of the fat pointer (and you can access the slice using get_unchecked). Also, it would be nice to be able to safely coerce from &TheStruct<[T;10]> to &TheStruct<[T;5]>.

I'd assume as little as possible: Given Struct<T: ?Sized>, if T is able to be coerced to some unsized U, it's at the end (so that &Struct<T> coerces to &Struct<U>). That doesn't mean any array goes at the end of any struct by any means.

(Probably T is always at the end but I wasn't able to find such a guarantee per se. It's a somewhat old feature.)
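
For reference, the coercion being discussed looks like this on stable Rust. `Tagged` and `sum` are hypothetical names for illustration; the sketch only demonstrates that the coercion compiles and works, not that any layout guarantee exists beyond it.

```rust
// A struct that is generic over a possibly unsized last field.
struct Tagged<T: ?Sized> {
    tag: u32,
    data: T, // only the last field may be unsized
}

// Accepts any tail length through the unsized form.
fn sum(t: &Tagged<[i32]>) -> i32 {
    t.tag as i32 + t.data.iter().sum::<i32>()
}

fn main() {
    let sized = Tagged { tag: 1, data: [10, 20, 30] };
    // Unsizing coercion: &Tagged<[i32; 3]> becomes &Tagged<[i32]> without copying.
    let dynamic: &Tagged<[i32]> = &sized;
    assert_eq!(dynamic.data.len(), 3);
    // The same coercion happens implicitly at the call site.
    assert_eq!(sum(&sized), 61);
}
```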

I believe that would be unsound as the reference wouldn't "cover" the rest of the array. Run this through Miri, for example.

Presumably that would follow the ability to coerce from &[T; 10] to &[T; 5], should it come to be. But a method seems more likely to me. (An argument against coercion for DSTs: maybe I keep tight control on creation of my custom DST and only allow the creation of Struct<[T; N]> when N is a power of 2 due to assumptions in the implementation; the coercion breaks that control.)

Of course the safety is not verifiable by the compiler when using get_unchecked on an array of unknown size. But it's still practical. I'm trying to create a fast SAT solver, as optimized as possible, so memory locality matters. So I allocate a contiguous memory layout for the heterogeneous input graph. Some classes of graph nodes have a variable number of adjacencies, and the number is stored in the node. Pointers to the adjacencies are laid out locally within the node, i.e. no vectors are acceptable, only unsized slices.

The adjacency pointers are kind of my own fat pointers: each of them contains a slim *mut u8 pointer to the memory of the adjacency and the necessary methods (of type fn(*mut u8, and other parameters)). I want to be able to coerce the *mut u8 to a slim *mut SymbolicOr<[SymbolicOrAdjacency]>. I know that it is unsafe, but there is no efficient way of dealing with such graphs safely anyway. Such a pointer cannot be slim, though, and even after coercion to a fat pointer the size field contains 0, so the size field is superfluous. So yes, I coerce to *mut SymbolicOr<[SymbolicOrAdjacency; 0]> (the size is unknown at compile time, so 0 is the most reasonable option) and it works, but there's no guarantee. And I access the fields via get_unchecked. Do you know of any way of coercing *mut u8 to a *mut SymbolicOr<[SymbolicOrAdjacency; N]> if N is not known at compile time?

If you need to store things thinly, store raw pointers and use slice_from_raw_parts or slice::from_raw_parts instead of storing &[_; 0]. Granted, it sounds like you're already doing this. The distinction between &[T; 0] and *const T (or whatever) is important, so I want to emphasize for others if not yourself...

It goes beyond verifying safety -- the compiler can exploit what you have told it to compile your UB into anything. If it "notices" you're reading (or writing) beyond the scope you declared you could (by using a reference with the wrong length), it might well elide that code, poison the results of that function leading to far away compiled code changes, etc etc. (Ab-)Using UB is impractical in that you can't be sure you got the results you wanted without vetting the outputted assembler (say), every time you compile.
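
A minimal sketch of the raw-pointer approach, assuming the length is stored somewhere you can read it back from (here it is just passed in). `sum_adjacent` is a made-up name. The key point is that the slice is rebuilt with its true length, so the resulting reference covers everything you read.

```rust
use std::slice;

/// Sums `len` values starting at `ptr`.
///
/// # Safety
/// `ptr` must point to `len` consecutive, initialized `u32` values that stay
/// alive and unmodified for the duration of the call.
unsafe fn sum_adjacent(ptr: *const u32, len: usize) -> u32 {
    // Rebuild a slice with the real length instead of pretending the data
    // is a `[u32; 0]` and reading past its end.
    let items: &[u32] = unsafe { slice::from_raw_parts(ptr, len) };
    items.iter().sum()
}

fn main() {
    let storage = vec![1u32, 2, 3, 4];
    // Store only the thin pointer; the length is tracked separately.
    let thin: *const u32 = storage.as_ptr();
    let len = storage.len();

    // SAFETY: `thin` points at `len` live u32 values owned by `storage`.
    let total = unsafe { sum_adjacent(thin, len) };
    assert_eq!(total, 10);
}
```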

Something like this. You could also perhaps rely on less of the assumptions mentioned earlier with something like

#[repr(C)]
struct TheStruct<const N: usize> {
    foo: String,
    bar: [i32; N],
}

Those aren't what you asked for though; you would have to branch on every possible value of N because Rust is statically typed. So what you probably want instead is to be able to create a DST pointer dynamically (a *mut SymbolicOr<[SymbolicOrAdjacency]>). To do that with full certainty, you would need RFC 2580 to stabilize.

However, it seems likely you can get away with exploiting some unspecified* behavior like so. Via that UCG issue, perhaps slice_dst would be useful to you (CC @CAD97).

(*: Distinct from undefined behavior.)
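
As a sketch of that kind of trick (my own illustration with made-up names `Node` and `widen`, not the code behind the link): build a slice pointer with the desired length, then cast it to the struct pointer. It assumes that the metadata of a pointer to a struct with a slice tail is the tail's element count, and it uses `#[repr(C)]` so the length header sits at offset 0.

```rust
use std::ptr;

// repr(C) keeps `len` at offset 0 and `adj` last, in declaration order.
#[repr(C)]
struct Node<T: ?Sized> {
    len: usize,
    adj: T,
}

/// Turns a thin pointer to a node into a fat pointer with the right length.
///
/// # Safety
/// `thin` must point to a live, properly aligned `Node<[u32; N]>` whose
/// `len` field equals `N`.
unsafe fn widen(thin: *mut u8) -> *mut Node<[u32]> {
    // Read the element count from the header at offset 0.
    let len = unsafe { thin.cast::<usize>().read() };
    // Assumption: the slice length carries over as the metadata of the
    // struct pointer when casting between the two fat pointer types.
    ptr::slice_from_raw_parts_mut(thin.cast::<u32>(), len) as *mut Node<[u32]>
}

fn main() {
    let mut node = Node { len: 3, adj: [10u32, 20, 30] };
    let thin = ptr::addr_of_mut!(node).cast::<u8>();

    // SAFETY: `thin` points at `node`, whose `len` matches its array length.
    let view: &mut Node<[u32]> = unsafe { &mut *widen(thin) };
    view.adj[1] = 25;
    assert_eq!(view.adj.len(), 3);
    assert_eq!(view.adj[1], 25);
}
```

Unlike the &[T; 0] approach, the resulting reference covers the whole tail, so indexing into it does not read outside what the reference claims.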

Ah, ok, thank you for your warnings, tricks and examples on how to follow the safe path. I'll probably go with the fat pointer casting and wait for RFC 2580. Maybe in the end I'll do some performance comparison with my slim pointer UB bad practice.