Disable struct reordering optimization when emitting llvm bitcode

Hi,

I used rustc to emit LLVM IR, but the fields of my struct types come out reordered. After digging into some online posts, it seems that there is an optimization step between MIR and LLVM IR that optimizes struct layout, and that rustc has a flag to tune the optimization level.

My question is: which optimization specifically is responsible for this reordering? According to this post: Optimizing Rust Struct Size: A 6-month Compiler Development Project | Blindly Coding
I should be able to use
rustc source_program.rs --emit=llvm-ir -Z fuel="crate"=0
to turn off the optimization, so that the field order of the struct type in the LLVM IR is the same as in the source code.

If this is not the right way to make the field order in the LLVM IR match the source, are there any other ways?

If you just want to disable field reordering for a couple types, slapping a #[repr(C)] on their declarations will force those optimisation steps to lay structs out in the same order as your definition.
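
To make that concrete, here is a minimal sketch. The struct Foo and the helper foo_offsets are made up for illustration, and the expected offsets assume a typical target where u16 has 2-byte alignment. With #[repr(C)] the fields sit in declaration order; without the attribute the compiler is free to pick a different order.

```rust
use std::ptr::addr_of;

// Hypothetical example type. `#[repr(C)]` pins the fields to declaration order.
#[repr(C)]
pub struct Foo {
    a: u8,
    b: u16,
    c: u8,
}

/// Byte offsets of `a`, `b` and `c` from the start of a `Foo`,
/// measured on a live value so no layout is assumed up front.
pub fn foo_offsets() -> [usize; 3] {
    let foo = Foo { a: 0, b: 0, c: 0 };
    let base = addr_of!(foo) as usize;
    [
        addr_of!(foo.a) as usize - base,
        addr_of!(foo.b) as usize - base,
        addr_of!(foo.c) as usize - base,
    ]
}

fn main() {
    // With repr(C), the offsets increase in declaration order.
    println!(
        "offsets: {:?}, size: {}",
        foo_offsets(),
        std::mem::size_of::<Foo>()
    );
}
```

If you remove the attribute and rerun it, the offsets may change, since the default layout is unspecified.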

Thanks for your suggestion. I was aware of this option. However, I am interested in building an LLVM pass to analyze arbitrary Rust programs, so I am not the author of the programs under test and would prefer not to modify their source code.

Sure. You could require that debug information be on, and look at the mappings in the debug info from the source-level fields to what you find in the LLVM-IR.

Notably, even disabling ordering probably isn't enough, because I suspect that zero-length fields might not be included sometimes -- they certainly aren't for repr(transparent), at least: https://rust.godbolt.org/z/TEj44xosT

You could require that debug information be on, and look at the mappings in the debug info from the source-level fields to what you find in the LLVM-IR.

Thanks! I tried this before. The issue is that it still requires manually matching each member name against the declaration of the struct in the source code. I would like to dump information about the struct in source-level order, so it would be nice if the field order in the LLVM IR simply matched the source code.

Uh, no? It's right there in the metadata:

!12 = !DICompositeType(tag: DW_TAG_structure_type, name: "Foo", scope: !7, file: !13, size: 8, align: 8, elements: !14, templateParams: !18, identifier: "e6ab5044d36405a9893c020ecc272e62")
!14 = !{!15, !17}
!15 = !DIDerivedType(tag: DW_TAG_member, name: "__0", scope: !12, file: !13, baseType: !16, align: 8)
!16 = !DIBasicType(name: "()", encoding: DW_ATE_unsigned)
!17 = !DIDerivedType(tag: DW_TAG_member, name: "__1", scope: !12, file: !13, baseType: !10, size: 8, align: 8)
!10 = !DIBasicType(name: "u8", size: 8, encoding: DW_ATE_unsigned)

There are the two fields (elements: !14), even though the LLVM type is just i8.
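
For reference, a source that would plausibly produce metadata like this is sketched below. The exact declaration is my assumption (a tuple struct with a unit field and a u8, marked repr(transparent) as in the earlier godbolt link), not copied from the original example. The unit field takes no space, so the whole type is one byte.

```rust
use std::mem::{align_of, size_of};

// Assumed reconstruction of the example type: field `__0` is the
// zero-sized `()`, field `__1` is the `u8` that actually occupies memory.
#[repr(transparent)]
pub struct Foo((), u8);

fn main() {
    // The zero-sized field contributes nothing to size or alignment.
    println!(
        "size = {} byte(s), align = {} byte(s)",
        size_of::<Foo>(),
        align_of::<Foo>()
    );
}
```

That matches the size: 8, align: 8 (in bits) on the DICompositeType.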

Yeah, you are right. I double-checked one of the programs I tested, and the debug info showed that the type's elements are in the same order as in the source code. I must have misread the IR.

I remember now what my problem with the debug info was. The order in the debug info is the same as in the source code, but in the actual IR, when I look at the field index used by a GetElementPtrInst, I cannot tell which source field it refers to. The field order in the debug info does not line up with the struct type in the IR.

Can I find a mapping between a field index of the struct type in the IR and the corresponding member in the debug info?

You need to look at the offset in the DIDerivedType, then.

For example, https://rust.godbolt.org/z/jYhcWPq9r has

pub struct Foo { a: u8, b: u16, c: u8 }

where the metadata is

!13 = !DICompositeType(tag: DW_TAG_structure_type, name: "Foo", scope: !7, file: !14, size: 32, align: 16, elements: !15, templateParams: !20, identifier: "f4f4b37c91854cd87dc81ce3bdfefa32")
!15 = !{!16, !17, !19}
!16 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !13, file: !14, baseType: !11, size: 8, align: 8, offset: 16)
!17 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !13, file: !14, baseType: !18, size: 16, align: 16)
!19 = !DIDerivedType(tag: DW_TAG_member, name: "c", scope: !13, file: !14, baseType: !11, size: 8, align: 8, offset: 24)

where you can see from the offsets (b has no offset: attribute, which means offset 0) that the order of the fields in the LLVM type is b then a then c.

And thus the

%0 = getelementptr inbounds %Foo, %Foo* %x, i64 0, i32 1, !dbg !24
                                                // ^^^^^

is looking at a.
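
The lookup can be sketched roughly as follows. This is only an illustration: Member, foo_members and layout_order are made-up names, the member list is typed in by hand from the metadata above rather than read through any LLVM API, and it assumes the LLVM struct type has exactly one element per non-zero-sized field with no extra padding elements. If that assumption does not hold for your IR, comparing the byte offset of each LLVM element against the debug-info offsets would be the safer route.

```rust
/// One DW_TAG_member entry: the field name plus its `offset:` and `size:`
/// in bits. A missing `offset:` or `size:` in the metadata means 0.
#[derive(Debug, Clone, PartialEq)]
pub struct Member {
    pub name: &'static str,
    pub offset_bits: u64,
    pub size_bits: u64,
}

/// The members of `Foo`, copied by hand from the metadata above,
/// listed in source (declaration) order.
pub fn foo_members() -> Vec<Member> {
    vec![
        Member { name: "a", offset_bits: 16, size_bits: 8 },
        Member { name: "b", offset_bits: 0, size_bits: 16 },
        Member { name: "c", offset_bits: 24, size_bits: 8 },
    ]
}

/// Field names in memory order. Position `i` in the result is taken to be
/// field index `i` of the LLVM struct type (an assumption, see above).
/// Zero-sized members are skipped on the assumption that they have no
/// LLVM element.
pub fn layout_order(members: &[Member]) -> Vec<&'static str> {
    let mut sorted: Vec<&Member> = members.iter().filter(|m| m.size_bits > 0).collect();
    // Stable sort: members at the same offset keep their declaration order.
    sorted.sort_by_key(|m| m.offset_bits);
    sorted.into_iter().map(|m| m.name).collect()
}

fn main() {
    let order = layout_order(&foo_members());
    // The GEP above uses field index 1.
    println!("memory order: {:?}, index 1 is {}", order, order[1]);
}
```

For this example it gives b, a, c, so index 1 in the getelementptr lands on a.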

Thanks! This is enough to solve my problem.