Why are the data members in the .data..L_MergedGlobals section not placed in separate sections?

By default, rustc places every variable and function of the object file it generates in its own section. However, the `.data..L_MergedGlobals` section looks special: the variables in it do not get separate sections.

Take the following code as an example. I've defined five global variables, three of which are private (not `pub`).

```rust
static mut G_RUST_FUNCTION_SAME_VALUE1: i32 = 1;
static mut G_RUST_FUNCTION_SAME_VALUE2: i32 = 2;
static mut G_RUST_FUNCTION_SAME_VALUE3: i32 = 3;

pub static mut G_RUST_FUNCTION_SAME_VALUE4: i32 = 3;
pub static mut G_RUST_FUNCTION_SAME_VALUE5: i32 = 3;

pub fn rust_function_base035() -> i32 {
    println!("\t[bin] rust_function_base035");
    unsafe {
        G_RUST_FUNCTION_SAME_VALUE1 += 1;
        G_RUST_FUNCTION_SAME_VALUE2 += 2;
        G_RUST_FUNCTION_SAME_VALUE3 += 3;
        G_RUST_FUNCTION_SAME_VALUE4 += 4;
        G_RUST_FUNCTION_SAME_VALUE5 += 5;

        G_RUST_FUNCTION_SAME_VALUE1 + G_RUST_FUNCTION_SAME_VALUE3 + G_RUST_FUNCTION_SAME_VALUE5
    }
}
```

I use the compiler option `--emit=obj` to output the object file.

When I compile in debug mode (`cargo build`), everything is fine: the five data members are placed in separate sections of the object file.

```text
$ readelf -SW bin_obj/binfunc.o
.....
  [14] .data._RNvCs1epMNu3iPaj_7binfunc27G_RUST_FUNCTION_SAME_VALUE1 PROGBITS 0000000000000000 0003a8 000004 00 WA 0 0 4
  [15] .data._RNvCs1epMNu3iPaj_7binfunc27G_RUST_FUNCTION_SAME_VALUE2 PROGBITS 0000000000000000 0003ac 000004 00 WA 0 0 4
  [16] .data._RNvCs1epMNu3iPaj_7binfunc27G_RUST_FUNCTION_SAME_VALUE3 PROGBITS 0000000000000000 0003b0 000004 00 WA 0 0 4
  [17] .data._RNvCs1epMNu3iPaj_7binfunc27G_RUST_FUNCTION_SAME_VALUE4 PROGBITS 0000000000000000 0003b4 000004 00 WA 0 0 4
  [18] .data._RNvCs1epMNu3iPaj_7binfunc27G_RUST_FUNCTION_SAME_VALUE5 PROGBITS 0000000000000000 0003b8 000004 00 WA 0 0 4
....
```

When I compile in release mode (`cargo build --release`), the three private global variables are merged into a single section named `.data..L_MergedGlobals`, while the two `pub` global variables still get separate sections.

```text
$ readelf -SW bin_obj/binfunc.o
...
  [ 5] .data._RNvCsaf4kqbwsgqU_7binfunc27G_RUST_FUNCTION_SAME_VALUE4 PROGBITS 0000000000000000 0000d0 000004 00 WA 0 0 4
  [ 6] .data._RNvCsaf4kqbwsgqU_7binfunc27G_RUST_FUNCTION_SAME_VALUE5 PROGBITS 0000000000000000 0000d4 000004 00 WA 0 0 4
...
  [11] .data..L_MergedGlobals PROGBITS 0000000000000000 000108 00000c 00 WA 0 0 4
...
```

The symbol table shows that the three private globals are not only placed in the same section, but their names also change: `.0` is appended to each of them.

```text
$ readelf -sW bin_obj/binfunc.o
    13: 0000000000000000 4 OBJECT LOCAL DEFAULT 11 _RNvCsaf4kqbwsgqU_7binfunc27G_RUST_FUNCTION_SAME_VALUE1.0
    14: 0000000000000004 4 OBJECT LOCAL DEFAULT 11 _RNvCsaf4kqbwsgqU_7binfunc27G_RUST_FUNCTION_SAME_VALUE2.0
    15: 0000000000000008 4 OBJECT LOCAL DEFAULT 11 _RNvCsaf4kqbwsgqU_7binfunc27G_RUST_FUNCTION_SAME_VALUE3.0
```

The `.data..L_MergedGlobals` section is probably the result of an LLVM optimization. So far, the LLVM documentation only gives a brief description of something similar:

Global variables can be marked with unnamed_addr which indicates that the address is not significant, only the content. Constants marked like this can be merged with other constants if they have the same initializer. Note that a constant with significant address can be merged with a unnamed_addr constant, the result being a constant whose address is significant.

Is there an option to turn off this optimization, so that in release builds the private static variables are not merged into the `.data..L_MergedGlobals` section and their symbol names do not get the odd `.0` suffix?

This optimization pass seems to be responsible:

According to its description, it only merges internal global variables (those not exported from the codegen unit), and by default only global variables that are always used together. There shouldn't be a reason to disable it, but if you need to anyway, `-Cllvm-args=-enable-global-merge=0` seems like it would work. There is no guarantee that this will keep working in the future, though: it will cause an error if LLVM ever removes this option.
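As a rough mental model of what the merge produces, the sketch below treats the 12-byte (`0xc`) `.data..L_MergedGlobals` section from the readelf output as one `#[repr(C)]` struct whose fields sit at offsets 0, 4 and 8, matching the three `LOCAL` symbols. This is only an illustration under that assumption: `MergedGlobals` and `bump` are hypothetical names, and LLVM does not literally generate a Rust struct. The point is that after merging, every access uses one shared base address plus a constant offset, which is what makes the pass worthwhile for globals that are used together.

```rust
use std::mem::offset_of;

/// Hypothetical model of the 12-byte (0xc) `.data..L_MergedGlobals` section:
/// the three private statics laid out back to back in one object.
#[repr(C)]
struct MergedGlobals {
    value1: i32, // G_RUST_FUNCTION_SAME_VALUE1.0 at offset 0
    value2: i32, // G_RUST_FUNCTION_SAME_VALUE2.0 at offset 4
    value3: i32, // G_RUST_FUNCTION_SAME_VALUE3.0 at offset 8
}

/// The merged part of rust_function_base035's arithmetic: each access is
/// one shared base address (`m`) plus a constant field offset.
fn bump(m: &mut MergedGlobals) -> i32 {
    m.value1 += 1;
    m.value2 += 2;
    m.value3 += 3;
    m.value1 + m.value3
}

fn main() {
    // Same initial values as the original private statics.
    let mut merged = MergedGlobals { value1: 1, value2: 2, value3: 3 };
    println!(
        "size = {:#x}, offsets = {}, {}, {}",
        std::mem::size_of::<MergedGlobals>(),
        offset_of!(MergedGlobals, value1),
        offset_of!(MergedGlobals, value2),
        offset_of!(MergedGlobals, value3),
    );
    let total = bump(&mut merged);
    println!("bump -> {total}, value2 = {}", merged.value2);
}
```

Running it prints `size = 0xc, offsets = 0, 4, 8` and `bump -> 8, value2 = 4`. `offset_of!` requires Rust 1.77 or newer.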

The `.0` is not all that strange. If you declare a symbol `foo` in LLVM IR and then declare another symbol with the same name, it will be called `foo.0`, the next one `foo.1`, and so on.


Thank you for your reply, it's been a great help to me. I'd like to ask a follow-up question.
I looked at Rust's v0 symbol mangling scheme (2603-rust-symbol-name-mangling-v0 in The Rust RFC Book) and saw that Rust's mangled names contain the various namespace names and even a strange hash-like string, which is the disambiguator for the crate ("mycrate" in the RFC). Does that mean that Rust symbol names never collide?
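To see how the pieces fit together, here is a minimal sketch that splits the symbols from the readelf output according to my reading of the RFC 2603 grammar. It only handles the restricted shape `_RNvC[s<base62>_]<crate><item>[.suffix]` (a static directly inside a crate root) and ignores punycode identifiers, generic arguments and the instantiating crate; `StaticSymbol`, `parse_ident` and `parse_static_symbol` are hypothetical names, not a real demangler API. It shows that the crate disambiguator changed between the debug and release builds (`1epMNu3iPaj` vs. `af4kqbwsgqU`), and that the `.0` sits after the mangled path, where the RFC allows a vendor-specific suffix starting with `.` or `$`.

```rust
/// Parts of a v0 symbol of the restricted shape
/// `_RNvC[s<base62>_]<crate-ident><item-ident>[.suffix]`.
#[derive(Debug, PartialEq)]
struct StaticSymbol<'a> {
    disambiguator: Option<&'a str>,
    crate_name: &'a str,
    item: &'a str,
    suffix: Option<&'a str>,
}

/// Parses `<decimal-length>[_]<bytes>` and returns (identifier, remaining input).
/// Punycode (`u`-prefixed) identifiers are not handled in this sketch.
fn parse_ident(s: &str) -> Option<(&str, &str)> {
    let digits = s.bytes().take_while(u8::is_ascii_digit).count();
    let len: usize = s[..digits].parse().ok()?;
    let rest = &s[digits..];
    // A `_` separator follows the length when the identifier starts with `_` or a digit.
    let rest = rest.strip_prefix('_').unwrap_or(rest);
    let name = rest.get(..len)?;
    Some((name, &rest[len..]))
}

fn parse_static_symbol(sym: &str) -> Option<StaticSymbol<'_>> {
    // Everything after the first `.` or `$` is treated as a vendor-specific suffix.
    let (body, suffix) = match sym.find(&['.', '$'][..]) {
        Some(i) => (&sym[..i], Some(&sym[i + 1..])),
        None => (sym, None),
    };
    // `_R` prefix, `N` nested path, `v` value namespace, `C` crate root.
    let rest = body.strip_prefix("_RNvC")?;
    // Optional crate disambiguator: `s`, a base-62 number, then `_`.
    let (disambiguator, rest) = match rest.strip_prefix('s') {
        Some(r) => {
            let end = r.find('_')?;
            (Some(&r[..end]), &r[end + 1..])
        }
        None => (None, rest),
    };
    let (crate_name, rest) = parse_ident(rest)?;
    let (item, rest) = parse_ident(rest)?;
    // Anything left over is outside the shape this sketch supports.
    if !rest.is_empty() {
        return None;
    }
    Some(StaticSymbol { disambiguator, crate_name, item, suffix })
}

fn main() {
    let debug = "_RNvCs1epMNu3iPaj_7binfunc27G_RUST_FUNCTION_SAME_VALUE1";
    let release = "_RNvCsaf4kqbwsgqU_7binfunc27G_RUST_FUNCTION_SAME_VALUE1.0";
    for sym in [debug, release] {
        println!("{:?}", parse_static_symbol(sym));
    }
}
```

For real symbols, a complete demangler (for example `rustfilt` or the `rustc-demangle` crate) should be used instead of this hand-rolled subset.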

  1. Is it solved?
  2. I read in the manual that it has this definition: "By default clang merges globals with internal linkage into one: MergedGlobals."
