Question on `librustc_symbol_mangling`

Hello :crab: ,

I've started reading the Rust compiler's source files, trying to better understand the internals of Rust.

I have a quick question about a doc comment block in `src/librustc_symbol_mangling/lib.rs`.

Here is the doc comment block.

//! The main tool for avoiding naming conflicts is the incorporation of a 64-bit
//! hash value into every exported symbol name. Anything that makes a difference
//! to the symbol being named, but does not show up in the regular path needs to
//! be fed into this hash:
//!
//! - Different monomorphizations of the same item have the same path but differ
//!   in their concrete type parameters, so these parameters are part of the
//!   data being digested for the symbol hash.
//!
//! - Rust allows items to be defined in anonymous scopes, such as in
//!   `fn foo() { { fn bar() {} } { fn bar() {} } }`. Both `bar` functions have
//!   the path `foo::bar`, since the anonymous scopes do not contribute to the
//!   path of an item. The compiler already handles this case via so-called
//!   disambiguating `DefPaths` which use indices to distinguish items with the
//!   same name. The DefPaths of the functions above are thus `foo[0]::bar[0]`
//!   and `foo[0]::bar[1]`. In order to incorporate this disambiguation
//!   information into the symbol name too, these indices are fed into the
//!   symbol hash, so that the above two symbols would end up with different
//!   hash values.

My question:
The comment says that rustc disambiguates DefPaths in anonymous scopes using indices.
Since these indices already disambiguate the two `fn bar()` items from each other, is it necessary to also feed them into the symbol hash just to generate unique names? Or is there some other benefit to feeding the already-unique indices into the symbol hash?

Thank you very much for reading! :sun_with_face: :man_superhero:

During linking, all symbol names must be unique. The linker doesn't know anything about scopes; it just gets a flat list of symbols.
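To make the "flat list" point concrete, here is a toy model of it. This is only a sketch: `SymbolTable` and `define` are made-up names for illustration, not any real linker's API, and a real linker does far more than this.

```rust
use std::collections::HashSet;

/// Toy stand-in for a linker's flat symbol table (hypothetical, for illustration).
struct SymbolTable {
    defined: HashSet<String>,
}

impl SymbolTable {
    fn new() -> Self {
        SymbolTable { defined: HashSet::new() }
    }

    /// Records a symbol definition; fails if the exact same name already exists.
    /// There is no notion of scope here, only whole names.
    fn define(&mut self, name: &str) -> Result<(), String> {
        if self.defined.insert(name.to_string()) {
            Ok(())
        } else {
            Err(format!("duplicate symbol: {}", name))
        }
    }
}
```

In this model, defining `foo::bar` twice is rejected, so whatever names the compiler emits have to differ somewhere in the string itself.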

I assume that is why the compiler performs name mangling before linking, right?

Referring to the example explained in the doc comment block,

fn foo() { {fn bar() {} } { fn bar() {} } }

Since the compiler disambiguates DefPaths using indices,
the two fn bar()s are differentiated by the compiler as
foo[0]::bar[0] & foo[0]::bar[1].
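For reference, here is a small variant of that example that can actually be compiled and called. The return values are my own addition so that the two `bar` functions can be told apart at runtime; the original in the doc comment has empty bodies.

```rust
// Two different `bar` functions, each declared in its own anonymous block.
// Neither block contributes a name to the item path.
fn foo() -> (u32, u32) {
    let first = {
        fn bar() -> u32 { 1 }
        bar()
    };
    let second = {
        fn bar() -> u32 { 2 }
        bar()
    };
    (first, second)
}
```

Each call to `bar()` resolves to the function declared in its own block, so both definitions have to survive into the compiled output as separate symbols.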

My question is whether, instead of feeding these indices into the symbol hash function, these representations could be used directly as the symbols passed to the linker
(without an extra call to the hash function)?

As I understand it, the key reason is that if the indices were included directly, you'd have to design your mangling scheme with some way to spell "index 0", "index 1", etc., and that spelling would have to not conflict with any other symbol. Having a catch-all hash means the scheme can be that much less complicated, and tooling consuming the hashes doesn't have to handle every edge case.

Note that they mention the indices, but that's not the only thing that ends up in the hash. There are other random bits of information, like the crate version and source, which also get globbed in. I think the indices are more just an example of what kind of thing ends up in the hash.
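Here is a rough sketch of that "catch-all hash" idea. To be clear, this is not rustc's actual algorithm or output format: `Segment`, `toy_symbol`, and the crate name/version parameters are hypothetical, and I'm using the standard library's `DefaultHasher` purely as a stand-in for whatever hash the compiler really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One path segment plus its disambiguating index, e.g. `bar[1]` (hypothetical type).
#[derive(Hash)]
struct Segment {
    name: &'static str,
    disambiguator: u32,
}

/// Toy mangler: the readable path, followed by a 64-bit hash of everything
/// that distinguishes the item but does not appear in the readable path.
fn toy_symbol(crate_name: &str, crate_version: &str, path: &[Segment]) -> String {
    let mut hasher = DefaultHasher::new();
    // Crate-level information goes into the hash, not into the readable part.
    crate_name.hash(&mut hasher);
    crate_version.hash(&mut hasher);
    // Hashing the segments also hashes their disambiguator indices.
    path.hash(&mut hasher);

    let readable: Vec<&str> = path.iter().map(|segment| segment.name).collect();
    format!("{}::h{:016x}", readable.join("::"), hasher.finish())
}
```

With this, `foo[0]::bar[0]` and `foo[0]::bar[1]` both come out as `foo::bar::h<16 hex digits>`, but with different hex suffixes, and changing the crate version changes the suffix too. The readable part of the name never needs its own syntax for indices, versions, or anything else that gets added to the hash later.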

For background on designing mangling schemes, I recommend reading the current symbol mangling RFC and its corresponding RFC PR, which has a lot of discussion of the pros and cons of different symbol mangling schemes.

Thank you for the pointers! I'll take a look at them right away :crab:
