Why does the compiler not optimize away the indirection for something that can be generated in place?

Apologies if the title is confusing.

What I'm trying to do is something like:

pub struct BitField(pub usize);

impl std::ops::Index<usize> for BitField {
  type Output = bool;

  fn index(&self, index: usize) -> &Self::Output {
    self.0 & (1 << index) > 0
  }
}

I understand why this doesn't compile: the output requires a reference and I would have to create some reference that can actually live past the end of the function body.

So I turned it into this:

pub struct BitField(pub usize);

impl std::ops::Index<usize> for BitField {
  type Output = bool;

  fn index(&self, index: usize) -> &Self::Output {
    if self.0 & (1 << index) > 0 {
      &true
    } else {
      &false
    }
  }
}

Building it in play.rust-lang.org with release optimizations turned on gives the following assembly:

<playground::BitField as core::ops::index::Index<usize>>::index:
	movq	(%rdi), %rax
	btq     %rsi, %rax
	leaq	.L__unnamed_1(%rip), %rcx
	leaq	.L__unnamed_2(%rip), %rax
	cmovaeq	%rcx, %rax
	retq

.L__unnamed_2:
	.byte	1

.L__unnamed_1:
	.zero	1

If I'm understanding this correctly, the compiler isn't generating the 0 or 1 that represents the boolean directly but instead reading it from some other memory location. Is this correct?

If I am correct, what confuses me is why the indirection is necessary in this case.
Since the return type is &bool, which is immutable and can't self-reference, it should be valid for the compiler to eliminate the indirection entirely and simply generate the value in place rather than copy it from somewhere else.

In short, I was expecting the assembly to be something that is close to a direct translation of this, without the indirection:

self.0 & (1 << index) > 0

Is there any reason why the compiler isn't allowed to do this or is it merely insufficient optimization?

Because Index::index returns a reference, the generated function has to return a reference (a memory address) to the caller. The caller could use that address in some fashion, and the program must remain correct if it does (e.g. the address of the boolean for two separate calls should be the same).
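To illustrate why the address itself matters, here is a minimal sketch of a caller that keeps the pointer returned by index and only later reads through it. The helper name read_via_address is hypothetical and not part of the original post; the BitField impl is the one from the question.

```rust
use std::ops::Index;

pub struct BitField(pub usize);

impl Index<usize> for BitField {
    type Output = bool;

    fn index(&self, index: usize) -> &Self::Output {
        if self.0 & (1 << index) > 0 {
            &true
        } else {
            &false
        }
    }
}

/// Hypothetical helper: holds on to the address returned by `index`
/// and reads the value through it afterwards.
pub fn read_via_address(bits: &BitField, index: usize) -> bool {
    // `bits[index]` desugars to `*bits.index(index)`, so `&bits[index]`
    // is exactly the reference the function returned.
    let address: *const bool = &bits[index];
    // SAFETY: `address` was just derived from a valid `&bool`, and the
    // constant it points at outlives this function.
    unsafe { *address }
}
```

A standalone, non-inlined index has to hand back a real address so that code like this keeps working.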

If you do the indexing from a scope with inlining visibility[1], you will see that it does in fact generate the bool directly rather than a reference to it.


  1. Effectively, that is code within the same crate, or in any crate if the function is marked #[inline].
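A minimal sketch of that setup, assuming the BitField from the question: index is marked #[inline], and a caller (the name is_set is hypothetical) dereferences the result immediately. Once index is inlined into such a caller, the optimizer is typically free to compute the bool from the bit test without going through the constants in memory.

```rust
use std::ops::Index;

pub struct BitField(pub usize);

impl Index<usize> for BitField {
    type Output = bool;

    // `#[inline]` makes the body available for inlining in other crates too.
    #[inline]
    fn index(&self, index: usize) -> &Self::Output {
        if self.0 & (1 << index) > 0 {
            &true
        } else {
            &false
        }
    }
}

/// Hypothetical caller: the reference never escapes, it is read right away.
pub fn is_set(bits: &BitField, index: usize) -> bool {
    bits[index]
}
```

Comparing the assembly of is_set with that of the standalone index in the playground should show the difference.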

The main reason it generates code with references is that the compiler lacks knowledge of how the code is actually called. If the compiler knows how it is called, it can use values instead (this usually happens in the arg-promotion pass). You can make the compiler's job easier by adding an inline hint, which makes the function available for inlining across translation units.

The compiler does eliminate the indirection where possible.

But the signature of Index::index requires the indirection (a reference), so the compiler can't eliminate it in the function itself.
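For contrast, a sketch of a signature that does not require the indirection. The method name get is an assumption for illustration and does not appear in the thread; because it returns bool by value, there is no address for the function to produce.

```rust
pub struct BitField(pub usize);

impl BitField {
    /// Hypothetical by-value accessor: returns the bit itself, not a reference.
    pub fn get(&self, index: usize) -> bool {
        self.0 & (1 << index) > 0
    }
}
```

This is the direct translation the question was hoping for, at the cost of not supporting the bits[index] syntax.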

Thanks for the answers.

I didn't realize that hinting to the compiler to inline things would eliminate the indirection even across crate boundaries.
