Ptr::byte_offset alternative that doesn't generate a load effective address (lea) instruction

Hello,

I'm working on some code that is incredibly sensitive to data fetch pipelining. For code abstraction reasons I need to be able to calculate an address without immediately fetching it.

I finally got to the bottom of my issue when I realized that ptr::byte_offset is generating a lea instruction. This is true of all the safe destructuring methods as well.

I read @quinedot's answer in "Why core::ptr::offset is unsafe?" and I thought maybe the alias checking was the problem. But ptr::wrapping_add() generates the lea too! (Side note: how is that safe?)

Anyway, is there a way to perform ptr arithmetic (without just casting to a usize and back) and avoid generating a lea? Or is there a dimension to this problem that I am failing to consider?

Thanks for any thoughts.

The x86 lea instruction does not read memory. It runs the address calculation and produces the address that would have been accessed if an instruction that accessed memory had been used.
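To make the split between "compute the address" and "fetch from it" concrete, here is a minimal sketch. The helper name `address_of_byte` is made up for illustration; the only real API used is `wrapping_add` on raw pointers. The arithmetic is safe because nothing is read, and the load is a separate, explicitly unsafe step.

```rust
/// Hypothetical helper: computes the address `offset` bytes past `base`.
/// No memory is accessed here, which is why this can be a safe function.
fn address_of_byte(base: *const u8, offset: usize) -> *const u8 {
    base.wrapping_add(offset)
}

fn main() {
    let data: [u8; 4] = [10, 20, 30, 40];
    let base = data.as_ptr();

    // Address calculation only; nothing is loaded yet.
    let third = address_of_byte(base, 2);
    assert_eq!(third as usize, base as usize + 2);

    // Computing a far out-of-bounds address is also fine,
    // as long as it is never dereferenced.
    let far = address_of_byte(base, 1 << 20);
    assert_eq!((far as usize).wrapping_sub(base as usize), 1 << 20);

    // The actual memory access is a separate step.
    // SAFETY: `third` points at data[2], which is in bounds and initialized.
    let value = unsafe { third.read() };
    assert_eq!(value, 30);
}
```

So the fetch only happens at the `read()`, wherever that ends up in your code.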

Thanks for explaining that. That answers the "How is this safe?" part of my question.

Annoyingly it is causing a pipeline stall / cache miss / something?? that is literally doubling the time taken to execute my program. (Not the time for this one function... The overall runtime!)

Maybe the lea is a red herring. I'll continue to investigate.

LEA is almost certainly a red herring. Compilers use it instead of ADD where possible because it doesn't set flags and modern chips have more address calculation units than full ALUs, so it's almost always a pure win to use LEA instead of a normal ADD.

If you really think you have bad slot usage in the assembly you're getting, you can try throwing it through llvm-mca (the LLVM Machine Code Analyzer, described in the LLVM documentation) to see whether it agrees.
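For what it's worth, here is a rough sketch of how one might isolate a small function for that kind of inspection. The function name `element_addr` and the file names in the comments are placeholders, not anything from the original code. I'd expect the body to become a single lea on x86-64 with optimizations on, but that is an assumption to check against the real output.

```rust
use std::hint::black_box;

// Hypothetical function to inspect. `#[inline(never)]` keeps it as a
// separate symbol so it is easy to find in the assembly listing.
//
// One possible workflow (file names are placeholders):
//   rustc -O --emit=asm example.rs   # writes example.s
//   llvm-mca example.s               # may need the listing trimmed to the
//                                    # function of interest first
#[inline(never)]
pub fn element_addr(base: *const u64, index: usize) -> *const u64 {
    // Pure address arithmetic: base + index * size_of::<u64>().
    base.wrapping_add(index)
}

fn main() {
    let table = [1u64, 2, 3, 4];
    // `black_box` keeps the optimizer from constant-folding the inputs.
    let p = element_addr(black_box(table.as_ptr()), black_box(2));
    // SAFETY: index 2 is in bounds of `table`.
    assert_eq!(unsafe { *p }, 3);
}
```

If the report shows no obvious bottleneck around the address calculation, that is one more hint that the slowdown is coming from somewhere else, such as the memory access pattern.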

Thank you for that link. This tool is awesome.
