Fastest way to zero an array, followup

It's been a year now since I first asked the question below. Has anything been done about introducing a fill or clear method in Rust that can do fast bulk zeroing (setting elements to '0') of arrays/slices?

Sure looks like a fill method exists now.

Edit: It seems to be slightly suboptimal for your use case in current stable Rust, but that’s already fixed on nightly.

The codegen for .fill(0) is very different to .iter_mut().for_each(|m| *m = 0). Here is the .iter_mut().for_each(|m| *m = 0) assembly:

        test    rsi, rsi
        je      .LBB0_1
        mov     rdx, rsi
        xor     esi, esi
        jmp     qword ptr [rip + memset@GOTPCREL]
.LBB0_1:
        ret

And here is the fill(0) assembly:

        test    rsi, rsi
        je      .LBB0_4
        push    r14
        push    rbx
        push    rax
        mov     rbx, rsi
        mov     r14, rdi
        dec     rbx
        je      .LBB0_3
        mov     rdi, r14
        xor     esi, esi
        mov     rdx, rbx
        call    qword ptr [rip + memset@GOTPCREL]
.LBB0_3:
        mov     byte ptr [r14 + rbx], 0
        add     rsp, 8
        pop     rbx
        pop     r14
.LBB0_4:
        ret

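Both are optimized down to a memset. The fill(0) version, however, memsets only the first len - 1 bytes and then stores the last byte separately, so it needs an extra branch for when the slice length is 1 and a real call instead of a tail call, whereas the iterator version just jumps to memset directly.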
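For reference, here is a minimal sketch of the two variants being compared, written as standalone functions over a u8 slice. The function names are made up for illustration, and this is an assumption about what the listings were generated from, not the exact source that was used.

```rust
/// Zero a byte slice with the slice `fill` method (stable since Rust 1.50).
/// Hypothetical helper name, for illustration only.
pub fn zero_with_fill(buf: &mut [u8]) {
    buf.fill(0);
}

/// Zero a byte slice with an explicit iterator loop.
/// Hypothetical helper name, for illustration only.
pub fn zero_with_iter(buf: &mut [u8]) {
    buf.iter_mut().for_each(|m| *m = 0);
}
```

Both leave the slice in the same state; only the generated code differs, and that can change between compiler versions.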

However, on Nightly the situation is different. fill(0) compiles to:

        xor     edx, edx
        jmp     qword ptr [rip + _ZN76_$LT$$u5b$u8$u5d$$u20$as$u20$core..slice..specialize..SpecFill$LT$u8$GT$$GT$9spec_fill17h84e9f892edee5b96E@GOTPCREL]

That will invoke this specialized implementation, which I'm guessing will just become a memset. It's a shame that isn't inlined though - there's one extra function call :(.

Someone should probably make a PR that adds #[inline] to those specialized implementations.

Well, let’s see if someone picks this up

Thanks for the quick response to my question.

I'm using Rust 1.50, and tried out fill in my code.

I did a quick test against the original code with two inputs that each took ~13 mins to run, to see the cumulative effects between versions. In one case using fill was about 6 secs slower, while in the other there was no appreciable difference. I'll do more testing to assess any performance differences.
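If anyone wants to isolate just the zeroing cost rather than timing a whole 13 minute run, a rough sketch along these lines might help. The function name and parameters are hypothetical, and a wall-clock loop like this is only a crude measurement, not a proper benchmark.

```rust
use std::time::{Duration, Instant};

/// Time `rounds` repetitions of `zero` on a buffer of `len` bytes.
/// Hypothetical helper: only the call to `zero` is inside the timed region.
pub fn time_zeroing(len: usize, rounds: u32, zero: fn(&mut [u8])) -> Duration {
    let mut buf = vec![1u8; len];
    let mut total = Duration::new(0, 0);
    for _ in 0..rounds {
        // Dirty the buffer first so every round has real work to do.
        buf.fill(1);
        let start = Instant::now();
        zero(&mut buf);
        total += start.elapsed();
        // Reading the result back checks correctness and keeps the
        // zeroing from being optimized away as dead stores.
        assert!(buf.iter().all(|&b| b == 0));
    }
    total
}
```

You would call it once per variant, e.g. time_zeroing(1 << 20, 1000, |b| b.fill(0)) and the same with |b| b.iter_mut().for_each(|m| *m = 0), and compare the totals on a release build.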

Since I'm using rustup, how do I use it to install a nightly if the PR that inlines fill eventually gets done?

You can add the nightly toolchain with rustup install nightly, and then use it manually by adding the +nightly argument to cargo (or other Rust-related commands). E.g. cargo +nightly build or cargo +nightly run will use the nightly toolchain.

You can also set your default toolchain to nightly for a specific project by using rustup override. Navigate (cd) into the top directory of the project and call rustup override set nightly. To go back to the default (usually stable), run rustup override unset in the same directory. This doesn’t modify any files in the project itself; instead, rustup globally records that the project's path is associated with the nightly toolchain. That record is not updated automatically when you move the project, nor removed when you delete it. You can list all overrides with rustup override list, which also tells you which command removes the overrides for deleted directories.

All of this and more can be explored further with the included help commands. E.g. rustup help gives you general help, and things like rustup toolchain help or rustup override help give more specific help pages. rustup install seems to be a (not too well-documented) shortcut for rustup toolchain install.

Thanks again. :slightly_smiling_face:
