Clippy refuses to compile what builds and runs

Thanks to forum member raidwas I now have my effort at transposing a large 2D matrix in a cache friendly manner working in Rust as fast as it does in C. My trials and tribulations with this a few months back are in this thread: Clippy driving me to insanity, insisting on iterators - #42 by ZiCog

Unfortunately clippy now errors out and fails to compile in two places. Despite the fact there is no error or warning when I build this and it appears to run just fine.

The problem area is:

    let mut a = unsafe {
        let layout = std::alloc::Layout::new::<[[i32; MAX]; MAX]>();
        let ptr = std::alloc::alloc_zeroed(layout) as *mut [[i32; MAX]; MAX];
        Box::from_raw(ptr)
    };
    let mut b = unsafe {
        let layout = std::alloc::Layout::new::<[[i32; MAX]; MAX]>();
        let ptr = std::alloc::alloc_zeroed(layout) as *mut [[i32; MAX]; MAX];
        Box::from_raw(ptr)
    };

Which produces the clippy errors:

error: casting from `*mut u8` to a more-strictly-aligned pointer (`*mut [[i32; 8192]; 8192]`) (1 < 4 bytes)
   --> src/main.rs:164:19
    |
164 |         let ptr = std::alloc::alloc_zeroed(layout) as *mut [[i32; MAX]; MAX];
    |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[deny(clippy::cast_ptr_alignment)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#cast_ptr_alignment

error: casting from `*mut u8` to a more-strictly-aligned pointer (`*mut [[i32; 8192]; 8192]`) (1 < 4 bytes)
   --> src/main.rs:169:19
    |
169 |         let ptr = std::alloc::alloc_zeroed(layout) as *mut [[i32; MAX]; MAX];
    |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#cast_ptr_alignment

Misaligned pointers sounds rather catastrophic so I'm wondering how it is cargo build does not complain about it?

The complete code is here if anyone is curious. It's not very big.

What I'd also like to know is how to create those arrays without "unsafe".

1 Like

Box::new([[0; MAX]; MAX). This will probably blow the stack (thus your use of alloc directly). The copyless crate allows you to instead write Box::alloc().init([[0; MAX]; MAX]), which reorders the box allocation and the array creation, allowing it to optimize to direct on-the-heap creation (when optimizing, not in debug mode without optimizations, you'll probably need -O1 to let this work in debug mode).

The clippy error is just a conservative false positive. The pointer is aligned enough (because you allocated it with said alignment), but the type system does not guarantee it. This is when you allow the clippy lint to say "yes, the compiler isn't helping me here, but I verified that it's correct."

1 Like

To add to this question, I also tried using a crate (I forgot the Name) that splits the box creation into allocation and writing its content in order to get rid of the memcpy and creating the content on the stack.
Sadly this approach didn't work as I still got an stack overflow. If anyone knows why this didn't work (as far as I understood that was exactly the use case of the library).

A right, I tried using copyless, but it didn't work.
I used the compile option from ZiCog's got repository, which includes release and native cpu.
Any idea why?

It cannot optimize out the stack usage without optimizations. Use --release or change [profile.dev] to include enough optimizations that it compiles properly.

Make sure you're using --release and whatever other flags you use to build/run with clippy as well, or it will still default to the debug (no optimization) profile.

As I wrote, the compile options included release and native cpu so I really don't know what more to add as optimization parameters.
Edit:
The command line used is $ RUSTFLAGS="-C opt-level=3 -C debuginfo=0 -C target-cpu=native" cargo run --release

Cargo.toml (and env) does not contain any other flags.

If copyless is not eliding the copy with --release (remember: the flag must always be used every time you invoke cargo), that's a missed optimization and copyless would probably like a minimized example to track it.

Unfortunately, copyless cannot guarantee the optimization, just make it possible and easier to happen. If you've run into a scenario where it doesn't work, you'll unfortunately have to fall back to using Box::new_[uninit|zeroed] or alloc[_zerored] and initializing the place yourself.

A full runnable self contained example would be nice to demonstrate the missed optimization causing the stack overflow.

Yeah, it's just because alloc_zeroed returns *mut u8, and u8 has a smaller alignment than [[i32; MAX]; MAX]. Clippy doesn't understand that your Layout argument makes alloc_zeroed actually return a pointer with proper alignment.

I will look into that :slight_smile:

On a different note, I also found the crate dedefault-boxed which claims to be able to do it without relying on compiler optimizations, but I haven't tried using it.

Actually, despite the README instructions I have there I have been building this with just "cargo build --release" and "cargo run --release". That long winded command line with options was just a failed attempt at getting Rust to do this job at the speed of C.

I'm not sure I understand all the comments here yet. For example:

Why would clippy need "--release"? I only want clippy to check my source not build the thing. Does it?

I find it amazing that it is so hard to create these simple arrays without "unsafe" or some external library.

This still does not match C speeds unless using versions of the function that require "unsafe".

On my machine the speed of the function that was implemented just like the c version (the one with four nested loops) was a bit faster than the c version?

Most of the comments regarded my problem with the use of copyless in order to get rid of the unsafe part

On my PC I'm getting, for C compiled with GCC:

$ ./loop_blocking
MAX:        8192
BLOCK_SIZE: 64
do_it_0:    821ms
do_it_1:    161ms

And for the various Rust functions:

$ ./target/release/loop_blocking
Rust functions:
MAX:        8192,
BLOCK_SIZE: 64,
allocated
transpose_0:    810ms
transpose_1:    810ms
transpose_2:    161ms
transpose_3:    234ms
transpose_4:    169ms
transpose_5:    160ms
transpose_6:    164ms
C functions (via rustc):
MAX:        8192
BLOCK_SIZE: 64
do_it_0:    814ms
do_it_1:    165ms

Once again Rust beats C by a tiny margin!

But note:

That transpose_2 function is totally not idiomatic Rust and clippy complains about the for loops over the arrays there.

The transpose_5 function matches the performance but uses "unsafe"

There is no way to do this in a clippy friendly totally "safe" way. Unless someone knows better.

That array initialization problem still bugs me.

The issue there is that you are:

  1. Trying to build a huge array. An array whose creation on the stack would crash your program.
  2. Required to build it directly on the heap, by allocating memory manually. This obviously can go catastrophically wrong if done incorrectly, hence you need unsafe. Rust never promises you can build all programs without some use of unsafe. stdlib is full of it.
  3. Incidentally doing something not so common. Clippy isn't outfitted to deal with absolutely every scenario. If it were me, I'd slap on an ignore label on that block, since I can personally guarantee that it's okay.
1 Like

To be honest, I would argue that clippy is more like pointing out that your code might not be idiomatic, not that it isn't nor that their is a better way to write it.
I have absolutely zero problems understanding transpose_2, but understanding some of the other implementations using iterators is much more complicated (at least for me).

Second: I would guess that you could remove the unsafe parts of transpose_5 and transpose_6 as their bound checks should be removed by now from the compiler (as they are for transpose_2).
But guessing won't bring anything to the table so I'm going to test it :slight_smile:
Edit:
After removing the unsafe part from transpose_6 and replacing them with normal indexing operations the performance stays the same. Note that this also applies to transpose_5 as it is basically the same, just with an additional new type pattern.

  1. It's a big array. I would not call it huge in this modern world. Certainly I don't want it on the stack though.

  2. I can understand the need for "unsafe" here. On the other hand I don't. It's just a simple array, just give me one, initialized to zeros, without having to jump through tortuous hoops.

  3. Yep, an ignore label works fine for most clippy warnings. Actually the whole original thread was about that.

In this case clippy is not just issuing a warning about style, it's throwing a big red error and aborting. Which is rather disturbing.

This is all rather icky. I now have to silence clippy about an alignment issue. And use "unsafe". And I have no idea what "std::alloc::Layout::new" and "std::alloc::alloc_zeroed" do yet.

1 Like

Maybe the allocator api could be expanded with a function that has a generic type and doesn't take a layout? In many cases this would be sufficient, but would still require the use of unsafe to fill the contents of the allocation (but atleast would prevent silly pointer casting errors).

On second thought, I guess this would be a perfect use case for a small library.

Box::new_uninit and Box::new_zeroed are probably what you're looking for here.

3 Likes

std::alloc::Layout contains two fields (Quoting from the source):

pub struct Layout {
    // size of the requested block of memory, measured in bytes.
    size_: usize,

    // alignment of the requested block of memory, measured in bytes.
    // we ensure that this is always a power-of-two, because API's
    // like `posix_memalign` require it and it is a reasonable
    // constraint to impose on Layout constructors.
    //
    // (However, we do not analogously require `align >= sizeof(void*)`,
    //  even though that is *also* a requirement of `posix_memalign`.)
    align_: NonZeroUsize,
}

And new simply constructs a Layout from the given parameters.

std::alloc::alloc_zeroed does exactly as the name implies: "Allocate zero-initialized memory with the global allocator". I don't have an implementation to show you though since it's up to the current global allocator, and you can always assume that it'll be zeroed, since if the allocator breaks that contract, it'd be invoking UB (And therefore it would not be your fault, instead it would be the fault of the allocator).

I believe it's throwing an error your way since doing such an operation incorrectly is definitely UB. However, that's the point of unsafe; it's code which the compiler doesn't make guarantees on, so it's your job to make sure those guarantees are upheld. ...and they are.

As mentioned by @CAD97, those Box functions should do what you're looking for without having to see all the unsafe in your own code.

I get the idea of "unsafe".

However I don't see why I should have to be worrying my head about "unsafe" just to create a simple array.

For example, I can write:

    let mut z;
    ...
    z = 2;

Following that pattern I should be able to write:

    let mut z: Box::<[[i32; MAX]; MAX]>;

    for i in 0..MAX {
        for j in 0..MAX {
            z[i][j] = 0;
        }
    }

All should be well unless I try to use it before I initialize it. No unsafe required.

Thanks, those Box functions work fine:

    let a = Box::<[[i32; MAX]; MAX]>::new_zeroed();
    let mut a = unsafe { a.assume_init() };

    let b = Box::<[[i32; MAX]; MAX]>::new_zeroed();
    let mut b = unsafe { b.assume_init() };

Shame about the "unsafe" but much shorter and sweeter.