Hi, we're looking at a seemingly macOS-specific bug affecting matrixmultiply, which involves the following code:
// set up buffer for masked (redirected output of) kernel
const KERNEL_MAX_SIZE: usize = 8 * 8 * 4;
const KERNEL_MAX_ALIGN: usize = 32;
#[repr(align(32))]
struct MaskBuffer {
    buffer: [u8; KERNEL_MAX_SIZE],
}
// Use thread local if we can; this is faster even in the single threaded case because
// it is possible to skip zeroing out the array.
#[cfg(feature = "std")]
thread_local! {
    static MASK_BUF: UnsafeCell<MaskBuffer> =
        UnsafeCell::new(MaskBuffer { buffer: [0; KERNEL_MAX_SIZE] });
}
It would seem that on macOS, buffer doesn't always end up 32-byte aligned. Could this be a problem with TLS, or something else?
macOS + TLS has apparently been involved in other interesting issues before: https://github.com/rust-lang/rust/pull/51828/
If someone has more information, it would be valuable.
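For anyone who wants to check this on their own machine, here is a minimal sketch of how the alignment could be measured at runtime. It repeats the definitions above so it stands alone; the helpers mask_buf_misalignment and misalignments_on_threads are hypothetical names made up for this sketch and are not part of matrixmultiply.

```rust
use std::cell::UnsafeCell;

const KERNEL_MAX_SIZE: usize = 8 * 8 * 4;
const KERNEL_MAX_ALIGN: usize = 32;

// Same layout as in the snippet above; the field is never read in this sketch.
#[allow(dead_code)]
#[repr(align(32))]
struct MaskBuffer {
    buffer: [u8; KERNEL_MAX_SIZE],
}

thread_local! {
    static MASK_BUF: UnsafeCell<MaskBuffer> =
        UnsafeCell::new(MaskBuffer { buffer: [0; KERNEL_MAX_SIZE] });
}

/// Hypothetical helper: address of the current thread's buffer modulo the
/// requested alignment. A result of 0 means the buffer is correctly aligned.
fn mask_buf_misalignment() -> usize {
    MASK_BUF.with(|cell| (cell.get() as usize) % KERNEL_MAX_ALIGN)
}

/// Hypothetical helper: repeat the check on `n` freshly spawned threads,
/// since each thread gets its own copy of the thread-local.
fn misalignments_on_threads(n: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..n)
        .map(|_| std::thread::spawn(mask_buf_misalignment))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("thread panicked"))
        .collect()
}
```

Calling mask_buf_misalignment() on the main thread and misalignments_on_threads(8) from main, then printing the results, should show whether any value is nonzero. A nonzero value would mean the thread-local storage for that thread did not honor the align(32) request, which would point at TLS rather than at the kernel code.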