More slice permutations than swap

For some application I need a permutation of three positions of a slice. Currently, I do this as follows:

fn permute3a<E>(lst: &mut [E], a: usize, b: usize, c: usize) {
    lst.swap(b, c);
    lst.swap(a, b);
}

This moves the element at a to b, b to c and c to a.

But I guess that doing two swaps is not the optimal solution for this problem. I would think that implementation would look like:

fn permute3b<E: Copy>(lst: &mut [E], a: usize, b: usize, c: usize) {
    let tmp = lst[c];
    lst[c] = lst[b];
    lst[b] = lst[a];
    lst[a] = tmp;
}

This implementation works for types that are Copy. It would be trivial to extend it to types that are Clone. But, cloning may be expensive, so this implementation may be worse than the one with the two swaps. If I could make it such that it moves values instead of copying, it would work for all types and would be more efficient.

Ideally, the slice type would have this function, and I could define

fn permute3c<E>(lst: &mut [E], a: usize, b: usize, c: usize) {
    lst.permute3(a,b,c);  // calls non existing function
}

Requesting this function seems unreasonable, because then next week someone may request permute4(a,b,c,d), and the week after another one permute5, and so on.

So, is there something more general that I could use to implement all those permutation functions? That would include the existing swap function, which is permute2(a,b).

I thought of a type named ValueMover. It is a mutator which takes a mutable reference to a list. The list is in an inconsistent state during the lifetime of the mutator, but the drop function will make it consistent again. Here is my implementation of ValueMover and of permute3 using it.

struct ValueMover<'a, E> {
    lst: &'a mut [E],
    pos: usize,
    tmp: MaybeUninit<E>,
}

impl<'a, E> ValueMover<'a, E> {
    pub fn take(lst: &mut [E], pos: usize) -> ValueMover<'_, E> {
        assert!(pos != usize::MAX);
        let mut tmp: MaybeUninit<E> = MaybeUninit::uninit();
        unsafe {
            tmp.write(mem::transmute_copy(&mut lst[pos]));
        }
        ValueMover { lst, pos, tmp }
    }
    pub fn move_from(&mut self, new_pos: usize) {
        if !self.is_occupied() {
            panic!("ValueMover::move_from called after release");
        }
        if new_pos == usize::MAX {
            self.panic("ValueMover::move_from may not be called with usize::MAX");
        }
        if new_pos >= self.lst.len() {
            self.panic("ValueMover::move_from called with out of bounds value");
        }
        if self.pos != new_pos {
            unsafe {
                self.lst[self.pos] = mem::transmute_copy(&mut self.lst[new_pos]);
            }
            self.pos = new_pos;
        }
    }
    fn release(&mut self) {
        assert!(self.is_occupied());
        unsafe {
            self.lst[self.pos] = mem::transmute_copy(&mut *self.tmp.as_mut_ptr());
        }
        self.pos = usize::MAX;
    }
    fn panic(&mut self, msg: &str) {
        self.release();
        panic!("{}", msg);
    }
    fn is_occupied(&self) -> bool {
        self.pos != usize::MAX
    }
}

impl<'a, E> Drop for ValueMover<'a, E> {
    fn drop(&mut self) {
        if self.is_occupied() {
            self.release();
        }
    }
}

fn permute3d<E>(lst: &mut [E], a: usize, b: usize, c: usize) {
    let mut mover = ValueMover::take(lst, c);
    mover.move_from(b);
    mover.move_from(a);
}

As you can see, my implementation uses transmute_copy, which is advertised as about the most dangerous function in the standard library. So my main question is: “Is this implementation safe?”. This is my first time I use unsafe, so I am quite uncertain about it.

If it is not safe, I would like to know why and if it can be fixed.

And finally, I would like to know your opinion if you think that such a mutator for slice would be a good addition to the standard library. I mean, that there would be a function take such that I could write:

fn permute3e<E>(lst: &mut [E], a: usize, b: usize, c: usize) {
    let mut mover = lst.take(lst, c);
    mover.move_from(b);
    mover.move_from(a);
}

FYI, you can often improve the performance of code like this and the other cases by adding a bounds check before all the operations:

    assert!(a < lst.len() && b < lst.len() && c < lst.len());
    lst.swap(b, c);
    lst.swap(a, b);

This way, instead of bounds checking before each of the 2 swaps, the condition is checked before any swap happens, so there are more opportunities for the compiler to rearrange instructions within the swap. It also eliminates the distinction between 3 different bounds check failures, reducing the machine code size.

It’s always worth checking whether the simple code will actually be optimized into the better form. But in this case, looking at the assembly output, you seem to be right.

The first thing you should do after writing some unsafe code is test it using Miri. I added a test function,

fn main() {
    let mut data = vec![String::from("a"), String::from("b"), String::from("c")];
    permute3d(&mut data, 0, 1, 2);
    dbg!(data);
}

and ran it using Miri on the Rust Playground, and it immediately found UB:

error: Undefined Behavior: pointer not dereferenceable: alloc492 has been freed, so this pointer is dangling
    --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1637:18
     |
1637 |         unsafe { slice::from_raw_parts(self.as_ptr(), self.len) }
     |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc492 was allocated here:
    --> src/main.rs:66:63
     |
  66 |     let mut data = vec![String::from("a"), String::from("b"), String::from("c")];
     |                                                               ^^^^^^^^^^^^^^^^^
help: alloc492 was deallocated here:
    --> src/main.rs:30:17
     |
  30 |                 self.lst[self.pos] = mem::transmute_copy(&mut self.lst[new_pos]);
     |                 ^^^^^^^^^^^^^^^^^^

The problem is that assignment is not a pure move/copy; it drops the value being overwritten, which in this case is your unsafely duplicated String value, resulting in a double free.

However, there’s a simple fix: make your code work on [MaybeUninit<T>], which is logically what it is doing anyway.

struct ValueMover<'a, E> {
    lst: &'a mut [MaybeUninit<E>],
    ...

    pub fn take(lst: &'a mut [E], pos: usize) -> Self {
        assert!(pos != usize::MAX);
        
        // SAFETY: reinterpreting slice as the same type with a transparent wrapper
        let lst: &'a mut [MaybeUninit<E>] = unsafe {
            &mut *(lst as *mut [E] as *mut [MaybeUninit<E>])
        };
        ...

(I also wrote the signature of take in a better way that doesn’t introduce an extra anonymous lifetime, so I could correctly use the named lifetime in the let.)

Just making these changes, by themselves, fixes the reported UB. (That doesn’t mean the code is sound, but at least it succeeds in this case.) But also, now that self.lst has its element type the same as tmp, you can get rid of the transmute_copy operations and write them like this instead:

self.lst[self.pos].write(self.lst[new_pos].assume_init_read());

This is still unsafe code, but its safety considerations are now explainable in terms of the work you actually intend to perform (moving values from a known-initialized place to a known-uninitialized place), without involving any type changing.

1 Like

FYI the stdlib internally has an utility like this: rust/library/alloc/src/collections/binary_heap/mod.rs at 58b4453fbd6f3e88046c3420b18b93c1f093bd12 · rust-lang/rust · GitHub

However a big issue this approach has is that it is unsound to leak this type. If this happens your original slice will end up having a duplicated element where the element that was first moved out should be, and this can of course break safety (e.g. if the duplicated element holds an allocation).

Here's a closure-based version of this API that I believe should be sound Rust Playground

1 Like

Thank you kpreid for finding the flaws in my code and for the fixes and SkiFire13 for pointing to the similar Hole code and for sharing the closure based version.

I don’t think I would have come up with code like

I don’t think I quite understand the remark about the unsoundness of leaking this type. I guess I must try a bit harder (and read more about this subject).

Given the fact that there is already an internal implementation of Hole in the standard library and I reinvented the wheel, I come back to my question: “Wouldn’t it be a good idea to have a function like with_hole in the public interface of slices?” That would have allowed a mere mortal like me to implement permute3 (but also binary_heap) with confidence, without having to use unsafe myself.

I would be interested in knowing if there is a big performance hit in using this abstraction in implementing swap, with respect to the standard implementation

fn my_swap(lst: &mut [E], a: usize, b: usize) {
    hole::with_hole(lst, b, |mut hole| {hole.move_to(a);});
}

With an addition to the standard library, this could look like:

fn my_swap(lst: &mut [E], a: usize, b: usize) {    
    lst.with_hole(b, |mut hole| {hole.move_to(a);});
}

I don't recommend writing code like this anyway. Unfortunately, there aren’t a lot of alternatives for manipulating slice pointers. In particular, I would use pointer::cast, but it does not allow dynamically sized pointee types.

The only way to truly know the answer to this kind of question is to write both versions and benchmark them. You cannot predict all of the behavior of the compiler.

I did not do real benchmarking, but I had a quick look at the assembly output and compared the results when I did a lst.swap(a,b) or a my_swap(lst,a,b). The output has exactly the same length, and glancing at the instructions I had the impression that there was no real difference between them (between the 17 lines that differed). For instance, the first differing lines are movq 64(%rbx), %rax and movq 16(%rbx), %rax respectively. But it is over 30 ago that I did something with assembly, so I don’t feel qualified to judge it.

I used cargo rustc --release -- --emit asm to generate the assembly code.

Hmm, curious. Here's what I tried:

fn permute3m<E>(lst: &mut [E], a: usize, b: usize, c: usize) {
    let Ok([a, b, c]) = lst.get_disjoint_mut([a, b, c]) else { panic!() };
    std::mem::swap(c, b);
    std::mem::swap(b, a);
}

After the bounds checking, that leaves https://rust.godbolt.org/z/M1K1efrdq

_ZN7example9permute3m17h854967b4a0e77d49E.exit:
  %_0.i8.i.i.i = getelementptr inbounds nuw i32, ptr %x.0, i64 %i
  %_0.i8.1.i.i.i = getelementptr inbounds nuw i32, ptr %x.0, i64 %j
  %_0.i8.2.i.i.i = getelementptr inbounds nuw i32, ptr %x.0, i64 %k
  %0 = load i32, ptr %_0.i8.2.i.i.i, align 4
  %1 = load i32, ptr %_0.i8.1.i.i.i, align 4
  store i32 %1, ptr %_0.i8.2.i.i.i, align 4
  store i32 %0, ptr %_0.i8.1.i.i.i, align 4
  %2 = load i32, ptr %_0.i8.i.i.i, align 4
  store i32 %2, ptr %_0.i8.1.i.i.i, align 4
  store i32 %0, ptr %_0.i8.i.i.i, align 4
  ret void

Which is very weird, because that first store to ptr %_0.i8.1.i.i.i is obviously dead, so I would have expected that LLVM would have removed it, but it didn't.

(It's also there in the unsafe version, https://rust.godbolt.org/z/jrh6McKq7.)

This is probably because of a missing noalias attributes (i'm not sure if they even work for return values (edit: it seems that noalias on a return value means something entirely different) ). You could try moving the swap operations into another function that takes 3 independent parameters.

Currently it seems that Llvm has to handle the case that the pointers alias.

1 Like

Good call! https://rust.godbolt.org/z/K6xbqbrT1

_ZN7example9permute3m17h854967b4a0e77d49E.exit:
  %_0.i8.i.i.i = getelementptr inbounds nuw i32, ptr %x.0, i64 %i
  %_0.i8.1.i.i.i = getelementptr inbounds nuw i32, ptr %x.0, i64 %j
  %_0.i8.2.i.i.i = getelementptr inbounds nuw i32, ptr %x.0, i64 %k
  tail call void @llvm.experimental.noalias.scope.decl(metadata !310)
  tail call void @llvm.experimental.noalias.scope.decl(metadata !314)
  tail call void @llvm.experimental.noalias.scope.decl(metadata !316)
  %0 = load i32, ptr %_0.i8.2.i.i.i, align 4
  %1 = load i32, ptr %_0.i8.1.i.i.i, align 4
  store i32 %1, ptr %_0.i8.2.i.i.i, align 4
  %2 = load i32, ptr %_0.i8.i.i.i, align 4
  store i32 %2, ptr %_0.i8.1.i.i.i, align 4
  store i32 %0, ptr %_0.i8.i.i.i, align 4
  ret void
1 Like

So, if I understand correctly, the solution with using two swaps will lead to optimal code, but only if you add a feature that gives the warning using it is strongly discouraged.

That is not something I would use in code, especially not if it is code that is in a library that I expect others to use.

Could you check what comes out the code of SkiFire13? I might consider using that code. The important thing for me is that the special code in mod hole (using unsafe and attributes that interact with the compiler) is separate from the user code that implements the permute3 function.

Absolutely agreed that rustc_no_mir_inline is not something anyone should put in production code (outside perhaps the standard library).

I was exploring it more with my compiler hat on to attempt to find where the gap was that the obvious code didn't just handle it, since ideally my preference is to change rustc and/or LLVM to have it just work without needing the unsafe dances.

It works nicely if you put an assert at the start, similarly to what get_disjoint_mut is doing: Compiler Explorer If instead you need to support a, b and c potentially being aliasing then it will become quite ugly Compiler Explorer (consider that get_disjoint_mut will panic in that case)

1 Like