Slicing a custom string type backed by bitvectors

I'm trying to write a library that allows users to define custom alphabets of symbols and manipulate strings of them.

For example, having an enum with u8 discriminants:

enum Symbols {
    A = 0b01,
    B = 0b10,
    C = 0b11,
}

This alphabet has 3 symbols and symbols can be encoded with 2-bits, so it can be packed into bitstrings of length 2 * n. The user does this by implementing an Encoding trait:

trait Encoding {
    const WIDTH: usize; // this would be 2 for enums with <= 4 symbols
    // ...
}

And the custom string-type, MyString<E: Encoding> is parameterised for enums that implement this trait:

let s: MyString<Symbols> = MyString::from(vec![A,B,C,A,B,C]);
// which may equal something like 0b011011011011

For the bitstrings that back this type I'm using bitvec. I trust it to do a good job with the shifting+masking and it abstracts away the endianess stuff:

struct MyString<E: Encoding> {
    bv: BitVec,
    _p: PhantomData<E>, // zero-cost and necessary, right?
}

These strings could be really long and in this case the bitpacking will save some space, but there's also some succinct datastructures and clever bit-wise tricks to do on short slices of these strings.

Here's a slice type for the custom strings:

struct Slice<'a, E: Encoding> {
    bs: &'a BitSlice,
    _p: PhantomData<E>,
}

So MyString/&Slice here are analogous to String/&str. I think the lifetime for my custom slice type is unavoidable, right?

The problem is implementing Index<Range<usize>>. What I can do is:

impl<E: Encoding> Index<Range<usize>> for MyString<E> {
    type Output = BitSlice; // I'd like to return a Slice<'a, E>

    fn index(&self, range: Range<usize>) -> &Self::Output {
        let s = range.start * E::WIDTH as usize;
        let e = range.end * E::WIDTH as usize;
        &self.bv[s..e]
    }
}

And then implement impl<'a, E: Encoding> From<&'a BitSlice> for Slice<'a, E>, so I have to slice like:

let s: MyString<Symbols> = MyString::from(vec![A, C, C, C, A, B, C, A, A, A]);
let slice: &Slice<Symbols> = &s[3..6].into();

There are a couple reasons I don't like this. I'd rather:

let slice: &Slice<Symbols> = &s[3..6];

But then index would have to return a &Slice, which it can't.

I found a related question here: How to implement a custom slice for a Bitvector : rust though the responses are a little discouraging.

What's the proper way to do this? Is it the unsafe route?

If you want it to really act like a String/str, your slice type should be unsized like [T] and str and BitSlice are.

#[repr(transparent)]
pub struct Slice<E: Encoding> {
    _p: PhantomData<E>,
    bs: BitSlice,
}

For custom DSTs like this, you still need unsafe to create a reference to your type from a reference to your inner type.

impl<E: Encoding> Index<Range<usize>> for MyString<E> {
    type Output = Slice<E>;
    fn index(&self, range: Range<usize>) -> &Self::Output {
        let s = range.start * E::WIDTH as usize;
        let e = range.end * E::WIDTH as usize;
        let bs = &self.bv[s..e] as *const BitSlice as *const Slice<E>;
        // Safety: repr(transparent)
        unsafe { &*bs }
    }
}

Now that Slice<E> is unsized (does not imlement Sized), you might have to add ?Sized to various type parameters.

3 Likes

Thank you so much! Everything makes sense now! :heart:

Is the // Safety: repr(transparent) comment a convention for annotating unsafe code?

Yes, having a comment documenting what assumptions your unsafe code is making helps avoid confusion later