I'm trying to write a library that allows users to define custom alphabets of symbols and manipulate strings of them. For example, an enum with `u8` discriminants:
```rust
#[repr(u8)]
enum Symbols {
    A = 0b01,
    B = 0b10,
    C = 0b11,
}
```
This alphabet has 3 symbols, so each symbol can be encoded in 2 bits, and a string of n symbols can be packed into a bitstring of length 2 * n. The user opts in by implementing an `Encoding` trait:
```rust
trait Encoding {
    const WIDTH: usize; // this would be 2 for enums with <= 4 symbols
    // ...
}
```
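For the example alphabet, an implementation might look like this (a hypothetical sketch; the trait and enum are repeated so the snippet is self-contained):

```rust
trait Encoding {
    const WIDTH: usize;
}

#[allow(dead_code)]
#[repr(u8)]
enum Symbols {
    A = 0b01,
    B = 0b10,
    C = 0b11,
}

impl Encoding for Symbols {
    // The three codes 0b01, 0b10, 0b11 all fit in 2 bits.
    const WIDTH: usize = 2;
}
```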
And the custom string type, `MyString<E: Encoding>`, is parameterised over enums that implement this trait:
```rust
let s: MyString<Symbols> = MyString::from(vec![A, B, C, A, B, C]);
// which may equal something like 0b011011011011
```
For the bitstrings that back this type I'm using `bitvec`. I trust it to do a good job with the shifting and masking, and it abstracts away the endianness details:
```rust
struct MyString<E: Encoding> {
    bv: BitVec,
    _p: PhantomData<E>, // zero-cost and necessary, right?
}
```
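`bitvec` does the actual packing, but as a sanity check, the 2-bit layout from the example above can be sketched by hand (`pack` is a hypothetical helper, appending symbols from the high end so the literal reads left to right):

```rust
// Conceptual sketch of 2-bit packing into a single word.
// Each symbol's code is shifted in below the previous ones.
fn pack(symbols: &[u8], width: usize) -> u64 {
    let mut packed = 0u64;
    for &sym in symbols {
        packed = (packed << width) | sym as u64;
    }
    packed
}
```

With `width = 2`, packing `[A, B, C, A, B, C]` (codes `[0b01, 0b10, 0b11, 0b01, 0b10, 0b11]`) yields `0b011011011011`, matching the example above.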
These strings could be really long, in which case the bit-packing will save some space, but there are also succinct data structures and clever bitwise tricks worth applying to short slices of these strings.
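As a hypothetical example of such a bitwise trick: counting occurrences of one symbol across a 2-bit-packed word can be done with XOR and masks rather than looping over individual symbols (this assumes slots are stored from the low end and `2 * len < 64`):

```rust
// Count how many 2-bit slots in `packed` equal `target`.
// A slot matches iff XOR-ing it with `target` yields 00.
fn count_symbol(packed: u64, len: usize, target: u64) -> u32 {
    const LOW: u64 = 0x5555_5555_5555_5555; // low bit of every 2-bit slot
    let mask = (1u64 << (2 * len)) - 1;     // keep only the `len` valid slots
    let diff = (packed ^ (target * LOW)) & mask;  // zero slot <=> match
    let miss = (diff | (diff >> 1)) & LOW & mask; // one bit per non-match
    len as u32 - miss.count_ones()
}
```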
Here's a slice type for the custom strings:
```rust
struct Slice<'a, E: Encoding> {
    bs: &'a BitSlice,
    _p: PhantomData<E>,
}
```
So `MyString`/`&Slice` here are analogous to `String`/`&str`. I think the lifetime for my custom slice type is unavoidable, right?
The problem is implementing `Index<Range<usize>>`. What I can do is:
```rust
impl<E: Encoding> Index<Range<usize>> for MyString<E> {
    type Output = BitSlice; // I'd like to return a Slice<'a, E>

    fn index(&self, range: Range<usize>) -> &Self::Output {
        let s = range.start * E::WIDTH; // WIDTH is already a usize
        let e = range.end * E::WIDTH;
        &self.bv[s..e]
    }
}
```
And then implement `impl<'a, E: Encoding> From<&'a BitSlice> for Slice<'a, E>`, so I have to slice like:
```rust
let s: MyString<Symbols> = MyString::from(vec![A, C, C, C, A, B, C, A, A, A]);
let slice: &Slice<Symbols> = &s[3..6].into();
```
There are a couple of reasons I don't like this. I'd rather write:

```rust
let slice: &Slice<Symbols> = &s[3..6];
```
But then `index` would have to return a `&Slice`, which it can't.
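For reference, `std::ops::Index::index` returns `&Self::Output`, a reference borrowed from `self`, so it can only hand out something the container already stores, never a freshly constructed `Slice` value. A minimal illustration (with a hypothetical `Wrapper` type):

```rust
use std::ops::Index;

struct Wrapper(Vec<u8>);

impl Index<usize> for Wrapper {
    type Output = u8;

    fn index(&self, i: usize) -> &u8 {
        // The signature forces a borrow out of self; there is no way
        // to return an owned, just-constructed value from here.
        &self.0[i]
    }
}
```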
I found a related question here: How to implement a custom slice for a Bitvector : rust, though the responses are a little discouraging.
What's the proper way to do this? Is it the `unsafe` route?