If you look at Range<u8>, for example, there are only 256×(256+1)÷2 = 0x8080useful ranges, even including things like 0..0 and 10..10, but the type holds 2× u8s for 0x10000 possible values. And of course this just gets worse the bigger the type -- a Range<u32> is "wasting" something well over 8 billion billion values.
Can anyone come up with a clever encoding that would let a CompressedRangeU8 type offer those 32639 niches while still being reasonably efficient?
I guess a small niche isn't that hard if you just do it like IndexRange in core::ops::index_range - Rust and only store canonical ranges. Then you have the 2ⁿ-1 niches from when 255.._ only allowing 255..255.
(Or simpler, since x..0 can only be 0..0, store the start in the low bits and you have a niche for 1..256.)
It's not illegal to make an ops::Range of that, no.
It's also not useful, though, in the sense that you should never store that in a data structure and if you ever try to use it for indexing a slice it'll panic. You should always have start ≤ end, even if the type doesn't prevent you from doing bad things.
Well you need to phrase the type in such a way that rustc could actually take advantage of those invalid values. struct RangeU8(u8, u8); doesn't have any niches because even if you don't think RangeU8(100, 10) is valid, nothing told rustc that.
And today rustc will only take advantage of the largest gap, so to get the full 32639 niches they'll have to be adjacent, too.
The "obvious" way to do that is triangular numbers:
0 is 0..0
1 is 0..1, 2 is 1..1,
3 is 0..2, 4 is 1..2, 5 is 2..2,
6 is 0..3, 7 is 1..3, 8 is 2..3, 9 is 3..3,
# and so on
But while encoding that isn't too bad, decoding it is awkward.
How about something obvious like this (modulo off-by-one errors, overflows, etc. just treat all operations here as mathematical):
let (a, b) = if start <= 128 {
(start, end)
} else {
// (257 - start, end - start)
// Ok, after some thinking this seems to be fine:
(256 - start, 255 - end)
}
and
let (start, end) = if a <= b {
(a, b)
} else {
// (257 - a, 257 - a + b)
// Ok, after some thinking this seems to be fine:
(256 - a, 255 - b)
}
Now it is always a<=128.
But I don't know if it counts as "reasonably efficient"...
If you're willing to multiply you can represent pairs (i, j) with i <= j contiguously. But this is not efficient.
Since it periodically seems to come up I do not want weird indices in Rust to be interpreted in a Python or R way (e.g. "-1 is the last element"). I think 1..0 is empty (and so does Range::is_empty). But sometimes I want to write down empty ranges in this weird way and I'd rather write down the range and ask if it's empty rather than test first.
The new range types -- stable soon™ -- will help with this.
Range in std::range - Rust allows Range { start: 1, end: 0 }, but then it's not itself an iterator, so the conversation to RangeIter in std::range - Rust has the opportunity to normalize things (and could be offered in a Range<T> → Option<RangeIter<T>> form, too).
(255, 0)..=(255, 254) is a decently large niche that the compiler can do useful stuff with and would shake out any improper ecosystem niche assumptions.