Vec with minimal length


#1

Is there an efficient way to create an array with a minimal length? The goal is to have compile-time checks on the size of the array.

A simple implementation could look like this:

struct OneOrMore<T> {
    first: T,
    rest: Vec<T>
}

Alternatively, one could write:

struct OneOrMore<T>(Vec<T>);

impl<T> OneOrMore<T> {
    pub fn new(first: T, rest: Vec<T>) -> Self {
        let mut v = vec![first];
        v.extend(rest);
        OneOrMore(v)
    }
    pub fn len(&self) -> usize {
        self.0.len()
    }
    ...
}

#2

An enum is probably better:

enum OneOrMore<T> {
   Single(T),
   Multiple(Vec<T>)
}

If you push a new element you’d check what variant you are - if it’s Single, you’d “overflow” into Multiple.


#3

The enum OneOrMore::Multiple can be constructed with a zero-length Vec, so there is no compile-time protection with the enum method.


#4

What’s the benefit of using an enum here?

I think the struct { T, Vec<T> } representation of OP is advantageous in that it has no hidden invariants, but indexing and len are not as efficient. The struct { Vec<T> } quotient representation does have a hidden invariant that needs to be carefully maintained, but it has no extra overhead on indexing or len. So pick your poison I guess?


#5

No heap allocation/pointer indirection in the Single case. More clearly expresses intent? I don’t know what the common case is though: single or multiple elements.


#6

The obvious extension is this, once const generics are implemented:

struct NOrMore<T, const N: usize> {
    first: [T; N],
    rest: Vec<T>,
}

For now you can simulate this for a fixed set of array sizes, using a trait like arrayvec’s Array.


#7

An empty Vec doesn’t require a heap allocation, and the struct doesn’t have any extra indirection when holding a single item.


#8

That was in comparison to the struct OneOrMore<T>(Vec<T>) representation - should’ve made that clearer.

Also, NOrMore<T>(T, Vec<T>) always has inline storage for T + Vec, whereas the storage for the enum is max(T, Vec<T>). But I suspect all of this isn’t really what @vandenoever is after. Is the sole concern to optimize for compile time safety?


#9

If you can get away with it living on the stack, go for it, but it might be more ergonomic to heap allocate it:

struct NOrMore<T, const N: usize> {
    first: Box<[T; N]>,
    rest: Vec<T>,
}

You could also make a macro to define one of these structs whenever you need it:

macro_rules! min_vec {
    ($name:ident, $min_num:expr) => {
        struct $name<T> {
            guaranteed: Box<[T; $min_num]>,
            overflow: Vec<T>
        }
        
        impl<T> $name<T> {
            fn new(guaranteed: [T; $min_num], rest: Vec<T>) -> Self {
                FiveOrMore {
                    guaranteed: Box::new(guaranteed),
                    overflow: rest
                }
            }
            
            fn push(&mut self, elem: T) {
                self.overflow.push(elem)
            }
            
            fn pop(&mut self) -> Option<T> {
                self.overflow.pop()
            }
            
            fn get(&self, idx: usize) -> Option<&T> {
                if idx < $min_num {
                    Some(&(*self.guaranteed)[idx])
                } else {
                    self.overflow.get(idx - $min_num)
                }
            }
        }
    }
}

min_vec!(FiveOrMore, 5);

fn main() {
    let mut five = FiveOrMore::new([1, 2, 3, 4, 5], Vec::new());
    five.push(10);
    assert_eq!(five.get(5).unwrap(), &10);
    assert_eq!(five.get(1).unwrap(), &2);
    assert_eq!(10, five.pop().unwrap());
}

#10

Is the sole concern to optimize for compile time safety?

That’s the initial concern. Since T can be a recursive type (it might hold the NOrMore). I’ll need a Box anyway for the initial items.


#11

So when you say:

Are you referring to literally just the creation process being efficient or subsequent access to the data there being efficient as well? Is this thing modified after construction? Is the common case >1 value in there? I guess what exactly are we optimizing for here?


#12

The length of the Vec will typically be 2-10. Correctness (compile time check) is important as is a correct API that mimics Vec. Speed is nice but not the focus at the moment.


#13

The current use case is to load Relax NG models and create code from them. Relax NG, like XML Schema, can specify arrays with a minimal length. To avoid the creation of objects that do not match the schema, the data types should also have this minimal length.


#14

I’d probably go with the OneOrMore<T>(Vec<T>) layout and the new(first: T, rest: Vec<T>) constructor then.


#15

Oh, right. I understand now.


#16

Prefer an implementation that allows you to take a slice of the whole contents – the enum and the newtyped Vec both do that. With access to slices you get a whole lot of functionality for free, for example the slice iterators.


#17

Although a custom Iterator for OneOrMore(T, Vec<T>) is pretty trivial using iter::once and iter::chain.


#18

Hi, I implemented a small crate ‘seq’ proivding a Sequence type Seq which may be used as cheap stack-only replacement for Vector. The OneOrMore type you sketched is looking similar to my Seq-type (?)
https://crates.io/crates/seq
and