Is there a term for this data structure?

Context: implementing chains of lexical environments in a compiler pass

Description: this object is like a Vec, except we are only allowed the following ops:

  • read from any index
  • push to end of object
  • truncate (keep only first N elements)

Before I invent a term for this, I'm trying to figure out whether there is standard terminology.

Uhh, this sounds like a vector where some operations are disabled?

A Stack is also a vector where some operations are disabled. :slight_smile:

EDIT: maybe the original question should have said "object" instead of "data structure", as it's more of an object than a data structure

EDIT2: I'm wondering if there is a name for this as I'm sure this comes up all the time in dealing with stacks of lexical environments

I'm not aware of any specific name.

As for stacks, those could also be implemented with a linked list, in which case it wouldn't be a vector. Though I guess you could argue the same here.

Yeah, the point I was trying to make, which in retrospect was not very well stated, is:

"X is Y with some operations disabled" does not imply "X does not have a name".

The counterexample being "a Stack is a Vector with some ops disabled."

A Vec (vector, array, dynamic array, ...) is amortized O(1) for appending, O(1) for random access, and O(1) for truncation (assuming the truncated elements have no destructors to run; otherwise dropping the tail is linear in the number of removed elements), which is hard to beat. I can't recall anything else offhand with the same bounds and a different name. You can search for "Brodnik array" for a variation that is optimal on excess space with the same bounds (but a larger constant factor); however, I doubt it's worth it for relatively small inputs like those of a compiler.


Two people who have answered lots of my questions got confused, so I must have written this question poorly. Let me retry.

We have the following:

use std::ops::Index;

pub struct Foo<T>(Vec<T>);

impl<T> Index<usize> for Foo<T> {
  type Output = T;
  fn index(&self, i: usize) -> &T { &self.0[i] }
}

impl<T> Foo<T> {
  pub fn truncate(&mut self, n: usize) { self.0.truncate(n); }
  pub fn len(&self) -> usize { self.0.len() }
  pub fn push(&mut self, t: T) { self.0.push(t); }
}

Now, this shows up quite a bit in dealing with stacks of lexical envs in transpilers / compilers.

Question: is there a better name for the above than Foo?
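
For concreteness, here's a minimal usage sketch of the three allowed operations (this assumes it sits in the same module as Foo, since the inner Vec is private):

fn demo() {
  let mut scopes: Foo<&str> = Foo(Vec::new());
  scopes.push("global");             // enter nested scopes
  scopes.push("fn body");
  scopes.push("block");
  assert_eq!(scopes[1], "fn body");  // read from any index
  scopes.truncate(1);                // keep only the outermost scope
  assert_eq!(scopes.len(), 1);
}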

Better than Foo? Most probably yes... Childish joking aside, I am not familiar with all the nomenclature, but with the context being lexical analysis/processing (and Vec being the superset/template for your restricted/refined data type), something like TokenVec or similar might be usable?

Only if by "amortized" you mean that it won't grow further. If you only add but never remove elements until the Vec gets dropped (which is a common use case), then I would say it's O(log n) for appending. Which is still pretty good, though :sweat_smile: (but might be worth mentioning). And I don't think it could be improved for the data structure needed in the original post.

Edit: I was wrong, see replies below.

(Of course it's also O(1) if you use Vec::with_capacity and never exceed the capacity… but then it's effectively an array.)
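
For instance, a trivial sketch of that pre-allocation case:

fn main() {
  // With pre-allocated capacity, no reallocation can happen until len exceeds it.
  let mut v: Vec<u32> = Vec::with_capacity(1024);
  for i in 0..1024 {
    v.push(i); // each push here is plain O(1): capacity is never exceeded
  }
  assert!(v.capacity() >= 1024); // still the original allocation
}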

Isn't it just a "stack"? According to my understanding, a stack can include random read access (as opposed to a pure LIFO, last-in first-out). But I'm really not sure.

@quinedot is right on the O(1).

If you insert N=2^k items, there are k=log(N) RESIZES, but they are of size 2^1, 2^2, ..., 2^k. So the amortized work per insert is still O(1).
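
Spelled out with the same notation: the resizes copy 2^1 + 2^2 + ... + 2^k = 2^(k+1) − 2 < 2N items in total, so N inserts take O(N) work overall, i.e. O(1) amortized per insert.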

edit: inserts -> resizes


But the insert (edit: or rather the resize operation, independent of its size) may cost more than the memory copying itself, right? :thinking:

Sorry, I made a typo above, by 'insert' I meant 'resize'.

Each item is:

  • inserted once
  • may be copied up to k times due to resizes
  • but large portions of the items are only copied a few times (look at items N/2 to N)

Thought experiment: Assume each resize operation requires a robotic arm to move, which takes a ridiculous 1 second. Then it's clearer that the Vec insert would be O(log n).

AFAIK, insertion complexity is measured per item, not per batch. That is, every single insert will be, on average, O((log n)/n + 1) (where the 1 represents some constant time necessary to perform the insert itself), which is clearly O(1), since O((log n)/n) = o(1).

In general, I am not in favor of saying "read external link", but in this case: c++ - Amortized analysis of std::vector insertion - Stack Overflow

edit: meant to reply to @jbe


Maybe I'm indeed wrong, as eventually the number of items required to be moved will outweigh any constant cost of the resize operation. In that case: sorry for the noise, you may be right!

If you want to be particularly strict about this specific set of operations independently of association, performance, or anything else, I can't think of any standard term. (Also it should probably be a trait, not a data structure.)
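
For illustration, here's a minimal sketch of what such a trait could look like; the name PrefixStack is made up for the example, not standard terminology:

use std::ops::Index;

// Indexed reads, push-to-end, and truncate-to-prefix: nothing else.
pub trait PrefixStack<T>: Index<usize, Output = T> {
  fn push(&mut self, value: T);
  fn truncate(&mut self, len: usize);
  fn len(&self) -> usize;
}

impl<T> PrefixStack<T> for Vec<T> {
  fn push(&mut self, value: T) { Vec::push(self, value); }
  fn truncate(&mut self, len: usize) { Vec::truncate(self, len); }
  fn len(&self) -> usize { Vec::len(self) }
}

The impl just forwards to the inherent Vec methods, so any Vec can be handed out behind the restricted interface.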

There are probably still better names than Foo though. Maybe a Tail, that can truncate, append, Index, and len. Or a Clip that can chop, len, Index, and push. Though I guess at a stretch you could fuse, front, offset, and ordinal.


More seriously, data structures are normally named for their structure, in memory or conceptually (like trees), more than for the set of supported operations. But if what matters is a set of operations, you still generally want the data structure that is best for that set of operations. The data structure that is best for the given set of operations is a vector/dynamic array/etc. As a side effect, I think most people are going to associate that limited set of operations with a vector/dynamic array/etc.

Thus, I suggest FooVec is still somewhat better than Foo. Or maybe something even more use-case-specific, like @RustyJoeM suggested.


Amortized O(1) when doubling capacity each reallocation

Not a proof, but maybe this will help your [uh, I can only reply to one person at a time, but y'all understand] intuition.

If you have the excess capacity, pushing is O(1). If you don't, # of relocations is the size of your current capacity. Here's the pattern when you double capacity every time:

push   # relocations   total relocations
   2               1                   1
   3               2                   3
   4               0                   3
   5               4                   7
   6               0                   7
   7               0                   7
   8               0                   7
   9               8                  15
  10               0                  15
 ...               0                  15
  16               0                  15
  17              16                  31
 ...               0                  31
  32               0                  31

Note how when you've pushed 2^n items, you've relocated 2^n − 1 items. And at 2^n + 1 pushes, you're still within a constant factor (2). Hence, amortized O(1).
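
If you want to sanity-check the table, here's a throwaway simulation of the same doubling rule (starting from capacity 1, as above; how Vec actually grows is an implementation detail):

fn main() {
  let (mut cap, mut total) = (1usize, 0usize);
  println!("push  # relocations  total relocations");
  for push in 1..=32usize {
    let relocations = if push > cap {
      let moved = cap; // a resize copies every item already stored
      cap *= 2;
      moved
    } else {
      0
    };
    total += relocations;
    println!("{push:>4}  {relocations:>13}  {total:>17}");
  }
}

Running it reproduces the numbers above (plus the rows the table elides).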


This analysis implicitly assumes that the allocator will take no more than O(N) time to find a contiguous allocation that can hold N items. This seems like a safe assumption in practice, but might be problematic for large vectors on constrained hardware.


The number of operations up to size N required for resizing alone is 1 + 2 + 4 + … + N/4 + N/2 + N, which is approximately 2N operations for all the resizing together. Therefore, the cost of resizing amortized over N elements is O(1), not O(log N). This is a well-known fact in complexity theory.