2d table dropping of objects

zeroexcuses · September 4, 2020, 12:18am

Thanks to everyone for discussion in MagicRc, MagicWeak, MagicHashSet , it greatly clarified the question, which I am reposting here.

Suppose we have a 2D table.

In the top row, we have obj-id. In the left colunmn, we have snapshot-id.
Some (not all) cells are filled in. snapshots only contain some objects; objects only appear in some snapshots
when an snapshot-id is dropped, we want to drop everything in the corresponding row
when an obj-id is dropped, we want to drop everything in the corresponding column
EDIT: objects can have different types; all objects within a single column have the same type, but different columns can have different types

How do we do this?

TomP · September 4, 2020, 1:08am

How sparse is the array? 1%? 10%? 25%? 50%? 90%?

Since all objects within a column have the same type, to me that implies that a virtual array (or sparse array) of virtual column vectors (or sparse column vectors) probably would be optimal. The only serious inefficiency with such an approach is if the column vectors are implemented as sparse vectors (i.e., as hash sets or btree sets, etc) and you have to drop a row. Different solutions come to mind, depending on how sparse the column vectors are, as well as the size of the stored objects.

If those stored objects are large and vary significantly in size I probably would store each in a box, probably using an arena allocator. If they're all the same size I might use an ECS-like array and reclaim dropped entries for potential reuse. Without knowing the statistics of your input, nor whether you get it all at once or intermittently over a long period of time, it's difficult to design a high-performance solution.

zeroexcuses · September 4, 2020, 2:10am

I do not understand the focus on sparsity / high performance.

IMHO, the core of the issue is we need a 'table cell' that drops when EITHER snapshot-id drops or obj-id drops.

2e71828 · September 4, 2020, 8:21am

Usually for this sort of thing, you’ll have a main storage that’s organized in row-major or column-major order depending in your primary access pattern which contains owned copies of all the cells. The following assumes you chose row-major; column-major works similarly.

Deleting a row is simply removing it from the top-level of main storage. If the cells implement Drop, the compiler will need to scan the removed row to call its destructors. To remove a column, you need to scan through all of the rows and remove the value individually from each.

If there are Drop handlers, either case will take time proportional to the number of cells being removed, but removing a row will be much faster if there are no destructors.

There are more exotic memory layouts that make different tradeoffs between memory and insert, update, and delete performance. They tend to be more complicated, and are usually best saved for after the simple scheme proves insufficient.

zeroexcuses · September 4, 2020, 9:13am

I think I have an implementation working along these same techniques. I was hoping for something cleaner, but I think a certain amoung of messiness is unavoidable here.

system · December 3, 2020, 9:13am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
2D arrays, lack of help	12	1788	November 26, 2022
Memory-friendly 2d array with variable-length rows	9	402	December 2, 2023
First Rust project - how do I deal with memory leaks? help	12	1991	January 12, 2023
Best way to initialize array of 'drop' objects inplace help	74	4093	June 11, 2022
Hashmap<K, Weak<V>> that auto drops the 0-strong-count Weak<V>'s?	8	346	December 14, 2023

2d table dropping of objects

Related Topics