Stable static arena pointer?


#1

I had an idea and am wondering whether it already exists, and if not whether there is a good reason it does not, ie is it a terrible idea?

The idea is this: an allocator that is not unlike an arena allocator, but is 'static so memory can never be freed. The allocated data is immutable, and is stable in the sense that a given data item will be always given the same pointer (on a given run of the program). Obviously this requires a type that is Hash and Eq in order to be efficient. The purpose of all this is that the allocated pointers could be both Eq and Copy, with pointer equality reflecting value equality. This combination seems like it would be very convenient as well as efficient for tree like types that are frequently placed in sets or used as keys for maps. Particularly when there is a lot of redundancy between parts of trees, sticky that the deduplication compensates somewhat for the perpetual leaking of memory.

The implementation would need to use a hash map to keep track of all the allocated data and to ensure that a given value corresponds to a unique pointer.

Thoughts? Does this thing have a name or an existing implementation?


#2

Using constant-sized static symbols to key static data is not a bad idea. “Static key that you can associate arbitrary static data with” sounds pretty similar to an enum with a const data function (https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md).


#3

The data I’m looking at holding, however is not static in the sense of known at compile time, but only 'static in the sense of lifetime, meaning that I will promise never to free it.

So what I’m thinking of will be much more like a normal allocator than what you’re thinking of. There will, however be a strong restriction on what type of data can be allocated, e.g. it must be Eq+Hash and possibly a bit more.


#4

This is usually called interning.


#5

How is what you propose practically different to;

let _one = unsafe { Box::into_raw(Box::new(1)).as_ref::<'static>().unwrap() };

#6

I think I’d go with something like https://docs.rs/slab/0.4.0/slab/ and then just keep the slab alive as needed (rather than require 'static).


#7

Thanks, that exactly describes what I’m looking at doing! Any thoughts as to the wisdom and efficacy of interning? Certainly it relates to the number of unique “things” that will ever be considered, the cost of comparison (and hashing), and the number of times a “thing” will be duplicated… but I would be interested if anyone had experiences with interning to share, particularly beyond string interning.

The practical difference is that it would enable pointer equality and hashing to be semantically equivalent to value equality and hashing, which means that things like a HashMap or HashSet with large keys can be much more efficient. The hashing is computed only when an object is created.

In terms of the runtime representation it would be equivalent, however.

I’ve thought about doing something like that, but then you have a couple of problems. One is that if someone created two different slabs, then they could mistakenly compare pointers into the two of them and get incorrect equality (or inequality). So you’d have a correctness issue. In theory you could solve this issue with some sort of type witness (but that would be extremely hairy). The other problem is that you’d then have to deal with a nasty lifetime programming challenge. I don’t love the idea of leaking memory forever, but it provides a major gain both in correctness and in ease of use relative to a “local” arena allocation approach that could be freed.


#8

That’s certainly a concern if there are lots of slabs around. I suspect you can mitigate this quite a bit with some newtype wrappers though; not sure if that’s what you meant by type witness.

By nasty lifetime you mean making sure you don’t have stale keys laying around if an entry is removed? If your code “promises” to not free then it can also promise to not remove slab entries. Then you can shove a slab(s) into a thread_local (wrapped in a RefCell) or into a lazy_static (wrapped in a Mutex) to get easy access to the slab. I’d personally still try to thread them through the uses, rather than resorting to global-like state, but these are options. One already has to arrange for ownership and lifetime cohesion for all of the Rust code they write, so adding a slab to that list doesn’t seem any more difficult.