Announcing internment, a crate for easy interning of data


#1

I’d like to announce the release of internment, which is a crate that enables very easy interning of data. Data interned with internment will have Hash and Eq implemented at the cost of

Using internship is as simple as:

   let hello = Intern::new("hello");
   let world = Intern::new("world");
   assert_ne!(hello, world);
   println!("The conventional greeting is '{} {}'", hello, world);

In general, you should be able to use an Intern<T> anywhere you would use a Box<T> (unless you want to mutate it), and you should be able to use ArcIntern<T> anywhere you would use an Arc<T> apart from mutating it. In the former case, your data will never be freed.

I’d love to hear feedback! The crate itself is relatively simple, thanks to state. There is some optimization to be done in the implementation (cutting memory use by 1/2), but the API should be stable, apart from adding any missing impls that might be helpful. Actually, I’m open to bikeshedding on the utility methods num_objects_interned and refcount, so to that extent the API is still in flux.


#2

I’m pleased to announce the 0.3.1 release of internment. This release fixes a few issues (e.g. requiring Clone of interned types), and adds a new LocalIntern<T> type which is local to a given thread. This is cheaper to create (no Mutex), and its memory will be freed when the thread exits.

This brings internment to three types:

Intern<T>: Never freed, Copy + Send + Sync, a Mutex is taken on new, but not clone or drop.
LocalIntern<T>: Freed when thread exits, Copy, no Mutex needed
ArcIntern<T>: Freed when reference count is zero, Clone+Send+Sync, a Mutex is taken on new, drop or clone.

I figure at some point I’ll implement an RcIntern<T>, and then this crate will be complete. Once again, any suggestions on the API for internment will be appreciated!


#3

Why ArcIntern requires mutex on clone? Atomic counter should be fine. Same thing for clone, except for the case where number dropped to zero.


#4

You’re right. I was just being sloppy. On the plus side, I can fix that without any API change!

Do you have any idea how much faster an atomic counter is than a mutex? I would imagine the mutex is implemented with an atomic counter in the uncontested case, so the difference would be slight.


#5

Do you have any idea how much faster an atomic counter is than a mutex? I would imagine the mutex is implemented with an atomic counter in the uncontested case, so the difference would be slight.

I have no numbers, but it’s also hard to compare. While mutex is technically an atomic counter on uncontended case, it has at least two atomic operations and a full memory barrier. Whereas Arc uses relaxes memory barriers which means operations can be reordered for optimization (i think compiler can even eliminate subsequent clone and drop operations, but I’m not sure on this one). So my understanding is that actual improvements depend on the surrounding code very much.


#6

@droundy

There is similar mechanizm inside rustc in symbol.rs, is your crate inspried by this code or may be you compare your and rustc implementation?


#7

@Dushistov, my crate isn’t inspired by symbol.rs, although it is indeed implementing the same concept.

The rustc implementation is similar to my LocalIntern, which implements internment that lasts for the thread life. The rustc implementation is limited to strings, and has a more complicated API, possibly to save memory by sometimes using a 32-bit representation for strings, which is smaller than the pointer that LocalIntern has. The smaller 32-bit Symbol is slow to turn into a &str leads to the more complicated API, since rustc also has an equivalent InternedString, which is basically identical to LocalIntern in its use and implementation.