Ustr: fast, ffi-friendly string interning

I've made a crate called 'ustr' that does fast, C-compatible string interning: https://github.com/anderslanglands/ustr

I haven't published it yet: it contains a lot of unsafe code, so I want to use it in anger and gather feedback for a while to make sure it's sound before I put it on crates.io.

I based it on OpenImageIO's ustring type (https://github.com/OpenImageIO/oiio), as I need fast string interning for a renderer project. I also wanted to be able to share those strings with C without adding all the CString boilerplate at the FFI boundary.
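For readers unfamiliar with the idea, here is a minimal sketch of C-compatible interning using only the standard library. It is not how ustr is implemented; the `Interned` type and its methods are hypothetical names for illustration. Each distinct string is stored once as a NUL-terminated allocation that is deliberately leaked, so every handle to the same text shares one pointer that can be handed to C as-is.

```rust
use std::collections::HashSet;
use std::ffi::{c_char, CStr, CString};
use std::sync::{Mutex, OnceLock};

/// Hypothetical handle type for this sketch; not the ustr crate's layout.
#[derive(Clone, Copy, Debug)]
pub struct Interned(&'static CStr);

/// Global cache holding one leaked, NUL-terminated copy of each string.
fn cache() -> &'static Mutex<HashSet<&'static CStr>> {
    static CACHE: OnceLock<Mutex<HashSet<&'static CStr>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashSet::new()))
}

impl Interned {
    /// Interns `s`. Returns `None` if `s` contains an interior NUL byte,
    /// because such a string cannot be represented as a C string.
    pub fn new(s: &str) -> Option<Interned> {
        let owned = CString::new(s).ok()?;
        let mut set = cache().lock().unwrap();
        // Reuse the existing allocation if this text was seen before.
        if let Some(&existing) = set.get(owned.as_c_str()) {
            return Some(Interned(existing));
        }
        // Otherwise leak a new copy so it lives for the whole program.
        let leaked: &'static CStr = Box::leak(owned.into_boxed_c_str());
        set.insert(leaked);
        Some(Interned(leaked))
    }

    /// Borrows the text as a Rust string slice.
    pub fn as_str(&self) -> &'static str {
        // The bytes came from a `&str`, so they are valid UTF-8.
        self.0.to_str().expect("interned from valid UTF-8")
    }

    /// Pointer to the NUL-terminated bytes, suitable for passing to C.
    pub fn as_ptr(&self) -> *const c_char {
        self.0.as_ptr()
    }
}
```

Interning "diffuse_color" twice through `Interned::new` yields handles whose `as_ptr()` values are identical, so no CString conversion is needed at the call site. The trade-offs of this toy version are a global lock on every lookup and memory that is never freed.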

Things still left to do:

  • make sure it's sound.
  • investigate memory usage properties and see whether any optimizations are necessary there.
  • investigate generating Ustrs at compile time and being able to use them transparently.

Am using this for all node/attribute names on a hobby project. Not tested it in anger yet, but it's replacing most places where I would use a '&str'. Thanks for writing it.

Glad you like it! Sounds like your usage is quite similar to mine. I'll be pushing to crates.io before too long.

Ah, great. I actually created an issue on your GitHub project. There are two things I wanted to ask you about.

  • Adding serde Serialize/Deserialize trait implementations.
  • Implementing the Hash trait.

Pretty simple/obvious requests. I'm happy to do the work if we can agree on how you'd most like it done. The serde traits are, I think, pretty self-explanatory. The Hash trait I wasn't too sure about. See the issue for what I was proposing.
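As one illustration of how Hash could work for an interned type (not necessarily what the issue proposes, and not ustr's actual code): because interning guarantees a single allocation per distinct string, a handle can be compared and hashed by pointer address instead of by content. The `Handle` type and `intern` function below are hypothetical stand-ins.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::{Mutex, OnceLock};

/// Hypothetical interned handle for this sketch (not the ustr crate's type).
#[derive(Clone, Copy, Debug)]
pub struct Handle(&'static str);

/// Toy interner: leaks exactly one copy of each distinct string.
pub fn intern(s: &str) -> Handle {
    static CACHE: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    let mut set = CACHE
        .get_or_init(|| Mutex::new(HashSet::new()))
        .lock()
        .unwrap();
    if let Some(&existing) = set.get(s) {
        return Handle(existing);
    }
    let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
    set.insert(leaked);
    Handle(leaked)
}

impl PartialEq for Handle {
    fn eq(&self, other: &Self) -> bool {
        // Equal text implies the same allocation, so compare addresses
        // (plus length) rather than walking the bytes.
        std::ptr::eq(self.0.as_ptr(), other.0.as_ptr()) && self.0.len() == other.0.len()
    }
}

impl Eq for Handle {}

impl Hash for Handle {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only the address; this is consistent with `eq` above.
        (self.0.as_ptr() as usize).hash(state);
    }
}
```

With this in place, a `Handle` works as a `HashMap` key and lookups never touch the string bytes. The cost is that hash values depend on allocation addresses, so they differ between runs and must not be persisted.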

Regarding serde - I assume you don't care if the in-memory representation of the string cache is different after deserialization as long as the set of strings is the same?

Personally, I don't mind. I was just thinking of converting to and from &str.

Edit:
Just to clarify: I wanted Serialize/Deserialize trait impls for Ustr itself, i.e. impls that convert to and from &str. I wasn't concerned with serializing the StringCache contents.

At the moment I'm wrapping Ustr and calling Ustr::from and Ustr::as_str, but ideally I'd just use a type alias to Ustr.

I've added this on the serde branch on the repo. I'd appreciate it if you'd test it and let me know what you think of the API:


// use ustr::get_cache() to get a reference to the cache to pass to serde

let json = serde_json::to_string(ustr::get_cache()).unwrap();
// the DeserializedCache is a dummy type we need for serde - this will actually just fill the global cache
let _: ustr::DeserializedCache = serde_json::from_str(&json).unwrap();

Ah cool. Thanks!
So I just tried building and testing the repo directly, rather than using it as a Cargo dependency. I'm getting a failing unit test.

User error. Should have read the README properly.