Announcing iString: a String type with small string optimization


#1

There is now another string crate (apart from `inlinable_string`) that stores small strings inline, directly in the bytes of the type itself.

### Important numbers
- Inline capacity on 64-bit targets: 23 bytes
- Inline capacity on 32-bit targets: 11 bytes
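Both numbers follow from keeping the type the same size as `String` (three machine words: pointer, capacity, length) and reserving one byte for the tag/length. Here is a minimal sketch of that arithmetic. It assumes `String` is three words on the target, which is true of current std but is not a documented guarantee. `inline_capacity` is a hypothetical helper, not part of iString:

```rust
use std::mem::size_of;

/// Bytes left for inline data if the type stays exactly as large as
/// `String` (assumed: three machine words) and one byte is kept for the tag.
fn inline_capacity() -> usize {
    size_of::<String>() - 1
}

fn main() {
    println!(
        "String: {} bytes, inline capacity: {} bytes",
        size_of::<String>(),
        inline_capacity()
    );
}
```

On a 64-bit target this gives 3 × 8 − 1 = 23. On a 32-bit target it gives 3 × 4 − 1 = 11.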

It uses a few unstable features (`untagged_unions`, `alloc`, `heap_api`, `str_mut_extras`, `inclusive_range`) and therefore requires a recent nightly compiler.
It uses the highest bit of the length field to distinguish between the inline and the “normal” (heap) state, so the length is limited to `isize::MAX` (about 2 GB on 32-bit targets).
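Here is a minimal sketch of the tagging idea on a plain `usize` length field. The post does not say which state sets the bit. The sketch assumes, hypothetically, that a set top bit means "inline". `TAG_BIT`, `encode` and `decode` are illustrative names, not iString's API:

```rust
/// The highest bit of a `usize` length field.
const TAG_BIT: usize = 1usize << (usize::BITS - 1);

/// Hypothetical encoding: store `len` and set the top bit for the inline state.
fn encode(len: usize, inline: bool) -> usize {
    // A real length must leave the top bit free, i.e. len <= isize::MAX.
    assert!(len & TAG_BIT == 0, "length must not exceed isize::MAX");
    if inline { len | TAG_BIT } else { len }
}

/// Splits a stored field back into (is_inline, len).
fn decode(field: usize) -> (bool, usize) {
    (field & TAG_BIT != 0, field & !TAG_BIT)
}

fn main() {
    let field = encode(5, true);
    let (inline, len) = decode(field);
    println!("inline = {inline}, len = {len}");
}
```

Reserving the top bit as a flag is exactly what caps the usable length at `isize::MAX`.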

It could use more tests and documentation, and it definitely needs a few reviews before it is safe to use in critical applications.

Repository
Documentation

Most methods are identical to those of `std::string::String`.


#2

Hopefully this will be obsolete soon thanks to https://github.com/rust-lang/rust/pull/42859 :

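```rust
// Largest length/capacity that leaves the highest bit clear (equal to isize::MAX).
#[cfg(target_pointer_width = "64")]
const MAX_CAPACITY: usize = (1 << 63) - 1;
#[cfg(target_pointer_width = "32")]
const MAX_CAPACITY: usize = (1 << 31) - 1;
```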
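On current stable Rust, the same value can be written once, without the `cfg` pair. Here is a sketch using only std items (`isize::MAX`, `usize::MAX`):

```rust
// Same value as the cfg'd constants above, portable across pointer widths:
// every bit set except the highest one.
const MAX_CAPACITY: usize = isize::MAX as usize;

fn main() {
    println!("MAX_CAPACITY = {MAX_CAPACITY}");
}
```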

Have you done performance benchmarks replacing Strings with iStrings in one or two projects that use strings a lot?


#3

I should probably look this up myself, but does anyone know offhand if there are any SSO plans for std's String?


#4

No, I have not added a benchmark yet. You are welcome to add one, or two, or three…


#5

Yes, that would make the most sense. This is basically the pre-RFC for it.
I thought it would be better to write an implementation first (one that is as efficient as possible without changing the size of the type) and see how much interest there is before writing such an RFC.


#6

Yeah, definitely: having something tangible to discuss is always better than talking in the abstract. I do, however, recall some GitHub issues (IIRC) where SSO was discussed, and maybe some proofs of concept were even shown. I don’t know offhand where those conversations and ideas stand today.


#7

It has been discussed multiple times; here is one of those discussions:


#8

`as_mut_vec` is indeed not possible without changing the layout of `Vec` and making it dependent on endianness. (The most significant byte of `len` has to sit at a fixed position in the struct: at the end on little-endian targets, at the start on big-endian ones.)

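```rust
// The field order depends on endianness. On little-endian targets, the most
// significant byte of `len` is the last byte of the struct. On big-endian
// targets, it is the first byte.
#[cfg(target_endian = "little")]
#[repr(C)]
struct Heap {
    ptr:    *mut u8,
    cap:    usize,
    len:    usize,
}
#[cfg(target_endian = "big")]
#[repr(C)]
struct Heap {
    len:    usize,
    ptr:    *mut u8,
    cap:    usize,
}
```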

However, I don’t see a reason why the layout of `Vec` could not change.
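To show why the field order matters, here is a sketch of a union over the heap layout above and a hypothetical inline layout whose tag byte overlaps the most significant byte of `len`. The names `Inline`, `Repr` and `INLINE_FLAG`, and the "top bit set means inline" convention, are assumptions for illustration, not iString's actual code. It compiles on stable Rust, where unions of `Copy` fields are allowed:

```rust
use std::mem::size_of;

// Inline data bytes: three words minus one tag byte.
const INLINE_CAP: usize = 3 * size_of::<usize>() - 1;
// Hypothetical convention: top bit of the tag byte set => inline state.
const INLINE_FLAG: u8 = 0x80;

#[cfg(target_endian = "little")]
#[repr(C)]
#[derive(Clone, Copy)]
struct Heap { ptr: *mut u8, cap: usize, len: usize } // MSB of len = last byte

#[cfg(target_endian = "big")]
#[repr(C)]
#[derive(Clone, Copy)]
struct Heap { len: usize, ptr: *mut u8, cap: usize } // MSB of len = first byte

#[cfg(target_endian = "little")]
#[repr(C)]
#[derive(Clone, Copy)]
struct Inline { data: [u8; INLINE_CAP], tag: u8 } // tag = last byte

#[cfg(target_endian = "big")]
#[repr(C)]
#[derive(Clone, Copy)]
struct Inline { tag: u8, data: [u8; INLINE_CAP] } // tag = first byte

#[repr(C)]
union Repr { heap: Heap, inline: Inline }

impl Repr {
    /// Builds the inline state, or `None` if `s` does not fit.
    fn inline_from(s: &str) -> Option<Repr> {
        if s.len() > INLINE_CAP {
            return None;
        }
        let mut data = [0u8; INLINE_CAP];
        data[..s.len()].copy_from_slice(s.as_bytes());
        Some(Repr { inline: Inline { data, tag: INLINE_FLAG | s.len() as u8 } })
    }

    /// Builds the heap state from raw parts (nothing is allocated here).
    fn heap_from_parts(ptr: *mut u8, cap: usize, len: usize) -> Repr {
        assert!(len <= isize::MAX as usize, "top bit of len must stay clear");
        Repr { heap: Heap { ptr, cap, len } }
    }

    fn is_inline(&self) -> bool {
        // SAFETY: both variants initialise all bytes (no padding), so the
        // tag byte is always readable; in the heap state it is the MSB of len.
        unsafe { self.inline.tag & INLINE_FLAG != 0 }
    }

    fn len(&self) -> usize {
        // SAFETY: we read the field that matches the tagged state.
        unsafe {
            if self.is_inline() {
                (self.inline.tag & !INLINE_FLAG) as usize
            } else {
                self.heap.len
            }
        }
    }

    /// Returns the inline contents, or `None` in the heap state.
    fn as_inline_str(&self) -> Option<&str> {
        if !self.is_inline() {
            return None;
        }
        // SAFETY: in the inline state, `data[..len]` was copied from a `&str`.
        let bytes = unsafe { &self.inline.data[..self.len()] };
        std::str::from_utf8(bytes).ok()
    }
}

fn main() {
    let small = Repr::inline_from("hello").unwrap();
    println!("inline: {}, len: {}", small.is_inline(), small.len());
    let big = Repr::heap_from_parts(std::ptr::null_mut(), 0, 5);
    println!("inline: {}, len: {}", big.is_inline(), big.len());
}
```

A heap `len` is at most `isize::MAX`, so the top bit of its most significant byte is never set. That makes the single overlapping tag byte enough to tell the two states apart, provided the fields are ordered per endianness as above.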


#9

Right, thanks. I’ll try to catch up on that rather long thread. Judging by its tail, though, there’s some resistance to doing SSO for String. Maybe that’s right; I haven’t thought too deeply about it myself, and perhaps reading through the thread will make it clear(er).