Is there a string container that uses a single buffer?

Like std::path::PathBuf does, except the strings inside the container can be any valid strings.

One possible implementation is for the container to use 0xff as a delimiter between string elements, since UTF-8 encoded strings never contain the byte 0xff.

Also, according to my understanding, a WTF-8 encoded OsStr does not contain the byte 0xff either, so it should be possible to have a container that stores OsStrs.

The str_stack crate does something like this. Rather than delimiters, it has a Vec<usize> that stores the indices at the boundaries of the elements.
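To make the boundary-index idea concrete, here is a minimal hand-rolled sketch. It is not str_stack's actual API: the type `OffsetStrings` and its methods are hypothetical names chosen for illustration, and it assumes that storing only the end offset of each element is enough.

```rust
/// Hypothetical container: one shared buffer plus the end offset of
/// every element. Not the str_stack API, just the same general idea.
#[derive(Default, Debug)]
pub struct OffsetStrings {
    buf: String,
    ends: Vec<usize>,
}

impl OffsetStrings {
    /// Appends `s` to the shared buffer and records where it ends.
    pub fn push(&mut self, s: &str) {
        self.buf.push_str(s);
        self.ends.push(self.buf.len());
    }

    /// Number of stored elements.
    pub fn len(&self) -> usize {
        self.ends.len()
    }

    /// Random access: element `i` spans from the previous end offset
    /// (or 0 for the first element) to its own end offset.
    pub fn get(&self, i: usize) -> Option<&str> {
        let end = *self.ends.get(i)?;
        let start = if i == 0 { 0 } else { self.ends[i - 1] };
        Some(&self.buf[start..end])
    }

    /// Iterates over the elements in insertion order.
    pub fn iter(&self) -> impl Iterator<Item = &str> + '_ {
        (0..self.len()).filter_map(move |i| self.get(i))
    }
}
```

Because no delimiter is needed, elements may contain any character, including '\0', and empty elements are preserved. The cost is the second allocation for the offsets.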

typed-arena can also be used for this if you construct an Arena<u8>. In this case there are no delimiters or indices; everything is in the pointers you receive from the arena. However, you are extremely limited in how you can mutate the strings after they are created.

I think str_stack is a little too heavy, and the extra Vec doubles the size of the object. I don't need random access to the elements. I could write one myself, but before doing that, I want to know if such a thing already exists.

A Unix OsStr can contain the byte 0xff. (Additionally, WTF-8 is explicitly considered an implementation detail, so even if it is true today, it may not be true tomorrow.)

I see, OsStr on Unix platforms is an arbitrary [u8] rather than a WTF-8 encoded slice, so a delimiter-based container for OsStrs cannot work. A container for strings should still work, though.
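The difference can be checked directly. The following is a small Unix-only sketch using `std::os::unix::ffi::OsStrExt`; the helper names `contains_ff` and `os_str_contains_ff` are made up for this example.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only extension trait

/// Returns true if `bytes` contains the byte 0xff.
pub fn contains_ff(bytes: &[u8]) -> bool {
    bytes.contains(&0xff)
}

/// Builds an `OsStr` from arbitrary bytes (Unix only) and reports
/// whether its underlying bytes contain 0xff.
pub fn os_str_contains_ff(raw: &[u8]) -> bool {
    contains_ff(OsStr::from_bytes(raw).as_bytes())
}
```

On Unix, `os_str_contains_ff(b"foo\xffbar")` is true, whereas `contains_ff(s.as_bytes())` is false for every `&str`, because 0xff is never part of a valid UTF-8 sequence.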

When I had this problem, I solved it in two ways:

  1. Simple one is to make one String. When adding strings: push_str(s) + push('\0'). Use string.split('\0') to get all the substrings back.

  2. Use the slice-arena crate (Rust memory management library, on Lib.rs).
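The first approach could look roughly like this. The function names `push_terminated` and `elements` are hypothetical, and the sketch swaps `split('\0')` for `split_terminator('\0')`, because plain `split` yields one extra empty string after the final terminator.

```rust
/// Hypothetical helper: appends `s` followed by a NUL terminator.
pub fn push_terminated(buf: &mut String, s: &str) {
    buf.push_str(s);
    buf.push('\0');
}

/// Iterates over the stored elements. `split_terminator` treats '\0'
/// as a terminator, so no spurious trailing empty element is produced.
pub fn elements(buf: &str) -> impl Iterator<Item = &str> + '_ {
    buf.split_terminator('\0')
}
```

Note that an element which itself contains '\0' is silently split into several elements when read back, so this only works when the stored strings are known to be free of NUL characters.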

Using a String with '\0' as the separator is not general enough, since the stored strings cannot themselves contain "\0". slice-arena does not provide a container type that I can iterate through. I think I'll have to do it myself.

If you have very specific requirements and don't want to pull a big crate, indeed it may be better to just DIY, since the core functionality takes less than 50 lines:

mod lib {
    use ::core::ops::Range;

    #[derive(Default, Debug)]
    pub
    struct StringAccumulator(Vec<u8>);
    
    impl StringAccumulator {
        pub
        fn push (self: &'_ mut Self, s: &'_ str)
          -> Range<usize>
        {
            
            let start = self.0.len() + 1;
            self.0.reserve(s.len() + 1);
            self.0.push(b'\xff');
            self.0.extend_from_slice(s.as_bytes());
            start .. (start + s.len())
        }
        
        pub
        fn get (self: &'_ Self, idx: Range<usize>)
          -> Option<&'_ str>
        {
            ::core::str::from_utf8(
                self.0.get(idx)?
            ).ok()
        }
        
        pub
        fn iter<'iter> (self: &'iter Self)
          -> impl 'iter + Iterator<Item = &'iter str>
        {
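            // Every element is preceded by a 0xff delimiter, so the first
            // piece produced by `split` is always empty; skip it. Unlike
            // `self.0[1 ..]`, this does not panic on an empty accumulator.
            self.0.split(|x| x.eq(&b'\xff')).skip(1).map(|xs| {
                ::core::str::from_utf8(xs)
                    .unwrap()
            })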
        }
    }
    
    impl ::core::ops::Index<Range<usize>> for StringAccumulator {
        type Output = str;
        
        fn index (self: &'_ StringAccumulator, idx: Range<usize>)
          -> &'_ str
        {
            self.get(idx)
                .unwrap()
        }
    }
}

fn main ()
{
    use lib::StringAccumulator;

    let mut acc = StringAccumulator::default();
    let s1 = acc.push("Hello");
    acc.push(", ");
    let s2 = acc.push("World\0");
    acc.push("!");
    println!("{}, {}!", &acc[s1], &acc[s2]);
    
    acc.iter().for_each(|s| print!("{}", s));
}

  • It is possible to make .get() and the indexing infallible at runtime, but that requires using generativity, which leads to a way more cumbersome API and is thus rarely worth it.

For the record, csv's StringRecord (and ByteRecord) are also this kind of collection.


I am writing one myself: str-list/lib.rs at main · EFanZh/str-list · GitHub. It is basically std::path::Path and std::path::PathBuf, with the components being any valid strings.
