Dynamically sized type with str field

Hey everyone, to make a long story short, I'm attempting to make my own string type. Essentially, what I would like to do, if it's possible in any capacity, is to have something like this:

use std::cell::Cell;

struct Header {
  marked: Cell<bool>,
}

struct Allocation<T: 'static + ?Sized> {
  header: Header,
  data: T,
}

// ...
let str_allocation = Box::new(Allocation {
  header: Header { marked: Cell::new(false) },
  data: // not sure what to do here
});

It's hard to find a definitive answer on whether it's possible to make a dynamically sized type like this. Most of my googling is also obscured by String or &str, which is obviously the right call essentially 100% of the time. What I'm working on here is a programming language for fun, following Crafting Interpreters. I'm currently using this type in a relatively simple GC for a dynamically typed language, here if anyone is interested.

Currently, I just use String, but this hurts performance because I need to do a double dereference. I've also seen from profiling that allocating alone is a pretty big performance drain, so I'm looking into strategies to bulk allocate.

So, to circle back: I'm trying to figure out whether it's possible in any way to make an object with a str field, so that the string data lives directly in my Allocation struct instead of in a separate allocation owned by a String.
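To make the double dereference concrete, here is a minimal std-only sketch (assuming the Header/Allocation definitions from the question, with the field written as `data: T`). With `String` as the payload, the header and the string bytes end up in two different heap blocks, so reaching the characters means following the Box and then the String's own pointer:

```rust
use std::cell::Cell;
use std::mem::size_of;

struct Header {
    marked: Cell<bool>,
}

struct Allocation<T: 'static + ?Sized> {
    header: Header,
    data: T,
}

fn main() {
    let boxed = Box::new(Allocation {
        header: Header { marked: Cell::new(false) },
        data: String::from("hello"),
    });
    // Hop 1: the Box points at a block holding the header plus the String's
    // pointer, capacity and length (but not the characters themselves).
    let block_start = &*boxed as *const Allocation<String> as usize;
    let block_end = block_start + size_of::<Allocation<String>>();
    // Hop 2: the String's pointer leads to a second heap block with the bytes,
    // which lies entirely outside the first block.
    let bytes_addr = boxed.data.as_ptr() as usize;
    assert!(bytes_addr < block_start || bytes_addr >= block_end);
    assert!(!boxed.header.marked.get());
}
```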


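Unsized types more or less can't be created without unsafe code, apart from the built-in unsizing coercions (e.g. Box<Allocation<[u8; N]>> to Box<Allocation<[u8]>>); making your own pointer types support those coercions requires the unstable CoerceUnsized trait.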


To reduce the amount of unsafe code involved, you may want to have a look at @CAD97's ::slice-dst.

The key idea, regarding your "not sure what to do here", is that one does not create an unsized type directly: you first create a sized variant, and then coerce a pointer to it into its !Sized variant.
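As a minimal std-only sketch of that two-step pattern (the Header/Allocation types mirror the question), the same kind of sized value can be viewed through `&`, `Box` or `Rc` as a slice-tailed DST. The length moves from the type (`[u8; 3]`) into the pointer's metadata, and all of this works on stable without `unsafe`:

```rust
use std::cell::Cell;
use std::rc::Rc;

struct Header {
    marked: Cell<bool>,
}

// No #[repr(C)] needed here: no transmute, only built-in unsizing coercions.
struct Allocation<T: 'static + ?Sized> {
    header: Header,
    data: T,
}

fn main() {
    // Step 1: build the sized variant; the length 3 is part of the type [u8; 3].
    let sized = Allocation {
        header: Header { marked: Cell::new(false) },
        data: [1u8, 2, 3],
    };
    // Step 2: coerce a pointer; the length becomes runtime metadata in the fat pointer.
    let by_ref: &Allocation<[u8]> = &sized;
    assert_eq!(by_ref.data.len(), 3);
    assert!(!by_ref.header.marked.get());

    // The same coercion works for owning smart pointers.
    let boxed: Box<Allocation<[u8]>> = Box::new(Allocation {
        header: Header { marked: Cell::new(false) },
        data: [0u8; 4],
    });
    let rc: Rc<Allocation<[u8]>> = Rc::new(Allocation {
        header: Header { marked: Cell::new(true) },
        data: [9u8; 2],
    });
    assert_eq!(boxed.data.len(), 4);
    assert_eq!(&rc.data[..], &[9u8, 9][..]);
}
```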

And there is no sized variant for str. So you need to operate on bytes first (sized: [u8; N]; unsized: [u8]), and only then can you soundly create a str (with an unsafe transmute of the fat pointer, after ensuring UTF-8 validity and that Allocation<[u8]> and Allocation<str> have the same layout):

extern crate alloc;
use ::alloc::boxed::Box;
use ::core::{cell::Cell, str};

#[derive(Debug)]
struct Header {
    marked: Cell<bool>,
}

#[derive(Debug)]
#[repr(C)] // <- needed for safety of transmute
struct Allocation<T : 'static + ?Sized> {
  header: Header,
  data: T,
}

fn main ()
{
    let allocation: Box<Allocation<[u8; 13]>> = Box::new(Allocation {
        header: Header { marked: false.into() },
        data: *b"Hello, World!",
    });
    let allocation: Box<Allocation<[u8]>> = allocation; // coercion
    let allocation: Box<Allocation<str>> = unsafe {
        // Safety:
        //
        //   - struct is `#[repr(C)]`, and `str` has the same layout as `[u8]`;
        //
        //   - we guard against UTF-8 invalidity.
        assert!(str::from_utf8(&allocation.data[..]).is_ok());
        ::core::mem::transmute(allocation)
    };
    dbg!(&allocation);
}
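The example above works because the length (13) is known at compile time. For strings whose length is only known at runtime, one option is to do the allocation by hand, roughly the kind of work slice-dst does internally. The sketch below is an illustration under assumptions, not slice-dst's actual code: `alloc_str` is a hypothetical helper, and it relies on `#[repr(C)]` so that `Layout::extend` followed by `pad_to_align` reproduces the struct's layout.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::cell::Cell;
use std::ptr;

struct Header {
    marked: Cell<bool>,
}

#[repr(C)] // field order and offsets must be predictable for the manual layout below
struct Allocation<T: 'static + ?Sized> {
    header: Header,
    data: T,
}

/// Hypothetical helper: copies `s` into ONE heap block laid out as
/// `Allocation<str>`, i.e. the header followed inline by the UTF-8 bytes.
fn alloc_str(header: Header, s: &str) -> Box<Allocation<str>> {
    // Same rules #[repr(C)] uses: header first, then the bytes, then tail padding.
    let (layout, data_offset) = Layout::new::<Header>()
        .extend(Layout::array::<u8>(s.len()).expect("string too long"))
        .expect("layout overflow");
    let layout = layout.pad_to_align();
    // `alloc` must not be called with a zero-sized layout; `Header` is non-empty.
    assert!(layout.size() > 0);
    unsafe {
        let raw = alloc(layout);
        if raw.is_null() {
            handle_alloc_error(layout);
        }
        // Initialize both fields in place (the header sits at offset 0).
        ptr::write(raw as *mut Header, header);
        ptr::copy_nonoverlapping(s.as_ptr(), raw.add(data_offset), s.len());
        // Fat pointer = block address + length metadata `s.len()`.
        let fat = ptr::slice_from_raw_parts_mut(raw, s.len()) as *mut Allocation<[u8]>;
        // Same layout for [u8] and str; the bytes came from a &str, so they are UTF-8.
        Box::from_raw(fat as *mut Allocation<str>)
    }
}

fn main() {
    let a = alloc_str(Header { marked: Cell::new(false) }, "Hello, World!");
    assert!(!a.header.marked.get());
    println!("{}", &a.data);
}
```

Because the computed layout equals `Layout::for_value` of the resulting value, the `Box` frees the block through the global allocator as usual when dropped.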

slice-dst probably deserves to provide a StrWithHeader<Header> type in addition to the SliceWithHeader<Header, Item>. I can't make any promises, but I should be able to add it soonish.

Though do keep in mind that SliceWithHeader (and the eventual StrWithHeader) also implicitly keep the length of the trailing slice inline to support erasing the pointer to a thin pointer. This is not strictly required, so you would be paying that price even if you don't need it. (Of course, a different impl of SliceDst needn't make the same decision.)
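To illustrate the trade-off described above, here is a sketch with a hypothetical `LenHeader` that stores the length inline (not slice-dst's actual types). A pointer to an unsized `Allocation` is two words, address plus length; if the length is also stored inside the allocation, you can keep just a one-word thin pointer and rebuild the fat pointer when needed:

```rust
use std::cell::Cell;
use std::mem::size_of;
use std::ptr;

// Hypothetical header that duplicates the trailing length inside the allocation.
#[repr(C)]
struct LenHeader {
    len: usize,
    marked: Cell<bool>,
}

#[repr(C)] // header must sit at offset 0 so a thin pointer to it is the block address
struct Allocation<T: ?Sized> {
    header: LenHeader,
    data: T,
}

/// Rebuilds a fat pointer from a thin one by reading the inline length.
/// Safety: `thin` must point to a live `Allocation<[u8]>` whose `header.len` is correct.
unsafe fn widen(thin: *mut LenHeader) -> *mut Allocation<[u8]> {
    let len = unsafe { (*thin).len };
    ptr::slice_from_raw_parts_mut(thin as *mut u8, len) as *mut Allocation<[u8]>
}

fn main() {
    // A pointer to an unsized `Allocation` is fat (address + length)...
    assert_eq!(size_of::<Box<Allocation<[u8]>>>(), 2 * size_of::<usize>());
    // ...while a pointer to the sized header is thin.
    assert_eq!(size_of::<*mut LenHeader>(), size_of::<usize>());

    let sized = Box::new(Allocation {
        header: LenHeader { len: 5, marked: Cell::new(false) },
        data: *b"hello",
    });
    let fat: Box<Allocation<[u8]>> = sized; // unsizing coercion records len = 5
    // Erase the length: keep only the address.
    let thin = Box::into_raw(fat) as *mut LenHeader;
    // Later, recover the full pointer from the inline length and free it normally.
    let back = unsafe { Box::from_raw(widen(thin)) };
    assert_eq!(&back.data[..], &b"hello"[..]);
    assert_eq!(back.data.len(), back.header.len);
    assert!(!back.header.marked.get());
}
```

The price is the extra `usize` stored in every allocation, which is exactly the cost mentioned above when you don't need thin pointers.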

It's funny; basically every time someone links to pointer-utils I find something I want to tweak or change. Thankfully, I've gotten away without breaking backwards compatibility so far. Now that I think about it, it'd be nice if slice-dst just offered WithHeader<Header, [Item]> and WithHeader<Header, str>, but there must be a reason I didn't do that initially... (Probably the fact that new would want to take different shapes for the two.)


@Yandros This slice-dst looks pretty promising. It looks like I may be able to use String::into_bytes() and potentially cast the [u8] back to a str later:

let allocation: Box<SliceWithHeader<Header, u8>> = SliceWithHeader::new(Header::default(), String::from("some string").into_bytes());
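For the "cast it back to a str" step, a minimal std-only sketch (the function names are hypothetical): `str::from_utf8` revalidates the bytes in O(n), while `str::from_utf8_unchecked` skips the check and is only sound if the bytes really came from a `String` or `&str`:

```rust
use std::str;

// Hypothetical accessor: re-checks that the trailing bytes are valid UTF-8.
fn as_str_checked(bytes: &[u8]) -> &str {
    str::from_utf8(bytes).expect("bytes were not UTF-8")
}

/// Hypothetical accessor that skips validation.
/// Safety: `bytes` must have originated from a `String`/`&str`.
unsafe fn as_str_unchecked(bytes: &[u8]) -> &str {
    unsafe { str::from_utf8_unchecked(bytes) }
}

fn main() {
    let bytes: Vec<u8> = String::from("some string").into_bytes();
    assert_eq!(as_str_checked(&bytes), "some string");
    assert_eq!(unsafe { as_str_unchecked(&bytes) }, "some string");
    // Arbitrary bytes are rejected by the checked path.
    assert!(str::from_utf8(&[0xff, 0xfe]).is_err());
}
```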

Thanks again for the suggestions!

Even as is, it will be a huge help performance-wise, since it removes a dereference. I'd obviously be interested in a StrWithHeader, but this project is mostly for fun / learning, so I'm sure I could look through your crate, get my own version of StrWithHeader up and running, and maybe attempt a PR if I can understand what's going on.

slice-dst now has StrWithHeader! :tada:

