[unsafe] Enum containing C-style DST

Hello

I'm trying to write this data structure: Ctrie - Wikipedia. However, I'm having some problems with part of the data structure.

It is built of nodes of several kinds:

  • Leaf node, containing a key-value pair.
  • Inner node that contains a bitmap field and then an array of as many atomic pointers as there are ones in the bit field (the bit field doesn't change during the lifetime of the node, so the node doesn't have to shrink or grow).
  • Some other service node types that are not interesting right now.

A natural representation would be something like this:

enum Node<K, V> {
  Leaf(K, V),
  Inner(u32, [AtomicPtr<Node<K, V>>]),
}

Now, there are some problems with this representation:

  • I already know the size of the array, so I don't need to waste another machine word to store it once more.
  • Enums don't like to be DSTs/unsized, only structs, so this can't be used like this.
  • Pointers to DSTs are fat, therefore they can't be used as atomic pointers.

So I'm looking for something that is more like C-style DST. In C I can create a struct that has variable „tail“ and give it enough room by asking malloc to give me more memory. Is there a sane way to do something like this in Rust?

What I've come up with so far is this beast:

enum Tag {
  Leaf,
  Inner,
}

struct Leaf<K, V>(K, V);

#[repr(C)]
struct Inner<K, V> {
    bits: u32,
    ptrs: [AtomicPtr<Node<K, V>>; 0],
}

#[repr(C)]
union Tail<K, V> {
    leaf: Leaf<K, V>,
    inner: Inner<K, V>,
}

#[repr(C)]
struct Node<K, V> {
  tag: Tag,
  tail: Tail<K, V>,
}

And somehow put this structure together by… um… transmuting things around and moving bytes. And then, when dropping, having to somehow manually transmute it back and stuff so the alocator is told the proper size to free. But that feels very non-Rusty, error prone and tedious.

Is there a better way I'm not seeing (for either creating such structure or solving the problem in some completely different way)? I'm fine with unsafe for this (I expect it'll be needed when doing such crazy things).

Do you need a DST? I might have misread the Wikipedia article you linked, but don't you want a fixed number of pointers per node? So you could do [_; 32] and use Option for missing nodes.

You could also use SmallVec instead of a slice or array.

I use this now, for the first attempts to play with it. However, the paper linked there stresses the benefits of using only as many fields as necessary to save space. Intuitively, I think most of the fields will be empty most of the time so not storing all those NULLs might have some impact.

Part of why I ask is, I want to know if Rust allows implementing the paper as it is.

Originally, I didn't want to link the full paper here, but if anyone is interested or if Wikipedia is not enough: http://infoscience.epfl.ch/record/166908/files/ctries-techreport.pdf

Thanks :+1: It does another pointer to traverse if the node becomes large, but I think it's the next best thing after doing it the fully dynamic way and definitely much less work & complexity.

If you know an upper bound, which you do in this case, then it won't overflow.

If not, then you could also use arrayvec

Hopefully this is useful to you:

struct Node<T: ?Sized> {
    id: i32,
    data: T,
}

fn for_two() -> Box<Node<[u8]>> {
    Box::new(Node {
        id: 2,
        data: [0, 1],
    })
}

fn for_eight() -> Box<Node<[u8]>> {
    Box::new(Node {
        id: 8,
        data: [0, 1, 2, 3, 4, 5, 6, 7],
    })
}

fn example(node: &Node<[u8]>) {
    println!("{}: {}", node.id, node.data.iter().sum::<u8>())
}

fn main() {
    let a = for_two();
    let b = for_eight();

    example(&a);
    example(&b);
}

(Playground)

Output:

2: 1
8: 28

But then, if I don't let it overflow for large sizes, it'll waste space for small sizes. The whole point of this particular trick with the bitmap in the paper is to allocate only as many pointers as needed.

Unfortunately, no. That Box and & are true Rust DSTs, which means fat pointers. Which makes it impossible to use in AtomicPtr :-(.

I don't think you can do this without custom DSTs,
https://github.com/rust-lang/rfcs/pull/2594

1 Like

Another option could be to double box, that way you get a thin pointer which can be used with AtomicPtr

Thanks for the tips. Seems like there's no obvious best solution but there are some options… I'll see what from these will work eventually :-).

Yep, but it seems to work:

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.