Help getting started with converting c project

tcaputi · November 18, 2018, 7:49am

Hello. I'm a C programmer trying to learn Rust. I've posted here once before after hitting some problems re-implementing a conventional intrusive linked list in Rust. I was quite surprised to find out that this is actually fairly challenging to do. One interesting thing someone mentioned was that Rust programmers typically don't reach for linked lists, instead favoring Vecs. This got me thinking that maybe I just need to try to reset how I think about code when learning Rust. After taking a break for a couple weeks I am trying again, this time attempting to convert a different simple C project, doing my best to think more like a Rust programmer.

That being said, I am now stuck trying to come up with the Rust versions of the structs I need and was wondering if I could get some help. I don't want any help with the method implementations just yet, but I think seeing the struct declarations would help me understand what different Rust types are useful for (particularly the smart pointers, which are a very new concept to me).

For context, this project basically allows me to represent, query, serialize, and unserialize JSON-like nested data structures (which I call variable structs or vstructs) in C. All entries in a vstruct can be added / removed / looked up via a name string. These are the C structs and enums used (ignore the prefixes; I really love that Rust has namespaces):

typedef enum pil_vstruct_data_type {
     VSTRUCT_TYPE_U8 = 0,
     VSTRUCT_TYPE_I8,
     VSTRUCT_TYPE_U16,
     VSTRUCT_TYPE_I16,
     VSTRUCT_TYPE_U32,
     VSTRUCT_TYPE_I32,
     VSTRUCT_TYPE_U64,
     VSTRUCT_TYPE_I64,
     VSTRUCT_TYPE_STRING,
     VSTRUCT_TYPE_VSTRUCT,
     VSTRUCT_TYPE_U8_ARRAY,
     VSTRUCT_TYPE_I8_ARRAY,
     VSTRUCT_TYPE_U16_ARRAY,
     VSTRUCT_TYPE_I16_ARRAY,
     VSTRUCT_TYPE_U32_ARRAY,
     VSTRUCT_TYPE_I32_ARRAY,
     VSTRUCT_TYPE_U64_ARRAY,
     VSTRUCT_TYPE_I64_ARRAY,
     VSTRUCT_TYPE_STRING_ARRAY,
     VSTRUCT_TYPE_VSTRUCT_ARRAY,
} pil_vstruct_data_type_t;

This enum just represents the data type of an entry in the vstruct. The primitive types are integers and the non-primitive types are strings or nested vstructs. In addition, entries can be arrays of any of these types. I believe I should be able to convert this into a Rust enum such that the underlying data is carried by the enum variants themselves. I'm also not sure how to declare these enums with slice types for the ARRAY entries.
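On the slice-type question: a borrowed slice (&[T]) would need a lifetime and some other owner for the data, so an entry that owns its array usually holds a Vec<T> (or a Box<[T]> if it never grows). Below is a minimal sketch along those lines. The names Value and count are made up for illustration, only a few of the twenty C variants are shown, and the nested-vstruct variants are left out.

```rust
// Sketch: the C type tag and the `void *` payload merged into one enum.
// Only a few variants are shown; the rest follow the same pattern.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U8(u8),
    I32(i32),
    U64(u64),
    Str(String),
    U8Array(Vec<u8>),
    I32Array(Vec<i32>),
    StrArray(Vec<String>),
}

impl Value {
    // Plays the role of `ve_nr_data`: scalars count as one value,
    // strings report their byte length, arrays their element count.
    fn count(&self) -> usize {
        match self {
            Value::U8(_) | Value::I32(_) | Value::U64(_) => 1,
            Value::Str(s) => s.len(),
            Value::U8Array(v) => v.len(),
            Value::I32Array(v) => v.len(),
            Value::StrArray(v) => v.len(),
        }
    }
}

fn main() {
    let values = vec![
        Value::U8(7),
        Value::Str("hello".to_string()),
        Value::I32Array(vec![1, -2, 3]),
    ];
    for v in &values {
        println!("{:?}: {} value(s)", v, v.count());
    }
}
```

Because each variant owns its data, there is no separate type field that can disagree with the payload.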

typedef struct pil_vstruct_entry {
    pil_vstruct_data_type_t ve_type;	/* data type of this entry's value */
    u32 ve_name_size;			/* size of ve_name in bytes */
    u32 ve_nr_data;				/* number of data values (as an array) */
    char *ve_name;				/* name of this entry */
    void *ve_data;				/* pointer to this entry's data */
    pil_rb_node_t ve_link;			/* link into vs_entries */
} pil_vstruct_entry_t;

This struct represents an entry in the vstruct. The key things are:

- the entry's data type (from the enum above)
- the name and name size
- the data (as a void *) and number of data entries (for arrays and strings)

There is also an intrusive link into a red-black tree which I have implemented elsewhere (it could just as easily have been a linked list or other data structure). I don't think I need this in the Rust implementation since the tree should be able to do this non-intrusively. I also don't think I need the name and data sizes in Rust because Vecs know their lengths. I'm fairly certain I want this struct to "own" both its name and its data so that they get cleaned up when this struct is destroyed.
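A minimal sketch of an entry that owns its name and its data, assuming a cut-down two-variant Value enum (Entry, Value and nr_data are hypothetical names). The sizes the C struct stores are derived from the String and Vec instead, and dropping the entry frees both buffers.

```rust
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    U32(u32),
    U8Array(Vec<u8>),
}

// Rough counterpart of `pil_vstruct_entry_t`. The entry owns its name and its
// data, so both are freed when the entry is dropped. There is no
// `ve_name_size` or `ve_nr_data` (String and Vec know their lengths) and no
// `ve_link` (the collection holding the entries does the linking).
#[derive(Debug)]
struct Entry {
    name: String,
    value: Value,
}

impl Entry {
    // Derived on demand instead of stored alongside the data.
    fn nr_data(&self) -> usize {
        match &self.value {
            Value::U32(_) => 1,
            Value::U8Array(bytes) => bytes.len(),
        }
    }
}

fn main() {
    let e = Entry {
        name: String::from("payload"),
        value: Value::U8Array(vec![0xde, 0xad, 0xbe, 0xef]),
    };
    println!("{}: {} name bytes, {} values", e.name, e.name.len(), e.nr_data());
    drop(e); // name and data buffers are released here; no manual free()
}
```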

typedef struct pil_vstruct {
    struct pil_vstruct *vs_parent;		/* parent vstruct */
    u32 vs_phys_size;			/* packed size of this vstruct */
    pil_rb_tree_t vs_entries;		/* rb_tree of name / value pairs */
} pil_vstruct_t;

This is the vstruct itself (and the only structure referenced externally). This is also where I got really lost. phys_size is meant to be the full size of a buffer required to serialize this vstruct into its binary format. This size includes the size of any internal vstructs. Whenever a user modifies a vstruct that is a child of another vstruct, I need to iterate up the chain of parents and update their sizes as well. Conceptually, I understand that I need a mutable reference to the parent, but the parent needs to "own" the child so I'm not sure how to make this work.

I would also like to replace my custom red-black tree of entries with Rust's standard BTreeMap<&str, VstructEntry>. However, whenever I add it to my struct, I get complaints about the lifetime of the &str, which I'm not sure how to resolve.

Any help would be really appreciated. I apologize for the long post, but I wanted to make sure what I have in C was clear. Again, I really would just like to see what the Rust-equivalent struct definitions would look like. Hopefully, from there I can figure out enough to write the rest of the code. Thanks a lot.

ryan · November 18, 2018, 9:36am

For the physical size of the buffer, I wonder, why store it? Is it used a lot? If not, why not just make it a method in the struct impl?
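If the size is computed rather than stored, it can be a recursive method and the parent pointer is no longer needed for it. The sketch below assumes a made-up wire format (1-byte type tag, 4-byte name length, name bytes, 4-byte count, payload); the real format from the C project isn't shown in the thread, so treat the numbers as placeholders.

```rust
use std::collections::BTreeMap;

#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    U8(u8),
    U32(u32),
    Str(String),
    U32Array(Vec<u32>),
    VStruct(VStruct),
}

#[derive(Debug, Default)]
struct VStruct {
    entries: BTreeMap<String, Value>,
}

// Hypothetical wire format for one entry: 1-byte type tag, 4-byte name
// length, the name bytes, a 4-byte element count, then the payload.
const ENTRY_HEADER: usize = 1 + 4 + 4;

impl Value {
    fn payload_size(&self) -> usize {
        match self {
            Value::U8(_) => 1,
            Value::U32(_) => 4,
            Value::Str(s) => s.len(),
            Value::U32Array(v) => 4 * v.len(),
            // Nested vstructs are sized by recursing, so no parent pointer is needed.
            Value::VStruct(vs) => vs.phys_size(),
        }
    }
}

impl VStruct {
    fn phys_size(&self) -> usize {
        self.entries
            .iter()
            .map(|(name, value)| ENTRY_HEADER + name.len() + value.payload_size())
            .sum()
    }
}

fn main() {
    let mut inner = VStruct::default();
    inner.entries.insert("id".to_string(), Value::U32(42));

    let mut outer = VStruct::default();
    outer.entries.insert("flag".to_string(), Value::U8(1));
    outer.entries.insert("child".to_string(), Value::VStruct(inner));

    // 14 bytes for "flag" plus 29 for "child" (which wraps the 15-byte inner struct)
    println!("outer needs {} bytes", outer.phys_size());
}
```

The trade-off is a full walk per call; caching only pays off if the size is queried far more often than the tree changes.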

For the map you may want BTreeMap<&'static str, VStructEntry> if you know all of the strings at compile time or BTreeMap<String, VStructEntry> if you do not.
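A small sketch of the String-keyed version (VStruct, set, get and remove are hypothetical names, and u32 stands in for VStructEntry). Because the map owns its keys, the struct needs no lifetime parameter, yet lookups can still be done with a plain &str.

```rust
use std::collections::BTreeMap;

// `u32` stands in for the real entry type to keep the sketch short.
struct VStruct {
    entries: BTreeMap<String, u32>,
}

impl VStruct {
    fn new() -> Self {
        VStruct { entries: BTreeMap::new() }
    }

    // The map stores its own copy of the name, so the caller's string can be
    // temporary and `VStruct` needs no lifetime parameter.
    fn set(&mut self, name: &str, value: u32) {
        self.entries.insert(name.to_string(), value);
    }

    // Lookups take a plain &str because `String: Borrow<str>`.
    fn get(&self, name: &str) -> Option<u32> {
        self.entries.get(name).copied()
    }

    fn remove(&mut self, name: &str) -> Option<u32> {
        self.entries.remove(name)
    }
}

fn main() {
    let mut vs = VStruct::new();
    {
        let name = format!("field_{}", 3); // built at run time
        vs.set(&name, 42);
    } // `name` is dropped here; the map still has its own copy
    println!("{:?}", vs.get("field_3"));
    println!("{:?}", vs.remove("field_3"));
}
```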

DanielKeep · November 18, 2018, 9:53am

A smart pointer is just a pointer that knows how it's supposed to be used. In C, you can use any given pointer in any way, even in a way the rest of the code doesn't expect. In Rust, you pick the specific kind of pointer that matches how you're going to use it so there's no confusion. The short version:

Type	Owned	Shared	Threadsafe
&T	No	Yes	Yes
&mut T	No	No	Yes
Rc	Yes	Yes	No
Arc	Yes	Yes	Yes
Box	Yes	No	Yes
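The table can be checked with a few lines of std-only code. This is just an illustration; the helper names rc_counts and arc_sum_in_thread are made up.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Rc: several owners on one thread; each clone bumps a non-atomic count.
fn rc_counts() -> (usize, usize) {
    let a = Rc::new(String::from("shared"));
    let b = Rc::clone(&a);
    let with_two = Rc::strong_count(&a);
    drop(b); // one owner gone; the String is freed only when the last one goes
    (with_two, Rc::strong_count(&a))
}

// Arc: same idea with an atomic count, so a clone may move to another thread.
// Swapping Arc for Rc here is rejected at compile time (Rc is not Send).
fn arc_sum_in_thread(data: Vec<i32>) -> i32 {
    let shared = Arc::new(data);
    let for_thread = Arc::clone(&shared);
    thread::spawn(move || for_thread.iter().sum::<i32>())
        .join()
        .unwrap()
}

fn main() {
    // Box: exactly one owner; moving it moves ownership.
    let boxed = Box::new(5);
    let moved = boxed; // `boxed` is no longer usable
    println!("box holds {}", moved);

    // &mut T: not an owner, and no other borrow may exist while it is in use.
    let mut n = 1;
    let r = &mut n;
    *r += 1;
    println!("n = {}", n);

    println!("{:?}", rc_counts());
    println!("{}", arc_sum_in_thread(vec![1, 2, 3]));
}
```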

This is probably the most important thing to sort out, as it decides how the rest of the types are defined. Short version: Rust hates reference cycles, so you're going to have to compromise somewhere.

You could use Rc to share ownership, and the corresponding Weak pointer to avoid creating ownership cycles. But you can't mutate through an Rc, so you need to combine that with some form of locking like RefCell. That means constantly locking as you traverse the tree, and can also impact how you write your code (you can't recursively lock something).
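A sketch of the Rc / Weak / RefCell behaviour described above, using a stripped-down hypothetical Node type: the Weak parent link does not keep the parent alive, and a second mutable borrow is refused at run time while another borrow is live (try_borrow_mut returns an error; borrow_mut would panic).

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical minimal node: an owned size plus a non-owning parent link.
struct Node {
    size: u32,
    parent: Weak<RefCell<Node>>,
}

fn main() {
    let parent = Rc::new(RefCell::new(Node { size: 10, parent: Weak::new() }));
    let child = Rc::new(RefCell::new(Node { size: 3, parent: Rc::downgrade(&parent) }));

    // Weak is not an owner: upgrade() yields an Rc only while a strong owner exists.
    assert!(child.borrow().parent.upgrade().is_some());

    // Mutation goes through RefCell, which tracks borrows at run time.
    parent.borrow_mut().size += child.borrow().size;

    {
        let held = parent.borrow();
        // "Can't recursively lock": a mutable borrow is refused while `held` lives.
        assert!(parent.try_borrow_mut().is_err());
        println!("parent size = {}", held.size);
    }

    drop(parent);
    // The last strong owner is gone, so the Weak link no longer upgrades.
    assert!(child.borrow().parent.upgrade().is_none());
}
```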

You could remove the parent link entirely and use Box everywhere. To modify, you treat the tree as a single, whole thing and descend into it to mutate it. At that point, you track the "parent" either implicitly in the stack, or with an explicit temporary Vec of parents. But that prevents you from using a direct pointer to the interior of the tree.
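A sketch of the Box-only approach, assuming children are addressed by a path of indices from the root (the grow_at method and the own_size field are hypothetical). Each level updates its cached size after the recursive call returns, so the call stack stands in for vs_parent. A Vec<VStruct> already keeps the children on the heap, so it plays the role Box would otherwise play.

```rust
// Hypothetical "Box everywhere" layout: children are owned outright and there
// is no parent pointer.
#[derive(Debug, Default)]
struct VStruct {
    own_size: u32,  // bytes of this node's own entries (made-up unit)
    phys_size: u32, // own_size plus every descendant's size, cached
    children: Vec<VStruct>,
}

impl VStruct {
    // Descend along `path` (child indices from the root), grow the target node,
    // and bump each cached size as the recursion unwinds. The call stack is
    // what remembers the parents. Panics if an index in `path` is out of range.
    fn grow_at(&mut self, path: &[usize], delta: u32) {
        match path.split_first() {
            Some((&i, rest)) => self.children[i].grow_at(rest, delta),
            None => self.own_size += delta,
        }
        self.phys_size += delta;
    }
}

fn main() {
    let mut root = VStruct::default();
    root.children.push(VStruct::default());
    root.children[0].children.push(VStruct::default());

    root.grow_at(&[0, 0], 8); // grandchild, child and root all grow by 8
    root.grow_at(&[], 4); // only the root grows
    println!("root = {}, child = {}", root.phys_size, root.children[0].phys_size);
}
```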

As ryan noted, you probably want some kind of owned string type, like String, or Rc<str> (if there is a small set of dynamic strings), or &'static str† (if they're all statically-known).
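If a small set of names is reused across many vstructs, Rc<str> lets them share one allocation. The intern helper below is a hypothetical sketch of that idea, not something from the thread; it relies on Rc<str> implementing Borrow<str>, which is what allows &str lookups in both the pool and the maps.

```rust
use std::collections::{BTreeMap, HashSet};
use std::rc::Rc;

// Hypothetical interning helper: each distinct name is allocated once and
// every user shares it through a cheap Rc<str> clone.
fn intern(pool: &mut HashSet<Rc<str>>, s: &str) -> Rc<str> {
    // HashSet<Rc<str>> can be searched with a &str because Rc<str>: Borrow<str>.
    if let Some(existing) = pool.get(s) {
        return Rc::clone(existing);
    }
    let fresh: Rc<str> = Rc::from(s);
    pool.insert(Rc::clone(&fresh));
    fresh
}

fn main() {
    let mut pool = HashSet::new();
    let mut a: BTreeMap<Rc<str>, u64> = BTreeMap::new();
    let mut b: BTreeMap<Rc<str>, u64> = BTreeMap::new();

    a.insert(intern(&mut pool, "timestamp"), 1);
    b.insert(intern(&mut pool, "timestamp"), 2);

    // Maps keyed by Rc<str> also accept plain &str lookups.
    println!("{:?} {:?}", a.get("timestamp"), b.get("timestamp"));
    println!("distinct names allocated: {}", pool.len());
}
```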

Anyway, let's assume you're using Rc<RefCell<_>> for things.

use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct VStruct {
    parent: Weak<RefCell<VStruct>>,
    phys_size: u32,
    entries: Vec<VStructEntry>,
}

struct VStructEntry {
    name: String,
    data: DataType,
    link: Rc<RefCell<Node>>,
}

enum DataType {
    U8(u8),
    I8(i8),
    // ...
    String(String),
    VStruct(Rc<RefCell<VStruct>>),
    U8Array(Vec<u8>),
    // ...
}
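A runnable variant of the structs above, showing how a size change can be pushed up the parent chain. It assumes the link field is not needed (a Vec of entries has no intrusive link) and cuts DataType to two variants; new_vstruct, add_child and add_size are hypothetical helpers and the byte counts are arbitrary.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[allow(dead_code)]
struct VStruct {
    parent: Weak<RefCell<VStruct>>, // non-owning: no reference cycle
    phys_size: u32,
    entries: Vec<VStructEntry>,
}

#[allow(dead_code)]
struct VStructEntry {
    name: String,
    data: DataType,
}

#[allow(dead_code)]
enum DataType {
    U32(u32),
    VStruct(Rc<RefCell<VStruct>>), // owning: the parent keeps the child alive
}

// Hypothetical constructor.
fn new_vstruct() -> Rc<RefCell<VStruct>> {
    Rc::new(RefCell::new(VStruct {
        parent: Weak::new(),
        phys_size: 0,
        entries: Vec::new(),
    }))
}

// Hypothetical helper: add `delta` bytes to `node` and to every ancestor,
// following the Weak links until one no longer upgrades (the root).
// Panics if the caller still holds a borrow of any node on the path.
fn add_size(node: &Rc<RefCell<VStruct>>, delta: u32) {
    let mut current = Some(Rc::clone(node));
    while let Some(n) = current {
        let mut vs = n.borrow_mut();
        vs.phys_size += delta;
        current = vs.parent.upgrade();
    }
}

// Hypothetical helper: attach `child` under `parent` and charge its size upward.
fn add_child(parent: &Rc<RefCell<VStruct>>, name: &str, child: Rc<RefCell<VStruct>>) {
    child.borrow_mut().parent = Rc::downgrade(parent);
    let size = child.borrow().phys_size;
    parent.borrow_mut().entries.push(VStructEntry {
        name: name.to_string(),
        data: DataType::VStruct(child),
    });
    add_size(parent, size);
}

fn main() {
    let root = new_vstruct();
    let child = new_vstruct();
    add_child(&root, "child", Rc::clone(&child));

    // Pretend a 4-byte entry was added to the child (sizes are made up).
    child.borrow_mut().entries.push(VStructEntry {
        name: "id".to_string(),
        data: DataType::U32(7),
    });
    add_size(&child, 4);

    println!("child = {}, root = {}", child.borrow().phys_size, root.borrow().phys_size);
}
```

Parents own children through Rc and children point back through Weak, so dropping the root frees the whole tree.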

But, again, because I don't know exactly how you're writing this, I don't know if this is right or not. You have to look at what you're doing and decide exactly which trade-offs you're going to make.

†: Technically, &'static str is borrowed, but the 'static lifetime allows it to behave mostly like it's owned. It's borrowed from the executable itself, which lives forever, so there are no lifetime concerns.

tcaputi · November 18, 2018, 5:53pm

I guess I could just recalculate it every time. I mostly wanted to do it this way so I could match the original implementation and so I can learn how to have non-owning pointers to structs.

I do not know all of the strings at compile time, so unfortunately I can't use &'static str. I could use Strings... Then I guess the entries themselves wouldn't hold the name. I think I'll give that a try.

tcaputi · November 18, 2018, 6:24pm

Thank you for the answer. A few questions about this then:

You could use Rc to share ownership, and the corresponding Weak pointer to avoid creating ownership cycles. But you can’t mutate through an Rc

So Rc allows me to have multiple owners, but none of them can mutate the value? I guess I don't really understand what ownership means then, since I thought ownership implied the ability to mutate. Thinking about it a little more, I see that that doesn't make sense, but I guess I need to figure out what ownership really means in Rust. The Weak pointer does sound like exactly what I'm looking for in terms of ownership.

so you need to combine that with some form of locking like RefCell. That means constantly locking as you traverse the tree, and can also impact how you write your code (you can’t recursively lock something).

I'm a bit confused here. When you say "locking" I think of pthreads and mutexes from C, but I see from the documentation that RefCell is not thread-safe. Even after reading through the docs I'm still pretty confused about what a RefCell actually does.

The structs you provided seem to make sense and hopefully I can build up from there. Do structs in Rust typically have a lot of nested generic types like Weak<RefCell<VStruct>>? That seems like a lot to keep in mind to work with Rust's ownership / mutability rules.

TomP · November 18, 2018, 8:25pm

Others will probably respond much better than I am able. In the meantime, Arc and either Mutex or RwLock are the thread-safe versions of Rc and RefCell. It all does "seem like a lot to keep in mind", but it's really just a decoupling of abstractions; the compiler often collapses those sequences of abstractions into safe code that's as efficient as C or assembly.
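For comparison, a sketch of the threaded counterpart: Arc<Mutex<T>> where a single-threaded design would use Rc<RefCell<T>>. The function parallel_total is a made-up example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Arc<Mutex<T>> is the thread-safe counterpart of Rc<RefCell<T>>:
// Arc shares ownership across threads, Mutex serializes the mutation.
fn parallel_total(workers: u32, per_worker: u32) -> u32 {
    let total = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total); // one owner per thread
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // lock() blocks until no other thread holds the guard
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Bind first so the MutexGuard is dropped before `total` goes out of scope.
    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("{}", parallel_total(4, 1000));
}
```

RwLock can replace Mutex when reads vastly outnumber writes; the ownership side (Arc) stays the same.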

ryan · November 18, 2018, 10:10pm

Generally, mutability and shared/exclusive ownership are checked at compile time in Rust, but RefCell checks them at run time. Here is a link from the old version of The Book that does a very good job of explaining the various pointer types in Rust and their guarantees and trade-offs.

DanielKeep · November 18, 2018, 11:36pm

Owning something means you're responsible for destroying it.

Mutating something requires that you have unique access to it (i.e. the inverse of the "Shared" column in that table).
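Rc::get_mut makes the "unique access" rule visible: it returns Some(&mut T) only while the Rc is the sole owner (and has no Weak pointers). A short illustration:

```rust
use std::rc::Rc;

fn main() {
    let mut a = Rc::new(vec![1, 2, 3]);

    // Sole owner: access is unique, so get_mut hands out a &mut Vec.
    Rc::get_mut(&mut a).unwrap().push(4);

    // A second owner makes access shared, and get_mut refuses.
    let b = Rc::clone(&a);
    assert!(Rc::get_mut(&mut a).is_none());

    // Once the other owner is dropped, unique access (and mutation) is back.
    drop(b);
    assert!(Rc::get_mut(&mut a).is_some());
    println!("{:?}", a);
}
```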

The exception to this is "interior mutability", which is effectively the escape hatch for when you need both shared access and mutability. Types that have interior mutability are referred to as "cells" and/or "locks": Cell, RefCell, RwLock, Mutex, etc. It's up to those types to ensure mutation of their interior is done safely, and that's mostly done by some kind of locking to keep other parts of the code from accessing the interior during mutation.
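A small sketch of the two common cells, using a hypothetical Stats type: Cell for Copy values (get/set, no references handed out) and RefCell for everything else (borrow/borrow_mut, checked at run time). Note that record takes &self yet still mutates.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical type with two interior-mutable fields.
struct Stats {
    hits: Cell<u32>,           // Copy data: get/set, never lends out a reference
    log: RefCell<Vec<String>>, // other data: borrow/borrow_mut, checked at run time
}

impl Stats {
    // `&self`, not `&mut self`: mutation happens through a shared reference.
    fn record(&self, what: &str) {
        self.hits.set(self.hits.get() + 1);
        self.log.borrow_mut().push(what.to_string());
    }
}

fn main() {
    let s = Stats { hits: Cell::new(0), log: RefCell::new(Vec::new()) };
    let (r1, r2) = (&s, &s); // two shared references at the same time
    r1.record("a");
    r2.record("b");

    let reading = s.log.borrow();
    // While `reading` is alive, the cell refuses a mutable borrow.
    assert!(s.log.try_borrow_mut().is_err());
    println!("{} hits: {:?}", s.hits.get(), *reading);
}
```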

Sure. Weak specifies the ownership semantics you want, RefCell specifies the interior mutability semantics you want. It's like building stuff with LEGO. Sure, it could be a single type, but then you'd need a type for every possible combination of types, which would arguably be a whole lot worse.
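One common way to keep the nesting readable is a type alias. Shared, Parent and shared below are hypothetical names chosen for illustration; the underlying types are still Rc, Weak and RefCell.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Name the combinations once; the pieces stay separate underneath.
type Shared<T> = Rc<RefCell<T>>;
type Parent<T> = Weak<RefCell<T>>;

struct VStruct {
    parent: Parent<VStruct>,
    children: Vec<Shared<VStruct>>,
}

fn shared<T>(value: T) -> Shared<T> {
    Rc::new(RefCell::new(value))
}

fn main() {
    let root = shared(VStruct { parent: Weak::new(), children: Vec::new() });
    let child = shared(VStruct { parent: Rc::downgrade(&root), children: Vec::new() });
    root.borrow_mut().children.push(child);

    // The child can still reach its parent through the Weak link.
    assert!(root.borrow().children[0].borrow().parent.upgrade().is_some());
    println!("{} child(ren)", root.borrow().children.len());
}
```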

tcaputi · November 19, 2018, 5:04am

Thanks a lot everyone for the responses. I will give the implementation another shot and see how far I can get.
