Modelling data structures and the constant need to rewrite them

Hi!

I am new to Rust and quite often find myself in the same situation, which raises the same question again and again.

The question has two parts, but they are closely related, so I have bundled them into one post.

Part #1 (practical problem)

Often, my data modelling ends up with nested structs that make logical sense (to me), maybe because I am still mentally bound to the OO world.
The root cause of the following problem is the fact that child methods need to call parent methods (sometimes mutable and sometimes immutable), but Rust does not like this approach (borrow checker issues). Here is an example to illustrate my problem:

Code example
#[derive(Debug)]
struct Child {
    child_counter: u8,
}

#[derive(Debug)]
struct Parent {
    child: Child,
    parent_counter: u8,
}

trait ChildApi {
    fn increment_child(&mut self);
    fn increment_parent<P: ParentApi>(&self, parent: &mut P);
}

impl ChildApi for Child {
    fn increment_child(&mut self) {
        self.child_counter += 1;
    }

    fn increment_parent<P: ParentApi>(&self, parent: &mut P) {
        parent.increment_parent();
    }
}

trait ParentApi {
    fn increment_parent(&mut self);
}

impl ParentApi for Parent {
    fn increment_parent(&mut self) {
        self.parent_counter += 1;
    }
}

fn main() {
    let mut parent = Parent {
        parent_counter: 0,
        child: Child { child_counter: 0 },
    };

    parent.increment_parent();

    let parent = &mut parent;

    parent.child.increment_child();
    parent.child.increment_parent(parent); // <---- not possible: `parent.child` is still borrowed when `parent` is passed as `&mut`

}

What is the best way to solve this issue? How should data be structured to prevent borrow checker issues?

I did a lot of research and all I could find boils down to one of the following approaches:

  1. Use interior mutability in some way (I would like to not do this, because I would like to learn how to do it properly and I am lacking experience to fully understand the possible pitfalls)
  2. Structure the data like an ECS. This seems like overkill for my small CLI tools.

Bonus question: Is there a clean way to make my snippet work?

Part #2 (architectural problem)

I have made the observation that the following scenario happens quite often in Rust (more than in other languages):

One takes a lot of care to design the model of an application and everything works after the initial implementation. But then requirements change and new features have to be added. The data model fits less and less and new borrow checking issues arise, requiring a rewrite of a lot of code to adapt to an updated data model. How can this be prevented? Is it even possible?

I am aware that such adaptations are normal and required in other languages, too. But my impression is that Rust in particular requires more reimplementation than other languages.

Don't get me wrong, I do not blame Rust. I am fully aware of the fact that all problems might be/are related to my lack of experience :)

Are there general design concepts for Rust to reduce the amount of refactoring? All I could find tackles only the most basic scenarios...

Thank you all!

BR,
thedude

I think this is a textbook case of "creating a parallel type hierarchy based on the domain model". I strongly recommend watching Casey's talk "the big oops" when you have the time.

it's hard to give real advice based on a contrived example, but I'll give it a try.

when you use terms like "parent" and "child", it's already a big red flag to me. this is especially problematic if the API requires that the parent knows about the child, and/or the child knows about the parent. if they are mostly accessed together, what's the point of splitting them into two types? even worse, why assign the roles of "parent" and "child" to the data?

data structure design should, for the most part, be based on the access pattern in the real program, not some metaphorical "relationship" in the domain model.

another problem is unnecessary abstraction. separation of interface and implementation is a good principle, but what is being abstracted here? if the interface has a method named increment(), I think it should be clear that an integer "counter" of some form is implied, so I would just make the counter part of the public API and let the user increment it directly.

this also reminds me of Linus's recent rant (LKML archive): people can already understand the expression a + b << 16. the "helper" function make_u32_from_two_u16() is not only pointless, it actually hurts readability and creates a cognitive burden for code reviewers: you can't immediately tell by looking at make_u32_from_two_u16(a, b) which is the higher half and which is the lower half.

I think many times, people make these "abstractions" just out of habit without thinking too much about it. but once you think about it, you can't make a lot of sense of them.

although it's a contrived example, just for completeness, here's how I prefer to do it:

struct AppData {
	parent_counter: u8,
	child_counter: u8,
}

fn main() {
	let mut data = AppData {
		parent_counter: 0,
		child_counter: 0,
	};
	// increment parent
	data.parent_counter += 1;
	// increment child
	data.child_counter += 1;
}

alternatively:

fn main() {
	let mut data = AppData {
		parent_counter: 0,
		child_counter: 0,
	};
	// use parent api
	parent_api_increment(&mut data.parent_counter);
	// use child api
	child_api_increment(&mut data.child_counter);
}

fn parent_api_increment(counter: &mut u8) {
	// only the counter is borrowed, not the whole AppData
	*counter += 1;
}

fn child_api_increment(counter: &mut u8) {
	*counter += 1;
}

btw, the terms shared reference and exclusive reference should be preferred over immutable reference and mutable reference, because they better reflect how the compiler actually checks them, which helps you grasp the borrow checker more quickly. also, they align more closely with the idea that data structures (and APIs) should reflect the access pattern of data.

once you think of &mut as an exclusive reference, you should realize that methods taking &mut self make a much stronger demand: you are claiming exclusivity over the entire data type, which can be particularly troublesome if you are used to aggregating pieces of (child) data into larger (parent) structs for no benefit.

rust has supported disjoint borrows of struct fields for a very long time. unfortunately, that doesn't work across function boundaries! be very cautious when you design APIs that use &mut self for a potentially large data structure.

for example, if your API is just to increase a counter, why do you need exclusive access to the entire data structure (&mut Self) containing the counter? you should only need to borrow the counter!
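to make the point about borrowing only the counter concrete, here is a minimal sketch built on the structs from the original question. the signature that takes `&mut u8` instead of `&mut P` and the `run_counters` driver are assumptions made for illustration, not something proposed verbatim in this thread.

```rust
#[derive(Debug, Default)]
struct Child {
    child_counter: u8,
}

#[derive(Debug, Default)]
struct Parent {
    child: Child,
    parent_counter: u8,
}

impl Child {
    // Needs exclusive access to the child only.
    fn increment_child(&mut self) {
        self.child_counter += 1;
    }

    // Assumed signature: asks only for the one parent field it changes,
    // not for exclusive access to the whole `Parent`.
    fn increment_parent(&self, parent_counter: &mut u8) {
        *parent_counter += 1;
    }
}

// Hypothetical driver that mirrors the calls in the original `main`.
fn run_counters() -> Parent {
    let mut parent = Parent::default();
    parent.child.increment_child();
    // `parent.child` (shared) and `parent.parent_counter` (exclusive) are
    // different fields, so both borrows can be live in the same call.
    parent.child.increment_parent(&mut parent.parent_counter);
    parent
}
```

the borrow checker accepts the last call because the two borrows are visibly disjoint at the call site; nothing inside `increment_parent` has to be analysed for that.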

I often like to say: get comfortable with functions that take many arguments. I would rather have a small type whose methods take, say, 6 to 8 arguments than a very large type whose methods take only 2 or 3 arguments while each method accesses a different subset of "fields" on self as implicit "arguments".

While nested structures are good for representing that "something is inside something" (for example, coordinates may be a field of another structure and hold x and y), the simplest approach seems to be putting all methods on the parent, which can access its own substructures and their fields without a problem. So child_api_increment should be called on the parent, passing the child id.

Otherwise, you are building a graph, and a graph is a very problematic structure for ownership: if the parent knows about the child and the child about the parent, who owns whom? Nodes are "owned" by the graph itself, not by one another.
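As a rough sketch of what "call it on the parent, passing the child id" could look like: the owner keeps its children in a Vec and a child is referred to by its index. The names `add_child` and `increment_child`, and the use of a plain `usize` index as the id, are assumptions made for illustration.

```rust
#[derive(Debug, Default)]
struct Child {
    counter: u8,
}

#[derive(Debug, Default)]
struct Parent {
    counter: u8,
    // The parent owns every child; a child is identified by its index.
    children: Vec<Child>,
}

impl Parent {
    // Adds a child and returns its id (hypothetical helper).
    fn add_child(&mut self) -> usize {
        self.children.push(Child::default());
        self.children.len() - 1
    }

    // Work that touches both child and parent state lives on the owner.
    // Returns false if no child with that id exists.
    fn increment_child(&mut self, child_id: usize) -> bool {
        match self.children.get_mut(child_id) {
            Some(child) => {
                child.counter += 1;
                // `self.children` and `self.counter` are different fields,
                // so updating the parent here is fine.
                self.counter += 1;
                true
            }
            None => false,
        }
    }
}
```

The child never holds a reference back to its parent, so the "who owns whom" question does not come up: the parent owns everything and ids are just numbers.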

I don't find that obvious. I'd prefer at least the use of parentheses, so that I don't have to think about the order of operations; being forced to think about it is itself an undue cognitive burden, the very thing you're trying to avoid with make_u32_from_two_u16. Furthermore, assuming a and b are u16s, one should likely write (u32::from(a) << 16) + u32::from(b) (or, using as: ((a as u32) << 16) + (b as u32)), seeing how a + b can overflow.
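A small sketch of both concerns, precedence and overflow. The function names `combine` and `precedence_demo` are made up for this example, and which argument ends up in the high half is an arbitrary choice here.

```rust
// Hypothetical helper: `high` becomes the upper 16 bits, `low` the lower 16.
// Widening to u32 before shifting means nothing can overflow.
fn combine(high: u16, low: u16) -> u32 {
    (u32::from(high) << 16) + u32::from(low)
}

// Shows how Rust parses the expression without parentheses.
fn precedence_demo() -> (u32, u32) {
    let (a, b): (u32, u32) = (1, 2);
    // In Rust `+` binds tighter than `<<`, so this is `(a + b) << 16`.
    let unparenthesized = a + b << 16;
    // Shifting first needs explicit parentheses.
    let shift_first = a + (b << 16);
    (unparenthesized, shift_first)
}
```

With a = 1 and b = 2 the two readings give 196608 and 131073 respectively, which is why the parentheses are worth spelling out.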

I understand what you're saying, but the compiler itself uses "mutable"/"immutable", as do Rustaceans who know a lot about the language (including members of the language team). When you're first learning Rust or teaching someone, I agree (then again, I think one should avoid lifetime elision and type inference until they get comfortable), but using "imprecise" and even confusing terminology with someone less knowledgeable is all but unavoidable at some point, even in fields like math. It's "safer" to use "exclusive"/"shared" just in case your audience isn't familiar enough, but it's important to be familiar with the terms "mutable"/"immutable" in the context of Rust because you will see them.

And relatedly, the shared/immutable reference &T is a much weaker requirement: you can't perform simple algebraic reasoning since the underlying value may in fact be mutated due to interior mutability.
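A minimal illustration of that point, using `std::cell::Cell` as one example of interior mutability. The function names `bump` and `read_twice` are invented for the sketch.

```rust
use std::cell::Cell;

// Takes only a shared reference, yet changes the value behind it.
fn bump(counter: &Cell<u8>) {
    counter.set(counter.get() + 1);
}

// Reads through the same shared reference before and after the call.
// The two reads differ, so `&T` alone does not promise an unchanged value.
fn read_twice(counter: &Cell<u8>) -> (u8, u8) {
    let before = counter.get();
    bump(counter);
    let after = counter.get();
    (before, after)
}
```

Starting from `Cell::new(0)`, `read_twice` returns `(0, 1)` even though no `&mut` appears anywhere.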

Thanks for your answers. I have watched parts of "the big oops": an interesting history session with limited technical information, as far as I can tell. However,
@nerditation

data structure design should, for the most part, be based on the access pattern in the real program, not some metaphorical "relationship" in the domain model.

seems to be a good point to think about! I really appreciate your effort to modify my code example, but I think you are right, this generic example is not really suitable, because your modification seems obvious in such a simple example.

@bourumir-wyngs
If I group all methods on the parent, I ease access to the fields, but wouldn't this cause problems regarding separation of concerns? I think separating unrelated things is a good idea not only in OO, isn't it?
