Any good resources for learning Data Oriented Design and Data Driven Programming in Rust?

I'm clueless on what Data Oriented Design and Data Driven Programming is in general so I wanted to learn through Rust. Are there any good videos or docs on this? If you know of any please link them below! Thanks :smile:

Are you referring to Data-driven programming or Data-oriented design?

@skysch Both! Probably should've worded it better; I'll make an edit.

I don't understand the question.

All useful computer programs have a data input, a data output, and some processing in between. Which I imagine depends on the data going in and out else it ain't going to work.

Of course we have to consider what that data is and how it is stored and how our machines handle that physical reality if we want an optimal solution.

Is there any other kind of programming other than "data oriented"/"data driven" ?

@ZiCog It's a programming style, there's: functional programming, object oriented programming, procedural programming etc and what I'm asking about is another style of programming.

OK.

I think that therein lies my problem.

Back in the day, when I first learned to program it was with BASIC and assembler. A bit later came ALGOL, Coral, PL/M, C, Pascal, Ada....

So basically our data was BYTE, WORD, LONG. Even floats were not usable for performance reasons. And strings, structs and pointers.

For three decades of my career functions were functions, not methods or lambdas. Loops were loops, not iterators or maps... Conditionals were 'if' not some match syntax. etc, etc.

Essentially we worked with what computers actually do: Sequence, selection and iteration.

I was never sold on the OOP idea when it came along with C++, even less so with Java. It did not seem to fit anything I ever needed to do. It went against Occam's Razor by introducing complexity where it was not needed.

Functional Programming is beyond me. An abstraction too far. How would one ever make use of that in an embedded system or where performance matters?

But then... then comes Rust. Which is really stretching my mind because whilst not being OOP or Functional or whatever it is absorbing abstractions and features from both whilst maintaining the performance of C.

Not only that, Rust is the first language I have known in four decades to introduce a genuinely new and valuable feature. Other than syntactic rearrangements and code organization conveniences. That is the whole deal with mutability and checking data aliasing. The infamous "borrow checker". The whole memory/thread safety idea.

I'm not sure yet but I get the feeling that if you write enough Rust code, and have enough conversations with the compiler error messages, especially the borrow checker, you start to get a feel for "data oriented" design.

There is a famous book titled "Algorithms + Data Structures = Programs".

That describes the world of C and the like where there is data over there as a glob and there is program over there as a glob and let's hope they work together.

It ain't so. Data and Program are intimately intertwined. The program defines the data, the data steers the program.

Sorry if this post is long and rambling, these are the thoughts that have been growing in my mind since I discovered Rust.

@ZiCog When I say I'm clueless about Data Oriented Design and Data Driven Programming, I mean I have zero idea what they are. I want to get into game development in Rust, and Amethyst (a game engine written in Rust) seems very promising; I saw that it describes itself as data driven. That's why I'm asking about Data Oriented Design and Data Driven Programming, specifically in Rust. I get what you mean about understanding more as I write more code, but the concept is brand new to me and I want to learn about it through Rust, which is why I'm asking for resources. How else am I going to learn Data Oriented Design and Data Driven Programming if I have no idea what they are?

I understand you. I think.

My take on it is that you will never find out what is "Data Oriented Design and Data Driven Programming". In the same way it is impossible to get any agreement on what is Object Oriented Design.

These concepts become far too nebulous. People will write whole books on them that will take longer to read and understand than just getting whatever it is you want to do done.

But hey, you have a language in Rust, you have a game engine in Amethyst.

Why not skip the philosophy, roll up your sleeves and set to work making something with them?

At the end of the day it's practical results that count. The learning is in the doing.

I think that is a rusty example of the programming style you are asking about.

@ZiCog Yeah I get what you mean but the style of programming you use in something like game development (And pretty much everything else you write) means everything because it determines how you'll organize everything, call everything, declare everything, etc and there are pros and cons to every style. Just from researching Data Oriented Design and Data Driven Programming I see all these websites and videos of people saying it's the "OOP killer" and that it's perfect for game development etc etc which is why I want to have some basic understanding of that style of coding. I know there are different flavors of every style but there are always similarities between them all (That's why all these flavors are under that specific style) so learning any flavor of Data Oriented Design and Data Driven Programming is all I really want to do so I can draw my own conclusions on how I want to structure everything.

When Amethyst says it employs a data-oriented design, that means they use an Entity Component System (ECS) in which game components are laid out next to each other in memory and the systems that modify those components iterate over them linearly. And to be fair, that's about all there is to data-oriented design: ECS engines and struct-of-arrays (SoA) layout. There are a few tutorials on ECS design, and the Amethyst book should provide a good primer for how Specs works and why.
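
To make the "components next to each other, systems iterating linearly" idea a bit more concrete, here is a minimal hand-rolled sketch. It is not the Specs or Amethyst API; `World`, `Position`, `Velocity` and `movement_system` are made-up names for illustration, and a real ECS uses smarter storage than a `Vec<Option<T>>` per component.

```rust
// A hand-rolled sketch of the ECS idea. This is NOT the Specs or Amethyst API;
// every name below is hypothetical and chosen only for illustration.

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Position {
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Velocity {
    pub dx: f32,
    pub dy: f32,
}

/// An entity is nothing more than an index into the component arrays.
pub type Entity = usize;

/// Each component type lives in its own contiguous Vec.
/// `None` means "this entity does not have that component".
#[derive(Default)]
pub struct World {
    pub positions: Vec<Option<Position>>,
    pub velocities: Vec<Option<Velocity>>,
}

impl World {
    /// Create a new entity with the given (optional) components.
    pub fn spawn(&mut self, pos: Option<Position>, vel: Option<Velocity>) -> Entity {
        self.positions.push(pos);
        self.velocities.push(vel);
        self.positions.len() - 1
    }
}

/// A "system": walks the component arrays front to back and updates every
/// entity that has both a position and a velocity.
pub fn movement_system(world: &mut World, dt: f32) {
    for (pos, vel) in world.positions.iter_mut().zip(world.velocities.iter()) {
        if let (Some(p), Some(v)) = (pos, vel) {
            p.x += v.dx * dt;
            p.y += v.dy * dt;
        }
    }
}
```

Calling `movement_system(&mut world, dt)` once per frame moves every entity that has both components and silently skips the rest. Note that the system only ever borrows the two arrays it needs, which is part of why this style tends to sit well with the borrow checker.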

@stevensonmt Isn't that kinda procedural programming? I don't really see anything that stands out.

@skysch I'll take a look at it, thanks!

SoA is a big piece of it, because SoA was invented to take advantage of cache locality. Another enormous advantage of ECS in Rust is that it minimizes lifetimes and the resultant effort to satisfy the borrow checker. (People talk about "fighting the borrow checker" but that phraseology omits the fact that the borrow checker ALWAYS wins, at least in UB-free code.)
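
For anyone who hasn't seen the two layouts side by side, a rough sketch of array-of-structs (AoS) versus struct-of-arrays (SoA) might look like this. The particle types and function names are invented for the example; the point is only that the SoA version reads one densely packed `Vec<f32>` when it needs just the `x` values.

```rust
// Hypothetical types for illustration only.

/// Array of Structs (AoS): all fields of one particle sit together,
/// so a loop that only needs `x` still drags `y`, `mass` and `alive`
/// through the cache.
#[derive(Clone, Copy)]
pub struct ParticleAos {
    pub x: f32,
    pub y: f32,
    pub mass: f32,
    pub alive: bool,
}

pub fn sum_x_aos(particles: &[ParticleAos]) -> f32 {
    particles.iter().map(|p| p.x).sum()
}

/// Struct of Arrays (SoA): one contiguous array per field.
/// Index `i` in every Vec refers to the same particle.
#[derive(Default)]
pub struct ParticlesSoa {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
    pub mass: Vec<f32>,
    pub alive: Vec<bool>,
}

impl ParticlesSoa {
    /// Split one particle into the per-field arrays.
    pub fn push(&mut self, p: ParticleAos) {
        self.x.push(p.x);
        self.y.push(p.y);
        self.mass.push(p.mass);
        self.alive.push(p.alive);
    }
}

/// Only the `x` array is touched, so every byte fetched is a byte used.
pub fn sum_x_soa(particles: &ParticlesSoa) -> f32 {
    particles.x.iter().sum()
}
```

Both functions compute the same result; whether the SoA version is actually faster depends on the data size and the hardware, so it is worth measuring rather than assuming.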

That there is the main idea of data orientation. Correct me if I am wrong.

If you design your program in typical OOP style you end up with lots of bits of data encapsulated in lots of different objects, which are members of lots of different classes. The result is that the data is now splintered and spread all over memory, which can make it very slow to process as you are causing cache misses all the time. Not only that, you have introduced a lot of timing indeterminism as memory needs to be allocated and deallocated, cached and uncached, all the time. Not good if you are in a game rendering loop.

Conversely, in a data oriented approach you arrange to have all the data you need to process often stored together in arrays. Where it can be scanned and processed with minimal cache misses. Much faster, more deterministic.
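
As a small sketch of that contrast (the `Shape`, `Circle`, `Rect` and `Shapes` names are made up for the example): the first function walks a list of individually heap-allocated trait objects, the second walks one contiguous array per concrete type.

```rust
// Hypothetical example types; nothing here comes from a particular library.

pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Circle {
    pub radius: f64,
}

pub struct Rect {
    pub w: f64,
    pub h: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

/// OOP-style: every shape is its own heap allocation behind a pointer,
/// and each `area` call goes through dynamic dispatch.
pub fn total_area_boxed(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Data-oriented style: shapes of the same kind are stored together,
/// so each loop scans one contiguous array with a statically known type.
#[derive(Default)]
pub struct Shapes {
    pub circles: Vec<Circle>,
    pub rects: Vec<Rect>,
}

pub fn total_area_grouped(shapes: &Shapes) -> f64 {
    let circles: f64 = shapes.circles.iter().map(|c| c.area()).sum();
    let rects: f64 = shapes.rects.iter().map(|r| r.area()).sum();
    circles + rects
}
```

The grouped version gives up the ability to keep shapes in one mixed, ordered list. That is the usual trade: the layout follows how the data is processed rather than how the domain is modelled.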

Possibly the simplest example of data oriented processing is this discussion of traversing large arrays:

You're keeping focused on the "whilst maintaining the performance of C" objective - it may be wise to manage your expectations a bit. The testimonials I've come across are usually along the lines of "I came for the performance but stayed for the safety".

It seems Rust can be as fast as C but will likely require a different approach (than C) in order to achieve those speeds while still remaining safe. So the first safe implementation of any solution may in fact be disappointing performance-wise.

The whole notion of moving values (optimizable by the compiler) signals a willingness to sacrifice performance for safety. I have been surprised how often std::mem::replace and std::option::Option::take come up.
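
For readers who haven't met those two yet, here is a minimal sketch of where they tend to show up: moving a value out from behind a `&mut` reference while leaving something valid in its place. `Buffer` and `Slot` are hypothetical types invented for the example; `std::mem::replace` and `Option::take` are the real standard library functions.

```rust
use std::mem;

/// Hypothetical type: a queue of messages waiting to be sent.
pub struct Buffer {
    pub pending: Vec<String>,
}

impl Buffer {
    /// Hand the queued messages to the caller and leave an empty Vec behind.
    /// We cannot simply move out of `self.pending` through `&mut self`,
    /// so we swap in a replacement value instead.
    pub fn drain_pending(&mut self) -> Vec<String> {
        mem::replace(&mut self.pending, Vec::new())
    }
}

/// Hypothetical type: an inventory slot that may or may not hold an item.
pub struct Slot {
    pub item: Option<String>,
}

impl Slot {
    /// `Option::take` moves the value out and leaves `None` in its place.
    pub fn remove(&mut self) -> Option<String> {
        self.item.take()
    }
}
```

In both cases only the small owning handle is moved (a pointer, length and capacity for the `Vec`), not the heap contents, so the cost is usually negligible. For types that implement `Default`, `std::mem::take` is a shorthand for the `replace` call shown here.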

My take on it is that you will never find out what is "Data Oriented Design and Data Driven Programming". In the same way it is impossible to get any agreement on what is Object Oriented Design.

A lot of OOD talk is informed by the capabilities of OO programming languages. The intention is probably best exemplified by Class-Responsibility-Collaboration (CRC) cards. In terms of implementation the goal was to conceptually colocate cohesive behaviour with its encapsulated, mutable state.

The past few days I've been reading up on Data-Oriented Design, example: Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP). Seems the shape of data is not influenced by colocating all the data (and behaviour) for a single entity (as in OO) but instead physically colocating data that is processed together to support the capabilities of the system in a performant manner (as you have observed in your latest post).

Some also claim that Entity Component Systems are more maintainable.

In some ways Data-Oriented Design seems to mirror the emergence of query-first schema design in the DB field - trading off "purity of the conceptual representation of the domain" in favour of efficient implementation support for the required capabilities of the system.

In practice, that's a huge part of it.

On a more philosophical level data orientation is about understanding that what you're primarily interested in isn't code, it's the data. The code is there to process data for you, and in order to process the data as fast as possible you need to understand the hardware properly -- you should lay out the data in memory in order to make processing it as efficient as possible for the hardware.

Cache locality is a part of that. But one can envision a platform where that isn't true, and that there are other factors that are more important in order to transform your data efficiently. In such a case data cache locality would no longer be a part of data orientated design [on that platform].

@TomP is spot on: ECS just so happens to be a good paradigm for data oriented design and it happens to - at least in theory - fit nicely with Rust's borrow-checker.

Indeed I do.

Probably because I am from a generation where performance was always an issue. The systems I first programmed professionally back in the early 1980s employed multiple CPUs and shared memory because one was not enough to get the job done.

Memory of course was in short supply and very expensive.

If one wanted something more high level than assembler then one needed C or Pascal or Coral or Ada, or PL/M....

Turns out, forty years later, the concerns over performance and memory are as vital as they ever were. Whether it be software for the burgeoning embedded/IoT device field or mobile devices or processing in the cloud.

No, I refuse to.

If, for example, Rust provided all the type safety and memory safety it does, whilst at the same time providing the same performance as Java then there would be no point for it to exist at all and I would not be interested.

As a Rust newbie who started out by writing Rust in the style of C, I was amazed to find that Rust does indeed match the speed of C. Without any 'unsafe'.

With some caveats: Sometimes Rust code that matches or beats my C equivalents on x86-64 falls way behind on ARM. I have so far found one case where Rust cannot beat C without resorting to unsafe.

All in all it all looks good: high level language features, type safety, memory safety and, for all intents and purposes, the performance of C.

Loop blocking is another good suggestion. Correct me if I’m wrong, but as I understand it:

You can organize data into chunks based on the most likely access patterns. If you're going to be reading a 2D array in a straight line, loop blocking may be harmful. If you are reading a 2D array and frequently looking at all the Cartesian neighbors of any given cell, loop blocking may be helpful.
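
Here is a rough sketch of what loop blocking (tiling) looks like in code. I've used a matrix transpose rather than the neighbor lookup described above, as an assumption on my part that it makes the shortest illustration: one side of the copy is necessarily strided, so working through small square tiles keeps both the reads and the writes within a limited region of memory at a time. The function names and the `block` parameter are made up for the example.

```rust
/// Transpose a row-major `rows x cols` matrix the straightforward way.
/// Reads are sequential, but writes jump `rows` elements each step.
pub fn transpose_naive(src: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert_eq!(src.len(), rows * cols);
    let mut dst = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    dst
}

/// Same result, but the matrix is visited in `block x block` tiles so that
/// the source rows and destination rows of one tile can stay in cache
/// while the tile is processed.
pub fn transpose_blocked(src: &[f64], rows: usize, cols: usize, block: usize) -> Vec<f64> {
    assert_eq!(src.len(), rows * cols);
    assert!(block > 0, "block size must be at least 1");
    let mut dst = vec![0.0; rows * cols];
    for row0 in (0..rows).step_by(block) {
        for col0 in (0..cols).step_by(block) {
            // Clamp the tile at the matrix edges.
            let row_end = (row0 + block).min(rows);
            let col_end = (col0 + block).min(cols);
            for r in row0..row_end {
                for c in col0..col_end {
                    dst[c * rows + r] = src[r * cols + c];
                }
            }
        }
    }
    dst
}
```

A good `block` value depends on the cache sizes of the target machine, and for small matrices the naive loop may well be just as fast, so treat the tile size as something to benchmark.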

I agree.

It's that "for the hardware" thing that niggles. Our high level languages require us to think in some way that is all about the language and nothing to do with the hardware. After all, half the point is hardware independence and portability. Even C requires us to think in terms of some generalized abstract machine which is nothing like the actual machines we are using today.

Having no notion of caches is one aspect of these HLL abstractions.

No doubt true.

Somewhere though the need for caches is demanded by the laws of physics. Given a huge pile of memory it will occupy space. The speed of light limits how fast we can read and write it from various distances. So even if we maxed out the speed of everything it would still be beneficial to cache frequently used data locally.