What are the practical consequences of DSTs versus types whose size is known at compile time?

This might appear to be a noob question - which it is - but please bear with me; I have mostly programmed in languages where I don't need to care much about memory layout...

So I have learnt about DSTs in Rust, and quoting the reference:

A type with a size that is known only at run-time is called a dynamically sized type (DST) or, informally, an unsized type.

Which I sort of understand. My question now is more like: okay, what does this buy me, or take from me?

What are the practical repercussions of not knowing the size of a type at compile time? How does it affect a running program, etc.?

Based on what I know, if I were to answer this question, I would say the difference is that with programs only using types whose size is known at compile time, one can confidently reason about the upper limit of the memory the program will use. For example, if a program contains only this:

fn main() {
    fn return_largest_u8(a: u8, b: u8) -> u8 {
        if a > b {
            a
        } else {
            b
        }
    }

    dbg!(return_largest_u8(1,20));
}

One can confidently say that the memory requirements are mostly determined by the two u8 argument types and the one u8 return type. This can be deduced without even running it.

But for another, similar program that uses a DST, like this:

fn main() {

    fn return_largest_str<'a>(a: &'a str, b: &'a str) -> &'a str {
        if a.len() > b.len() {
            a
        } else {
            b
        }
    }

    dbg!(return_largest_str("1","20"));
}

One can't make such a deduction, because the actual str behind the &str, which will only exist at runtime, could be of any size. It could even be on the heap or on the stack. None of this can be deduced until runtime, hence the memory usage of such a program cannot be statically reasoned about.
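To make what I mean concrete, here is a small sketch: the str behind the reference can be any size, known only at run time, while the reference itself is a fixed two words on every mainstream target (this is my own illustration, not from the reference):

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // The str pointed to can be any size, known only at run time:
    assert_eq!(size_of_val("1"), 1);
    assert_eq!(size_of_val("20"), 2);
    // The &str reference itself, however, is always pointer + length:
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // size_of::<str>() would not compile: str does not implement Sized.
}
```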

Is the above submission true? If not, where might I have erred?

Also, are there other practical consequences of DSTs versus sized types beyond this?

1 Like

"Reasoning about memory use" in this way is not really useful. Practically, how much memory your program uses doesn't depend on the references you have to data, but to the data itself. You could rewrite the above program using only statically-sized values and still not know statically how much memory it will consume:

fn main() {
    fn return_largest_string(a: String, b: String) -> String {
        if a.len() > b.len() {
            a
        } else {
            b
        }
    }

    dbg!(return_largest_string("1".into(), "20".into()));
}

Not a single DST in sight. You could argue that "but the String contains a hidden str", but that's not the point… you could just as well call into FFI and do malloc(rand()) in a loop, and eventually exhaust memory. You wouldn't use any DSTs that way, either, yet you could not say at any point exactly how much memory your program uses.
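To illustrate: the String handle itself has a fixed size no matter how much heap data it owns, so the statically-known size of a value tells you nothing about total memory use. A minimal sketch (the three-word layout of String holds on mainstream targets):

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // A String is always three words on the stack: pointer, length, capacity.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());

    let small = String::from("1");
    let big = "x".repeat(1_000_000); // owns a megabyte on the heap

    // Both handles report the same (static) size; the heap usage differs wildly.
    assert_eq!(size_of_val(&small), size_of_val(&big));
    assert_eq!(big.len(), 1_000_000);
}
```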

So what are DSTs useful for, and what consequences do they have?

  • One of the more interesting use cases of DSTs is a trait object (dyn Trait). Trait objects allow dynamic dispatch, but because they are backed by potentially different concrete types, they don't have a single, known size. A dyn Display made from a String has the same size as String, but the same trait object type made from, say, a JSON Value has the size of a JSON value (demonstration).
  • A consequence of using DSTs is that not everything works with them. For example, you can't pass DSTs by value and you can't declare a local variable of a DST type. Second, you can't coerce one DST into a different DST (they can only be created from sized types). A third, less interesting point is that a pointer to a DST is a "fat pointer", i.e., it contains not only the address of the pointed-to object but also its metadata: the length for slices and str, or a pointer to the vtable for trait objects.
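Both points can be observed directly with size_of_val and pointer sizes. A small sketch (my own demonstration, not the one linked above):

```rust
use std::fmt::Display;
use std::mem::{size_of, size_of_val};

fn main() {
    let s = String::from("hello");
    let n: u64 = 42;

    // The same trait object type reports different sizes at run time,
    // because the size is read from each value's vtable.
    let a: &dyn Display = &s;
    let b: &dyn Display = &n;
    assert_eq!(size_of_val(a), size_of::<String>());
    assert_eq!(size_of_val(b), size_of::<u64>());

    // A pointer to a DST is "fat": data pointer plus vtable pointer (or length).
    assert_eq!(size_of::<&dyn Display>(), 2 * size_of::<&u64>());
    assert_eq!(size_of::<&str>(), 2 * size_of::<&u8>());
}
```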

There are other use cases for, and restrictions related to, DSTs, but these should already be illustrative.

5 Likes

Seems to me that:

If you have statically sized types, sizes known at compile time, then the compiler can organise that data into fixed-size memory spaces, be they on the stack or the heap. It can generate code to move and copy that data without having to count the size at run time, producing optimal code for the given size.

Conversely, if the compiler does not know the size, it has to generate code to keep track of the actual size at run time. Code generated for moving/copying such a thing needs to count the items it is working on: more code, and perhaps a little less performant.

Example: if the compiler knows an array is of fixed size, say three 32-bit integers, then it can generate three assembly instructions in sequence to move/copy that array. No loops, no length comparisons, no jumps. But if the array size is unknown at compile time, the compiler has to generate code to maintain the current length of the array, the loops required to move/copy it, and so on.
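A rough illustration of the same idea in Rust (the exact instructions emitted are of course up to the optimizer; this just contrasts the two shapes of copy):

```rust
fn main() {
    // Length known at compile time: copying is a fixed-size move,
    // no run-time length bookkeeping needed.
    let a: [u32; 3] = [1, 2, 3];
    let b = a; // plain Copy of 12 bytes

    // Length known only at run time: the slice carries its length,
    // and the copy must consult it.
    let v = vec![1u32, 2, 3, 4];
    let s: &[u32] = &v;
    let mut dst = vec![0u32; s.len()];
    dst.copy_from_slice(s); // memcpy with a run-time length

    assert_eq!(b, [1, 2, 3]);
    assert_eq!(dst, v);
}
```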

All of this has little or no impact on knowing the upper limit of memory a program will use. For example, a recursive function could consume all of the stack even if it only works with fixed-size types. One could also build a tree or other data structure out of fixed-size types, with every element on the heap, which could consume all your memory. The programmer has to take care of this.

Some practical effects in Rust are that

  • DSTs must be behind some sort of pointer (Box, &, Arc, ...). (We may get unsized locals some day.)

  • Generic type parameters have an implicit Sized bound... even if the parameter is only used behind some sort of pointer. You can remove it with : ?Sized, but it's easy to forget. So sometimes this limits your ability to call foreign code, and sometimes you'll have to go back and loosen your own code with a ?Sized bound (err, implicit-bound-removal).

  • A Sized bound is hackishly used as a NotDyn bound, so sometimes a trait method will be limited to Self: Sized for the sake of making a trait object (dyn) safe, even if the method would be perfectly sensible for other unsized types like [T] or str.

  • Unsized types cannot themselves be type-erased, so you can't coerce a &str into a &dyn Display, for example. Sometimes you can work around this by coercing a &&str into a &dyn Display, or similar additional indirection.

  • Pointers can be different sizes, and you can't rely on the layout of wide pointers generally.

  • Due to nuances in dyn vtable generation, pointer equality of dyn pointers is never exactly what you want (compounding other difficulties with the concept, given that Rust supports zero-sized types, among other things).
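Two of these points, the implicit Sized bound and the extra-indirection workaround, can be sketched like so (describe is a made-up helper for illustration):

```rust
use std::fmt::Display;

// Without `?Sized`, `T` would get an implicit `Sized` bound and this
// function would reject `str`, even though `T` only appears behind `&`.
fn describe<T: Display + ?Sized>(value: &T) -> String {
    format!("{value}")
}

fn main() {
    let s: &str = "hello";

    // `&str` cannot coerce to `&dyn Display`, because `str` is already
    // unsized. An extra level of indirection works: `&str` itself is Sized,
    // so `&&str` can be erased to `&dyn Display`.
    let d: &dyn Display = &s;

    assert_eq!(describe(s), "hello"); // T = str, thanks to ?Sized
    assert_eq!(describe(d), "hello"); // T = dyn Display
}
```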

2 Likes

Trivially, with a linked list:

#[derive(Debug)]
struct Link<T> {
    next: Option<Box<Link<T>>>,
    value: T,
}

impl<T> Link<T> {
    fn new(value: T) -> Self {
        Self {
            next: None,
            value,
        }
    }
    
    fn push(self, value: T) -> Self {
        Self {
            next: Some(Box::new(self)),
            value,
        }
    }
}

fn main() {
    let mut link = Link::new(0);
    for value in 1.. {
        link = link.push(value);
    }
    println!("{:#?}", link); // unreachable due to memory exhaustion
}

Well, that line will never be reached anyway, regardless of memory exhaustion, because there is an unending for loop right above it. I think a better illustrative example would be to let that loop keep running and watch what happens to the available memory over time.

It will definitely continue to increase, and at some point I think the OS will terminate the program with some flavor of out-of-memory error.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.