Why are function pointers special? (no null)

I think it's simpler to just think about a platform where code can only be executed from ROM. It's just like data in ROM: if you try to write to it, you blow up. If you have a pointer or a reference to a valid function in ROM, calling into it is fine. If you try to copy code from ROM to somewhere else, the arch lets you do that, but if you then try to execute the copy, you blow up. The only question is how safe Rust prevents you from doing the things that blow up. You could, for example, make it so fn() is not Clone, but instead exposes a &[u8] that can only be converted back into a fn() type via an operation that is allowed to fail (and would always fail on a platform where all code is in ROM).

That's not actually what that type is. fn(T) -> U is a (known valid) function pointer, so *fn(T) -> U (or precisely, *const fn(T) -> U; there is no unqualified * type) is a pointer to a function pointer, not just a raw function pointer. If you used it you'd have to make sure the fn(T) -> U function pointer existed in memory somewhere so that the *const fn(T) -> U could point to it.
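To make the distinction concrete, here is a minimal sketch (the names `double` and `call_through_raw` are made up for illustration). It stores a function pointer in a local variable and then takes a raw pointer to that variable, so the raw pointer refers to the variable and not to the code:

```rust
fn double(x: i32) -> i32 {
    x * 2
}

/// Calls `double` through a pointer to a function pointer.
fn call_through_raw(x: i32) -> i32 {
    // `fp` is a (known valid, non-null) function pointer stored in a local.
    let fp: fn(i32) -> i32 = double;

    // `pp` points at the variable `fp`, not at the machine code of `double`.
    let pp: *const fn(i32) -> i32 = &fp;

    // SAFETY: `pp` points to `fp`, which is live for the rest of this function.
    let recovered: fn(i32) -> i32 = unsafe { *pp };
    recovered(x)
}
```

If `fp` went out of scope while `pp` was still in use, the dereference would be undefined behaviour, even though `double` itself lives for the whole program.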

The type you're thinking of in this case does not exist — Rust's type system does not contain any “raw function pointers”, in the sense of pointers not assumed to be valid that point to code and include a function signature in their type. (And, for what it's worth, I do think that is arguably a design mistake, which could have been avoided e.g. with @quinedot's suggestion of fn() meaning the code of a function and &fn() meaning a safe-to-call function pointer.)
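For what it's worth, the usual workarounds can be sketched like this (the function names are hypothetical; this is an illustration, not something proposed in the thread). `Option<fn() -> u32>` stands in for a nullable function pointer and is documented to have the same size as the plain function pointer, while a cast to `*const ()` gives an address that carries no validity promise but also loses the signature:

```rust
use std::mem::size_of;

fn answer() -> u32 {
    42
}

/// Calls `f` if present, mirroring a null check on a C function pointer.
fn call_or_default(f: Option<fn() -> u32>) -> u32 {
    match f {
        Some(func) => func(),
        None => 0,
    }
}

/// `None` takes the place of the null pointer, so the `Option` costs no space.
fn option_is_free() -> bool {
    size_of::<Option<fn() -> u32>>() == size_of::<fn() -> u32>()
}

/// Erases the signature: the result is a plain address that the type system
/// no longer treats as callable.
fn erase(f: fn() -> u32) -> *const () {
    f as *const ()
}
```

Neither form is a raw pointer that keeps the function signature in its type, which is the gap described above.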


Interesting, I've been thinking it's like &'static _. In what ways is it not?

I think function pointers are a little unique in that there are effectively two different "load" operations: call instructions and load instructions. For an instruction view (reading the code as bytes) you would want a fat pointer; for a call view you can get by with a thin pointer, since the function is responsible for detecting its own exit and returning.

I slowly figured that out during this thread, and that clarifies a lot 🙂

You can't get at the _, you can't make it a &'local _, you can't coerce it to &Fn(), you can't pass it to

fn foo<F: Fn()>(f: &F) {}

This implementation doesn't apply to it

impl<F: Fn()> MyTrait for &F {}

Things like that.
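A small sketch of the difference, with hypothetical names (`seven`, `call_ref`, `demo`): a function pointer is a value that implements `Fn`, so anything that wants a `&F` needs a reference to a variable holding the pointer, which adds one more level of indirection.

```rust
fn seven() -> u32 {
    7
}

/// Same shape as `foo` above, but returning the result so it can be checked.
fn call_ref<F: Fn() -> u32>(f: &F) -> u32 {
    f()
}

fn demo() -> u32 {
    let fp: fn() -> u32 = seven;

    // call_ref(fp); // rejected: expected a reference `&F`, found a fn pointer

    // Accepted: F = fn() -> u32, and we pass a reference to the local `fp`.
    let via_ref = call_ref(&fp);

    // A trait object also has to borrow the variable, not the function itself.
    let obj: &dyn Fn() -> u32 = &fp;

    via_ref + obj()
}
```

In both accepted cases the borrow is tied to the local `fp`, so its lifetime is a local one and not 'static, unlike the function it points to.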

That's precisely what AVR is. Code is in flash, but you cannot just go and write anything there; you need a special dance for that. And pointers to flash even have different sizes from pointers to RAM!

Or take AArch64: code is in the same address space as data, but in the default configuration you cannot even read it, only execute it!

And on iOS your code is not in ROM, but you cannot create new code except by signing your binaries on Apple's servers!

I'm not sure attempts to treat code as data would make things “easier to understand”; on the contrary, they may end up making things worse.

We are long past the times when the Wheeler jump and self-modifying code were the norm. On today's hardware, trying to pretend that code and data are interchangeable brings nothing but grief.

As with most of Rust's “design mistakes”, it's the result of a trade-off. And I'm not even sure attempting to “fix it” would clarify more than confuse. It would make certain things easier but certain other things harder, too.

Even on popular platforms there is a difference: far pointers to data don't exist in x86-64 mode, but far pointers to code do exist (and they may even send you from x86-64 mode to 32-bit “compatibility” mode).

Although I'm not sure whether the Rust compiler supports these.


It's still pretty unavoidable in some areas, e.g. JITs, which Apple only prohibits (except for themselves!) to enforce its monopoly. But even in the AOT world, patching instructions is what a linker does.


Even if you are using a JIT, it's still a bad idea to try to treat code as data. In fact, I happen to work on a JIT at my $DAYJOB, and we use two mappings of the same memory (one with read+write permissions and the other with read+execute permissions) to prevent easy-to-exploit write+execute mappings.

Yes, even in today's world on some platforms code and data are interchangeable (x86 is a very popular one), but this is an exception more than the norm (there are more ARM devices in the world today than x86 devices, remember?) and it would be weird to design the language for this exceptional case.

If your language treats data and code as entirely distinct entities, you may still run it on a platform where they are the same, but the opposite is not true. Thus, for a portable language, it's natural to treat them differently.


Doesn't Rust require a linear memory space, even if non-contiguous? I'm aware there's no official memory model, but I'm pretty sure I've seen that somewhere. The point being that even if code and data pointers are not interchangeable they could still be both pointers. (it's not even that different to current x86 virtual memory protection flags)

Having function types that you need to create references or raw pointers to sure seems like it would have made dylib and JIT a bit nicer, due to lifetimes and generic pointer type handling being a bit cleaner, but it doesn't seem like a huge deal (and you could support something like fn() + 'a if you really wanted to add function lifetimes).


No, that's a POSIX requirement, but neither C nor Rust requires that.

As I have said: it would have made some things easier and some things harder. Most platforms today need you to explicitly synchronise changes to the code (issue some kind of IMB instruction), so you cannot really pretend that code and data are one and the same, even if the language tried to create this illusion.
