Every time I call a function by pointer? Yes, that would be sane and consistent.
It might help to adjust your taxonomy a bit. `*const T` and `*mut T` are not Rust's “pointer” types, they are Rust's “raw pointer” types. Raw pointers, references, function pointers, and `Box` are all examples of “pointer” types in Rust.
Only raw pointers are unsafe to use and have inherent null values (rather than ones added via `Option`).
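A minimal sketch of that taxonomy, using only the standard library. The helpers `answer` and `describe` are made up for illustration. References, `Box`, and function pointers are never null, so "no value" has to be opted into with `Option`, while a raw pointer can be null on its own:

```rust
use std::ptr;

// Hypothetical helper: a function to point at.
fn answer() -> i32 {
    42
}

// Hypothetical helper: reads through a nullable raw pointer.
fn describe(p: *const i32) -> Option<i32> {
    if p.is_null() {
        None
    } else {
        // Dereferencing a raw pointer is the unsafe step; the caller must
        // guarantee that `p` points at a live, aligned i32.
        Some(unsafe { *p })
    }
}

fn main() {
    let x = 7;

    // Safe pointer types: nullability is added from the outside via Option.
    let by_ref: Option<&i32> = Some(&x);
    let by_box: Option<Box<i32>> = Some(Box::new(x));
    let by_fn: Option<fn() -> i32> = Some(answer);
    assert_eq!(by_ref.copied(), Some(7));
    assert_eq!(by_box.map(|b| *b), Some(7));
    assert_eq!(by_fn.map(|f| f()), Some(42));

    // Raw pointers: null is an inherent value of the type.
    assert_eq!(describe(ptr::null()), None);
    assert_eq!(describe(&x), Some(7));
}
```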
The argument there is that we should have `&fn()` and `*const fn()`, but then what is the `fn()` type? You don't want it to be the address, as that would mean extra indirection. So you want it to be some zero-sized type you can't actually call, I guess. Although a bit strange, that would have worked, and I think some people think we should have done it that way.
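For comparison, today's Rust already has a zero-sized type in this area: every function has its own unnameable "function item" type, which coerces to `fn()` on demand. Unlike the hypothetical above, that zero-sized value can be called directly. A small sketch, where `double` is a made-up example and the size check on the pointer assumes a common target where function pointers are one `usize` wide:

```rust
use std::mem::{size_of, size_of_val};

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    // `item` has the unique, zero-sized type of the function `double`.
    let item = double;
    assert_eq!(size_of_val(&item), 0);

    // Coercing to a function pointer yields an address-sized value.
    let ptr: fn(i32) -> i32 = double;
    assert_eq!(size_of_val(&ptr), size_of::<usize>());

    // Both are safe to call.
    assert_eq!(item(4), 8);
    assert_eq!(ptr(4), 8);
}
```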
Why didn't it? I dunno, it's been that way long before stabilization -- probably just boils down to safe Rust being the focus and raw pointers the exception.
Would it be a big deal to have to use `unsafe` to call `fn()`? Yes! A huge deal! That would make it "easier" in some mindset for your FFI (?) case, but that's a very niche case. [1] The non-niche case is that if you have a `fn()`, you want to call it. Requiring `unsafe` would make that much uglier and harder, and outright denied in many codebases, for everyone else.
One of the main points of Rust is to seal away `unsafe`ness without a loss of expressiveness and performance. If you start needing `unsafe` to do common things, people would just start putting it everywhere and you lose all the (massive) benefit of that isolation. We want common things to be safe and ergonomic.
As for your use case -- do you really want to be checking for NULLs and/or using `unsafe` and putting yourself at risk of UB all the time? Closures can be coerced to function pointers if they don't capture, so I don't think it's really that big of a deal anyway. Just define your own "`NULL`s".
```rust
pub const NULL_FN_NOP: fn() = || {};
pub const NULL_FN_PANIC: fn() = || panic!("Null fn() called");
pub const NULL_FN_UB: unsafe fn() =
    || unsafe { std::hint::unreachable_unchecked() };
```
[1] Also I don't think it really makes it easier; see below.
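A sketch of how such a sentinel might be used in place of a null check. The `Button` struct, its `on_click` field, and `click_optional` are hypothetical, and `NOP` mirrors `NULL_FN_NOP` above:

```rust
// Sentinel "null" callback: a non-capturing closure coerced to fn().
pub const NOP: fn() = || {};

// Hypothetical type holding a callback that is always safe to call.
pub struct Button {
    pub on_click: fn(),
}

impl Default for Button {
    fn default() -> Self {
        Button { on_click: NOP }
    }
}

// The Option-based alternative makes "no callback" explicit in the type.
// Returns whether a callback was actually invoked.
pub fn click_optional(cb: Option<fn()>) -> bool {
    match cb {
        Some(f) => {
            f();
            true
        }
        None => false,
    }
}

fn main() {
    let b = Button::default();
    (b.on_click)(); // does nothing; no null check and no unsafe needed

    assert!(!click_optional(None));
    assert!(click_optional(Some(NOP)));
}
```

Which of the two fits better probably depends on whether callers need to distinguish "no callback" from "a callback that does nothing".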
Hmm, let me rephrase because it still seems like there's an inconsistency here.
- If I want a non-null handle to a valid instance of a type `T`, I add an ampersand and get `&T`.
- If I want a nullable handle to a possibly invalid instance of a type `T`, I add an asterisk and get `*T`.

Unless `T` is a function, then we go into weird special case land:
- If I want a non-null handle to a valid instance of a function that takes `T` and returns `U`, I add an asterisk: `* fn(T) -> U`.
Does that make it clearer why it's inconsistent? It's like the reference and raw pointer cases are swapped, but only for functions. Putting functions aside and saying "function pointers are not raw pointers" is correct, in that the design clearly behaves that way, but it still seems like an odd design decision. You can no longer succinctly describe the difference between pointers and references; now you need to distinguish pointer, function pointer, and ref.
Thanks for the clarification. I think the behavior consistent with data would be:
- `&T` is a non-null handle to a valid function.
- `*T` is a nullable handle to a possibly invalid function.
- `T` is the actual data -- in this case literally the function's instructions. An `Option<fn()>` would directly store them, so not a ZST.
C/C++ don't let you directly use `T` for functions, they only let you use pointers/references. If you do, you also have to deal with thorny issues like whether memory is mapped as executable (and carry that across moves), so you would probably only expose `Pin<fn()>` or something.
> do you really want to be checking for NULLs and/or using `unsafe` and putting yourself at risk of UB all the time? Closures can be coerced to function pointers if they don't capture
My use case here is just learning the language, which is more difficult because it's inconsistent. It's just another special case to remember. I think if functions were consistent with data then closures would coerce to `&'static fn()`, which would not require `unsafe` to invoke.
TBH, I think the way Rust does `fn` is a mistake.
If `extern type` had existed in the pre-1.0 days, I suspect that instead of `fn(A) -> B` we'd have had the equivalent of that be `&'static fn(A) -> B`. That would be nice for `dlopen` kinds of things too, since it would allow `&'a fn` too.
But as it is right now, just call `fn` a "function reference" instead of a "function pointer" and you'll be good to go. After all, they behave like references -- safe to call, can't be null, unsafe to create from bits -- not like pointers -- which are unsafe to call and safe to create from bits.
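That reference-like behavior can be sketched as follows. The functions `seven` and `round_trip` are made up for illustration, and the round trip through `usize` assumes a target where function pointers and `usize` have the same size:

```rust
use std::mem::transmute;

fn seven() -> i32 {
    7
}

// Hypothetical helper: round-trips a function pointer through an integer.
fn round_trip(f: fn() -> i32) -> fn() -> i32 {
    // Going to an integer is a safe cast.
    let bits = f as usize;
    // Coming back from bits is unsafe: the compiler cannot check that the
    // integer is the address of a function with this signature.
    unsafe { transmute::<usize, fn() -> i32>(bits) }
}

fn main() {
    // Function "pointer": safe to call, like a reference.
    let f: fn() -> i32 = seven;
    assert_eq!(f(), 7);
    assert_eq!(round_trip(f)(), 7);

    // Raw pointer: safe to create from bits, unsafe to dereference.
    let p = 0x10usize as *const i32; // fine as long as it is never read
    assert!(!p.is_null());
}
```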
Interesting, why would `extern` have helped? I haven't had to write any FFI code yet so I might be missing context... I've noticed that it looks like `extern` and `unsafe` are extra properties that functions can have that function pointers preserve, e.g. if you have `unsafe fn foo() {}` then a pointer to `foo` will be of type `*unsafe fn()`. But I'd expect in a world with function references they're still preserved, `&unsafe fn()`.
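In current Rust those qualifiers are part of the function pointer type itself, which is spelled without a leading `*`. A short sketch with made-up functions `risky` and `c_abi`:

```rust
unsafe fn risky() -> i32 {
    1
}

extern "C" fn c_abi() -> i32 {
    2
}

fn main() {
    // The pointer types carry the `unsafe` and `extern` qualifiers.
    let p: unsafe fn() -> i32 = risky;
    let q: extern "C" fn() -> i32 = c_abi;

    // Calling through an `unsafe fn` pointer needs an unsafe block.
    assert_eq!(unsafe { p() }, 1);
    // An `extern "C" fn` pointer is still safe to call.
    assert_eq!(q(), 2);

    // A safe function pointer coerces to an unsafe one, not the reverse.
    let plain: fn() -> i32 = || 3;
    let widened: unsafe fn() -> i32 = plain;
    assert_eq!(unsafe { widened() }, 3);
}
```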
`extern type` isn't stable yet. The idea is a type that exists, but whose size isn't known, even at runtime.
Because if `fn` was just a ZST, then you could do weird things like dereferencing it to get it "by value", but that wouldn't make sense for a function.
But by making the size unknown, all the things like that would necessarily be prevented.
That's true, but no matter what, there's something special to learn about function pointers... and Rust pointer/reference types more generally. I think a bigger bummer (having already learned about `fn()`) is that you can't go `fn()` to `&dyn Fn()`, though maybe that could be added.
Why I say there's always something special to learn:
- What we got
  - `fn()` is like a `&'static _` in some ways but isn't actually a `&_` in other ways
- `fn()` is the instructions
  - `fn()` is a DST so you need indirection, `&fn()`, as with `&str` and `&[_]`
  - And so `&fn()` is a wide pointer
  - And so `&fn()` can't be coerced to `&Fn()` because `fn()` is already a DST (though maybe this could be special-cased)
  - If we get unsized locals, `fn()` is special cased to still not be movable
- `fn()` is a ZST
  - You can pass it around and it looks like `fn()` but you can't call it
  - I.e. only useful with indirection (`&fn()`)
Rust reference/pointer types are more complicated than "these three varieties"
These are all one `usize`:
- `*const u8`
- `&u8`
- `fn()`

Well, so are these, but with extra indirection you don't want:
- `*const fn()`
- `&fn()`
But that's still not enough to cover Rust pointers and references, because there are also wide pointers and references. These are all two `usize`s, and what the second one is varies.
- `&str`
- `*const str`
- `&dyn Fn()`
- `*const dyn Fn()`
- and uncountably more if/when custom DSTs land, perhaps of arbitrary size
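These sizes can be checked with `std::mem::size_of`. The sketch below uses a made-up helper `words`, and it assumes a common target where function pointers are one `usize` wide. The two-word layout of wide pointers is what current compilers produce rather than something I would rely on as a formal guarantee:

```rust
use std::mem::size_of;

// Hypothetical helper: size of a pointer-like type, in machine words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Thin: a single address.
    assert_eq!(words::<*const u8>(), 1);
    assert_eq!(words::<&u8>(), 1);
    assert_eq!(words::<fn()>(), 1);

    // Also thin, but pointing at a function pointer stored somewhere.
    assert_eq!(words::<*const fn()>(), 1);
    assert_eq!(words::<&fn()>(), 1);

    // Wide: an address plus a length or a vtable pointer.
    assert_eq!(words::<&str>(), 2);
    assert_eq!(words::<*const str>(), 2);
    assert_eq!(words::<&dyn Fn()>(), 2);
    assert_eq!(words::<*const dyn Fn()>(), 2);
}
```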
And there are special guarantees around niches and `enum`s like `Option`.
- `Option<&u8>` is one `usize`, `NULL` corresponds to `None`
- `Option<NonNull<u8>>`, similarly
- `Option<fn()>`, similarly

Which is why `Option<fn()>` is the recommendation for a nullable function pointer.
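A quick way to see the niche at work, as a sketch: the helper `option_is_free` is made up for illustration and reports whether wrapping a type in `Option` costs any extra space.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical helper: true if Option<T> is the same size as T,
// i.e. None is stored in an otherwise invalid bit pattern of T.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // Non-nullable pointer types: the null address encodes None.
    assert!(option_is_free::<&u8>());
    assert!(option_is_free::<NonNull<u8>>());
    assert!(option_is_free::<fn()>());
    assert!(option_is_free::<Box<u8>>());

    // A raw pointer may already be null, so there is no spare value
    // left over and Option needs a separate tag.
    assert!(!option_is_free::<*const u8>());

    // In use, the nullable function pointer is an ordinary Option.
    let callback: Option<fn() -> i32> = None;
    assert_eq!(callback.map(|f| f()), None);
}
```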
Explain how that would work on AVR, where code and data physically reside in different parts of the chip and are not interchangeable even in theory.
Interesting! I'll have to read up on the plan for how those would work.
I think it's simpler to just think about a platform where code can only be executed from ROM. It's just like data in ROM. If you try to write to it you blow up. If you have a pointer or a reference to a valid function in ROM, calling into it is fine. If you try to copy code from ROM to somewhere else, the arch lets you do that, but if you then try to execute the copy, you blow up. The only question is how safe Rust prevents you from doing the things that blow up. You could for example make it so `fn()` is not `Clone`, but instead exposes a `&[u8]` that can only be converted back into a `fn()` type via an operation that is allowed to fail (and would always fail on a platform where all code is in ROM).
That's not actually what that type is. `fn(T) -> U` is a (known valid) function pointer, so `*fn(T) -> U` (or precisely, `*const fn(T) -> U`; there is no unqualified `*` type) is a pointer to a function pointer, not just a raw function pointer. If you used it you'd have to make sure the `fn(T) -> U` function pointer existed in memory somewhere so that the `*const fn(T) -> U` could point to it.
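A sketch of what using `*const fn(T) -> U` looks like in practice, with made-up names `double`, `call_through`, and `demo`. Note that the function pointer has to live in a variable first so that the raw pointer has something to point at:

```rust
fn double(x: i32) -> i32 {
    x * 2
}

// Hypothetical helper: calls through a raw pointer to a function pointer.
fn call_through(p: *const fn(i32) -> i32, arg: i32) -> i32 {
    // Two steps: read the function pointer out of memory (unsafe, since
    // `p` is raw), then call it (which is safe on its own).
    let f: fn(i32) -> i32 = unsafe { *p };
    f(arg)
}

fn demo() -> i32 {
    // The function pointer is stored in a local...
    let f: fn(i32) -> i32 = double;
    // ...and the raw pointer points at that local, not at the code.
    let p: *const fn(i32) -> i32 = &f;
    call_through(p, 21)
}

fn main() {
    assert_eq!(demo(), 42);
}
```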
The type you're thinking of in this case does not exist: Rust's type system does not contain any “raw function pointers”, in the sense of pointers not assumed to be valid that point to code and include a function signature in their type. (And, for what it's worth, I do think that is arguably a design mistake, which could have been avoided e.g. with @quinedot's suggestion of `fn()` meaning the code of a function and `&fn()` meaning a safe-to-call function pointer.)
Interesting, I've been thinking it's like `&'static _`. In what ways is it not?
I think function pointers are a little unique in that there are effectively 2 different "load" operations, call vs load instructions. I think for an instructions view you would want a fat pointer, for a call view you can get by with a thin pointer since the function is responsible for detecting its own exit and returning.
I slowly figured that out during this thread and that clarifies a lot
You can't get at the `_`, you can't make it a `&'local _`, you can't coerce it to `&Fn()`, and you can't pass it to

```rust
fn foo<F: Fn()>(f: &F) {}
```

This implementation doesn't apply to it:

```rust
impl<F: Fn()> MyTrait for &F {}
```

Things like that.
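A sketch of the distinction, with made-up names `call_ref`, `three`, and `demo`. The `fn()` value itself is not accepted where a `&F` is expected, but because `fn()` implements `Fn()`, borrowing a stored function pointer does satisfy such signatures, at the cost of one more level of indirection:

```rust
// Hypothetical generic function that only accepts a reference.
fn call_ref<F: Fn() -> i32>(f: &F) -> i32 {
    f()
}

fn three() -> i32 {
    3
}

fn demo() -> (i32, i32) {
    let f: fn() -> i32 = three;

    // `call_ref(f)` would not compile: `f` is a function pointer, not a
    // reference, even though it behaves like one in other respects.
    // Borrowing the local that holds the pointer works, with F = fn() -> i32.
    let via_generic = call_ref(&f);

    // The same borrow can be unsized into a trait object.
    let dynamic: &dyn Fn() -> i32 = &f;

    (via_generic, dynamic())
}

fn main() {
    assert_eq!(demo(), (3, 3));
}
```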
That's precisely what AVR is. Code is in flash, but you cannot just go and write anything there; you need a special dance for that. And pointers to flash even have different sizes from pointers to RAM!
On AArch64, code is in the same address space as data, but you cannot even read it in the default configuration, only execute it!
And on iOS your code is not in ROM, but you cannot create new code except by signing your binaries on Apple's servers!
I'm not sure attempts to treat code as data would make things “easier to understand”; on the contrary, they may end up making things worse.
We are long past the times when the Wheeler jump and self-modifying code were the norm. On today's hardware, trying to pretend that code and data are interchangeable brings nothing but grief.
As with most of Rust's “design mistakes”, it's the result of a trade-off. And I'm not even sure attempting to “fix it” would clarify more than confuse. It would make certain things easier but certain other things harder, too.
Even on popular platforms there is a difference: far pointers to data don't exist in x86-64 mode, but far pointers to code do exist (and they may even send you from x86-64 mode to 32-bit “compatibility” mode).
Although I'm not sure whether the Rust compiler supports these.
It's still pretty unavoidable in some areas, e.g. JITs, which Apple only prohibits (except for themselves!) to enforce its monopoly. But even in the AOT world, patching instructions is what a linker does.
Even if you are using a JIT it's still a bad idea to try to treat code as data. In fact I happen to work on a JIT at my $DAYJOB and we are using two mappings of the same memory (one with read+write permissions and the other with read+execute permissions) to prevent easy-to-exploit write+execute mappings.
Yes, even in today's world there are platforms where code and data are interchangeable (x86 is a very popular one), but this is the exception more than the norm (there are more ARM devices in the world today than x86 devices, remember?) and it would be weird to design the language for this exceptional case.
If your language treats data and code as entirely distinct entities, you may still run it on a platform where they are the same, but the opposite is not true. Thus, for a portable language, it's natural to treat them differently.
Doesn't Rust require a linear memory space, even if non-contiguous? I'm aware there's no official memory model, but I'm pretty sure I've seen that somewhere. The point being that even if code and data pointers are not interchangeable they could still be both pointers. (it's not even that different to current x86 virtual memory protection flags)
Having function types that you need to create references or raw pointers to sure seems like it would have made dylib and JIT a bit nicer, due to lifetimes and generic pointer type handling being a bit cleaner, but it doesn't seem like a huge deal (and you could support something like `fn() + 'a` if you really wanted to add function lifetimes).