Is it enough to disable access to statics to have pure functions?

(That’s a theoretical question about language design, I hope it’s the right place to ask it)

I was reading this blog post about adding contexts and capabilities in Rust.

If there were an attribute #[pure] that we could add to functions, which would remove the capability to access statics and thread_locals and to call functions not annotated with the same attribute, would that be enough to make the function pure, or am I missing something?

And if this #[pure] attribute works as I expect, would the proposal in the blog post make it realistic to have most functions pure? Could things like println!() then be allowed with an explicit with clause (possibly at the module level)?

2 Likes

I would generally expect “pure” to imply that a function is idempotent, so you’d need to somehow restrict mutation through arguments as well, whether that’s via explicit &muts or interior mutability.

1 Like

What can make a function non-idempotent if it cannot access statics and thread_locals?

If you are talking about capabilities, I consider them part of the argument list (even if they are implicit), since the function’s signature requires them (and requires an &mut to modify their state).

To be clear, this looks much too easy, so I sincerely assume that I’m missing something.

An increment function is non-idempotent, for example, because the number of times it’s called will affect how the program proceeds:

fn inc(x: &mut i32) { *x += 1; }
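A minimal sketch of how the call count becomes observable:

fn main() {
    let mut x = 0;
    inc(&mut x); // x == 1
    inc(&mut x); // x == 2 – the second call changes the outcome, so the
                 // number of calls matters and inc is not idempotent
    assert_eq!(x, 2);
}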
3 Likes

For starters, any function doing IO is probably not idempotent.

6 Likes

I would consider closures "impure" if they capture a stack reference or ref-counted shared state with any flavor of mutability.

Not sure if that is exactly what you mean by "pure functions". But it seems pretty clear to me that there are several ways to share mutable state in Rust without static allocation.
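For example, a minimal sketch using Rc<RefCell<_>>, with no statics or thread_locals anywhere:

use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let shared = Rc::new(RefCell::new(0));
    let captured = Rc::clone(&shared);

    // The closure mutates state it shares with the surrounding scope,
    // so every call is observable from outside the closure.
    let bump = move || *captured.borrow_mut() += 1;

    bump();
    bump();
    assert_eq!(*shared.borrow(), 2);
}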

So no syscalls, no memory allocation, no floating-point operations? I doubt it would be a useful concept.

The general idea is that “pure” computational functions have the same properties as mathematical functions. In terms of utility, it mostly means that the optimizer is free to add, remove, and reorder calls to pure functions however it sees fit.

Thanks everyone for the answers.

For some reason I thought that IO was only a matter of modifying some global state, not of making syscalls. Indeed, syscalls and floating-point operations have side effects.

It seems that my understanding of the word “pure” was improper. I thought that a pure function meant one that “does not produce side effects outside of modifying its inputs”. Such functions should thus produce the same result if their inputs are in the exact same state. If my understanding is correct, those functions don’t need to be idempotent; inc would satisfy this definition.

Is there a word to talk about this concept?

I may be wrong, but I thought that memory allocation is a combination of syscalls and/or modification of global state. If syscalls are disabled, but the global state of the allocator is passed either explicitly or with a with clause, I assume that you can still do memory allocation, or am I missing something?

I doubt it would be a useful concept.

With the proper term, and not “pure” as I incorrectly used when starting this discussion, it should be a much more useful concept. Being able to know that the only state a function can modify is what is passed as &mut in its inputs (including implicit arguments) would be very useful.

I also assume that this concept could easily be extended to any kind of side effect (syscalls, floating-point operations, …) by having some kind of magic token that you need to pass (by &mut reference) to explicitly allow the exact subset of side effects that you need (instead of every side effect, as is currently possible in Rust).
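To illustrate what I have in mind, here is a purely hypothetical sketch (the FsWrite capability type is made up; nothing like it exists in Rust today):

// Hypothetical capability token: holding an &mut to it is what grants
// the right to touch the file system.
struct FsWrite(());

// The signature now lists everything this function may affect:
// the Vec behind `log`, plus whatever FsWrite permits.
fn append_log(log: &mut Vec<String>, _fs: &mut FsWrite, line: &str) {
    log.push(line.to_string());
    // ... actual file-system writes would go through the FsWrite token ...
}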

That being said, I realize that Rust has interior mutability. Does this break the whole idea of “I want to be able to explicitly list everything that this function can mutate”?

(I didn't review the blog post.)

You have to also ban interior mutability to have a pure function, and Rust has no way to do that generically.

You may hear about a Freeze trait and think that's the answer, but it's not. The Freeze trait is shallow, so Box<Cell<T>>: Freeze.[1] Additionally, something like a NewType(usize) can also indirectly feature interior mutability/impure methods (via globals, I/O, representing a pointer, etc).[2] So type-wise, if you want to ban interior mutability, you need to know not only the fully resolved type and all its fields, but all of that type's implementations (and transitively across the fields).
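A minimal sketch of that second point, with a made-up Handle type whose only field is a plain usize:

use std::sync::atomic::{AtomicU64, Ordering};

static CALLS: AtomicU64 = AtomicU64::new(0);

// No UnsafeCell in any field, so a shallow, type-based check passes...
struct Handle(usize);

impl Handle {
    // ...yet its methods can still reach global state (or do I/O, or
    // treat the usize as a pointer).
    fn touch(&self) -> u64 {
        CALLS.fetch_add(1, Ordering::Relaxed)
    }
}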

Thus you would need something more infectious, which generally leads to effect systems or the like.


  1. That trait is more about "is it ok to put this in read-only memory". ↩︎

  2. As a concrete example, an OwnedFd is ultimately a c_int, but you can use it to perform I/O. ↩︎

1 Like

Thanks. I think this is what I have been missing since the beginning (as I said, I was quite certain that I was missing something with deep implications; it felt way too simple, and someone would have done it before otherwise). Shallow constness is maybe the only thing that I still have a hard time wrapping my head around, coming from C++ (where constness is deep, but does not account for concurrent operations). Deep constness would have been useful here, but indeed, that’s much harder to add (and most probably not worth the trade-off).

I’m still interested to know if there is a name for “no side effect except modifying the input arguments”.

Many allocators don't allow you to temporarily disable syscalls. Those that do will have the issue of not being able to obtain more memory when the available memory is no longer enough to satisfy allocation requests.

Moreover, the allocator state is likely so complex that passing it as a parameter to a pure function means you will never be able to call that pure function with the same parameters again, because the allocator will never be in the same state again. At this point you might as well allow syscalls again.

It doesn't really matter that allocators have a complex internal state. The property I'm interested in is "I want to make global state modification explicit", not preventing it.

Indeed. The "some global state" you mention there is everything outside your program, outside your computer even, essentially everything in the universe that is not your program.

Syscalls are not the main point; one could be reading and writing input/output registers directly.

I'm not sure what you mean. If the function inc(x) always returns a value one more than x, I would think it is pure. But if your function is just inc(), which produces a value one bigger than the last time it was called, it is not pure.

I have always felt that if I have some function that always returns the same result given the same input, then it really does not matter if that function also does some output. Sure, doing the output changes the state of the outside world, but my program knows nothing about that, and the function may as well be regarded as pure. I'm sure others will have reason to disagree with me there.

I'm not sure why you want to disable syscalls but allow global state then.

This really depends on how you consider a &mut T parameter. Is it the same if it points at the same location, or if the pointed-to object is also transitively the same?

I believe there are many different possible subtle flavors of “pureness” concepts; the basic idea is often that pure functions must resemble mathematical functions in some manner, specifically that the output only depends on the inputs. Additionally, there’s often the desire to have additional limitations on “side effects” in general.

The devil is in the details though.

I’m pretty familiar with Haskell’s notions of pureness, which helps a bit, but also Rust is quite different in many ways.

For example, Rust’s types can have ownership. With a non-Copy type, it’s less trivial to ask what “function return values are identical for identical arguments”[1] means – though to be fair, the question of what return values count as “identical” is nontrivial in any case.

Also, memory allocations are much more deeply integrated with the language in Haskell than in Rust. The former is garbage-collected and every data structure is implicitly boxed, so of course “pure” functions can make full use of allocations in Haskell, but it can be a discussion point for any notion of “pureness” in Rust.

The design of a “pureness” system also relates to its goals and use cases. One often-quoted usage is “compiler optimization”, so you can e.g. eliminate repeated calls to the same function.[2] More freedom for reordering evaluation could also be a goal. IMO most of these ideas are not a great motivation for a language-wide pureness system though.

Many people are interested in pureness for easier verification of correctness. It’s a way to restrict mutability, which in turn restricts shared mutability. Rust is already good at restricting shared mutability – comparable to functional programming languages that are only “pure by convention”, i.e. that highly favor purely functional APIs but still don’t explicitly mark impure functions.

Rust is also different, though, in that it embraces a strict type system and already encodes much useful information in function signatures; so, unsurprisingly, the desire for some way of annotating “effects” that are currently hidden from function signatures comes up regularly.

One use-case of “pure” functions that I’ve seen at least a few times was around unsafe code. Sometimes it’s just very useful to have a guarantee that – say – some user-defined fn(usize) -> (usize) behaves like a mathematical function.[3]

Notably, Haskell’s notion of pureness works similarly to safe code in Rust. You can use an “unsafe” API to execute impure code in a pure context, and this can be useful in order to represent an overall-pure operation that relies on (and properly encapsulates) impure implementation details, e.g. for performance reasons.


So let’s outline how to begin a pureness system. This should be somewhat familiar to people who know Rust’s existing notions of safety and soundness already.

We would start by outlining some minimal goals; e.g. clearly

  • a pure function with signature fn() -> bool either always returns true or always returns false
  • if all you do in a function is “compose” other pure operations, then the overall result is pure[4]

Next, we can expand the system in two main ways:

  1. Declare more fundamental operations as pure
  2. Define more properties that we expect from pure functions

That’s where the design decisions will eventually come into play. The property that composing pure functions makes more pure functions means we can think about “soundness” again. We can only define a set of fundamental operations as pure that’s overall sound, i.e. can’t be used to define[5] other pure functions that violate the stated property about fn() -> bool, or any additional properties that we might have defined.

Just like it already sometimes happens in Rust, it can be the case that one operation and another operation both can be sound individually, but no longer if taken together:

For example, one could consider Box::<u8>::new a pure operation, or inspecting the address of a box[6] could be considered pure; but when both are taken together, one could write a pure function that generates an address (usize) value which no longer depends on the function’s inputs.[7]
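A minimal sketch of how the two would combine badly if both were declared pure:

// If both allocation and address inspection counted as pure, this
// function would count as pure by composition...
fn coin_flip() -> bool {
    let b = Box::new(0u8);
    let addr = &*b as *const u8 as usize;
    // ...yet its result depends on where the allocator happened to place
    // the box, not on any input.
    addr % 4096 < 2048
}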

The decision here is usually arbitrary. One factor to keep in mind, though, can be the goal to “make most functions pure” – with that in mind, one would likely rather allow allocation and deallocation in pure functions and disallow inspection of pointer addresses, because the former is far more common (and the latter is, if at all, often done in unproblematic ways, which could then be unsafely marked as pure anyway).

Similar considerations would apply to &mut T usage. We could think about going very far with the additional “properties that we expect from pure functions” that we define. For instance, if we want to express in some way that, with a pure function f, calling f(x) twice should return the same result, we could go so far as to require the same thing even for x: &mut T[8]. Of course then, a function f that actually modifies the T behind the reference can no longer be pure; two calls to a fn(&mut T) -> bool function that inspects and modifies the T will easily return different results.

In my opinion however, it would be much more useful to consider how similar say fn(&mut T) -> bool is to fn(T) -> (T, bool), and allow mutation through explicitly passed unique references. The intuition that “function return values are identical for identical arguments” can still apply to functions that don’t handle arguments containing mutable references. fn(&mut T) -> bool isn’t directly reflecting the signature of a “mathematical function” but that’s not actually problematic.[9]
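As a minimal sketch of that correspondence:

// Both express the same “mathematical” function; the &mut version just
// threads the “old T in, new T out” through a unique reference.
fn step_ref(x: &mut u32) -> bool {
    *x += 1;
    *x % 2 == 0
}

fn step_val(mut x: u32) -> (u32, bool) {
    x += 1;
    (x, x % 2 == 0)
}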

That being said, most forms of interior mutability / shared mutability are likely too hard to encapsulate properly and can’t be allowed in pure functions.[10]

Even global state modification can be quite desirable in pure functions, if properly encapsulated. For example, a pure function might want to make use of a lazy global regex; the first time it’s called, it would also compile the regex, which does update global state. As long as no one can “notice” this change, there should be no issue with it.
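A minimal sketch of that regex example, assuming the regex crate and std::sync::LazyLock:

use std::sync::LazyLock;
use regex::Regex;

static WORD: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"^[a-z]+$").expect("valid pattern"));

// The first call compiles the pattern and stores it in a global; later
// calls reuse it. No caller can observe the difference in the result.
fn is_word(s: &str) -> bool {
    WORD.is_match(s)
}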

Most importantly of course, other pure functions must not be able to notice any of the “not-quite-side-effects” that a pure function is allowed to have, otherwise you would be able to define an impure fn() -> bool.

But additional restrictions on side effects are desirable, too. For instance… of course pure functions can’t read files, because if you read a file twice, the contents may have changed; but creating/writing a file won’t be an effect observable by other pure functions. Nonetheless, accessing the file system seems so clearly impure that it’s likely something a “pure” function should not be allowed to do.


  1. which was the phrasing from Wikipedia ↩︎

  2. Though to be fair, a too-simplistic “common subexpression elimination” isn’t necessarily always a good optimization; even Haskell will actually not do that in most cases. Similarly, no implicit “memoization” will happen for functions. ↩︎

  3. It’s not super often though; I think maybe I should start compiling a list of concrete cases somewhere. One example was around pure Hash&Eq implementations; another interesting thing could be “pure” (&[mut] Foo) -> &[mut] Bar projections.

    For instance, some ideas around a DerefMove trait involve a (*mut Foo) -> *mut Bar method that mimics field projection and must always point to the same Bar when called more than once; wouldn’t it be neat if a combination of effects and view-types could eliminate the need for unsafe altogether? ↩︎

  4. similarly to how a function that only uses safe operations in Rust must itself be safe ↩︎

  5. in safe Rust ↩︎

  6. like &*x as &u8 as *const u8 as usize ↩︎

  7. Assuming e.g. == comparison on usize is also pure, this gives rise to a fn() -> bool violating the first property ↩︎

  8. the variable x would then be implicitly re-borrowed ↩︎

  9. It can still be modeled e.g. as something similar to fn(T) -> (T, bool); admittedly, as soon as we also start returning (or nesting) mutable references, the mathematical analogues can get a lot more complex. ↩︎

  10. This isn’t quite as clear as it might first seem though. In a single-threaded environment even (safe) shared mutability should result in deterministic results, after all.

    However, allowing access to “immutable” global state might be a motivation to rule out interior mutability as a means to prevent mutation of global state. And allowing for parallel computations in pure functions could give rise to the desire for some “reorderability” properties of pure function calls, and those could in turn be in conflict with shared mutability. ↩︎

9 Likes

An undoubtedly pure function:

fn add(a: i32, b: i32) -> i32 {
    a + b
}

A definition-dependent pure function:

fn add(a: i32, b: i32) -> i32 {
    dbg!(a);
    dbg!(b);
    dbg!(a + b)
}

:wink:

How about &T? The answer there seems pretty obvious (the pointed-to object is the same), and I see no issue extending that to mutable refs. If we somehow decide to allow inspecting addresses, then you can add that the address is an input, and therefore it must also be the same for the reference itself to be considered the same.

a const fn is pretty close to a pure function, at least for now.
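For instance, a minimal sketch of the guarantee const fn already gives, namely that the result can be computed at compile time from the inputs alone:

const fn square(x: i32) -> i32 {
    x * x
}

// Evaluated at compile time; the result depends only on the argument.
const NINE: i32 = square(3);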

It is extremely useful for optimizations. E.g. iterator adapters could optimize more aggressively if they knew that certain closures are pure.

Extending Rust's Effect System has a nice categorization of some relevant effect types.

2 Likes