I believe there are many different possible subtle flavors of “pureness” concepts; the basic idea is often that pure functions must resemble mathematical functions in some manner, specifically that the output only depend on the inputs. Additionally, there’s often the desire to have additional limitations on “side effects” in general.
The devil is in the details though.
I’m pretty familiar with Haskell’s notions of pureness, which helps a bit, but also Rust is quite different in many ways.
For example Rust’s types can have ownership. With a non-Copy
type, it’s less trivial to ask the meaning of “function return values are identical for identical arguments” – though to be fair, the question of what return values count as “identical” is nontrivial in any case.
Also, memory allocations are much more deeply integrated with the language in Haskell than Rust. The former is garbage-collected and every data structure implicitly boxed, so of course “pure” functions can make full use of allocations in Haskell, but it can be a discussion point for any notion of “pureness” in Rust.
The design of “pureness” system also relates to its goals&use-case. One often-quoted usage is “compiler optimization” so you can e.g. eliminate repeated calls to the same function. More freedom for reordering evaluation could also be a goal. IMO most of these ideas are not a great motivation for a language-wide pureness system though.
Many people are interested in pureness for easier verification of correctness. It’s a way to restrict mutability, which restricts shared mutability. Rust already is good at restricting shared mutability; comparable to functional programming languages that are only “pure by convention”, i.e. highly favor purely functional APIs, but still don’t explicitly mark impure functions after all.
Rust is also different though in that it also embraces a strict type system and encoding much useful information in function signatures already; so desire for some form of annotating “effects” that are currently hidden from function signatures comes up regularly, unsurprisingly.
One use-case of “pure” functions that I’ve seen at least a few times was around unsafe
code. Sometimes it’s just very useful to have a guarantee that – say – some user-defined fn(usize) -> (usize)
behaves like a mathematical function.
Notably Haskell’s notion of pureness works similarly to safe code in Rust. You can use “unsafe” API to execute impure code in a pure context, and this can be useful in order to represent an overall-pure operation that relies on (and properly encapsulares) impure implementation details, e.g. for performance reasons.
So let’s outline how to begin a pureness system. This should be somewhat familiar to people who know Rust’s existing notions of safety and soundness already.
We would start by outlining some minimal goals; e.g. clearly
- a pure function with signature
fn() -> bool
either always return true or always return false
- if all you do in a function is “compose” other pure operation, then the overall result is pure
next, we can expand the system in two main ways:
- Declare more fundamental operations as pure
- Define more properties that we expect from pure functions
That’s where the design decisions will eventually come into play. The property that composing pure functions makes more pure functions means we can think about “soundness” again. We can only define a set of fundamental operations as pure that’s overall sound, i.e. can’t be used to define other pure functions that violate the stated property about fn() -> bool
, or any additional properties that we might have defined.
Just like it already sometimes happens in Rust, it can be the case that one operation and another operation both can be sound individually, but no longer if taken together:
For example, one could consider Box::<u8>::new
a pure operation, or inspecting the address of a box could considered pure; but when both are taken together, one could make a pure function that generates an address (usize
) value which no longer depends on the function inputs..
The decision here is usually arbitrary. One factor to keep in mind though can be the goal to “make most functions pure
” – with that in mind one would likely rather allow allocation and deallocation in pure functions and disallow inspection of pointer addresses, because the former is far more common (and the latter is, if at all, often done in unproblematic manners which could then unsafe
ly be marked to be pure, anyways).
Similar considerations would apply to &mut T
-usage. We could think about going very far in the additional “properties that we expect from pure functions” that we define. For instance, if we want to express in some way that with a pure function f
, calling f(x)
twice should return the same result, we could so far as requiring the same thing even for x: &mut T
. Of course then, a function f
that actually modifies the T
behind the reference can no longer be pure; two calls to a fn(&mut T) -> bool
function that inspects and modifies the T
will easily return different results.
In my opinion however, it would be much more useful to consider how similar say fn(&mut T) -> bool
is to fn(T) -> (T, bool)
, and allow mutation through explicitly passed unique references. The intuition that “function return values are identical for identical arguments” can still apply to functions that don’t handle arguments containing mutable references. fn(&mut T) -> bool
isn’t directly reflecting the signature of a “mathematical function” but that’s not actually problematic.
That being said, most forms of interior mutability / shared mutability are likely too hard to encapsulate properly and can’t be allowed in pure functions.
Even global state modification can be quite desirable in pure functions, if properly encapsulated. For example a pure function might want to make use of a lazy global regex; the first time it’s called, it would also compiler the regex, which does update global state. As long as no-one can “notice” this change, there should be no issue with it.
Most importantly of course, other pure functions must not be able to notice any of the “not-quite-side-effects” that a pure function is allowed to have, otherwise you would be able to define an impure fn() -> bool
.
But additional restrictions on side-effects are desirable, too. For instance… of course functions can’t read files, because if you read a file twice, the contents may change; but creating/writing a file won’t be an observable effect to other pure functions. Nonetheless, accessing the file system seems so clearly impure that it’s likely something a “pure” function should not be allowed to do.