I have a function that needs a block size. The block size can depend on a depth, but I want the application to be able to implement its own simple logic to map a depth to a block size, though the block size must never change for a specific depth.
In its simplest form, the depth-to-block-size function can just return a constant size. The most common logic we currently use is "64K if depth is 0, 4K otherwise", but more complicated variants are possible.
I take in a fn(u16) -> u32 function pointer (rather than an Fn) to discourage developers from making the block size depend on external variables captured in a closure.
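For illustration, here is a minimal sketch of that kind of signature. The names `default_block_size` and `block_sizes_up_to` are hypothetical and only stand in for the real API; the policy is the "64K if depth is 0, 4K otherwise" rule mentioned above.

```rust
/// The common policy from the text: 64K at depth 0, 4K otherwise.
fn default_block_size(depth: u16) -> u32 {
    if depth == 0 {
        64 * 1024
    } else {
        4 * 1024
    }
}

/// Hypothetical consumer. It takes a plain function pointer rather than
/// `impl Fn(u16) -> u32`, so a closure that captures variables is rejected
/// at compile time.
fn block_sizes_up_to(max_depth: u16, block_size: fn(u16) -> u32) -> Vec<u32> {
    (0..=max_depth).map(block_size).collect()
}
```

A non-capturing closure such as `|d: u16| u32::from(d) + 1` still coerces to `fn(u16) -> u32`, so it is accepted here; only closures that capture their environment fail to compile.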
In the end, this constraint will come down to a documentation issue, where if the developer decides to randomize the size for each call it'll trigger amazingly undefined behaviors, and all bets are off. With that said: What Rust tools are available to make it more difficult for developers to break this invariant?
I'm wondering if one can accomplish something in the spirit of:
(I'm asking specifically about callback functions or an equivalent. I could easily have the application pass a precomputed data structure for the mapping, but I'm wondering what tools there are to constrain functions from introducing side effects.)
What you are referring to is often called a "pure" function. The idea is that a function's outputs can be derived entirely from its inputs, independent of any external state.
The problem you are going to run into is that programs have access to a lot of external state. For example, I could store an AtomicU32 in a static variable and every time the function gets called, I'll increment that counter and use it to determine the block size.
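To make that concrete, here is a small sketch of such a callback. The names `CALLS`, `impure_block_size`, and `as_pointer` are made up for this example. The point is that the function captures nothing, so it still fits the `fn(u16) -> u32` type, yet it returns a different size on every call.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Global counter that lives outside the function.
static CALLS: AtomicU32 = AtomicU32::new(0);

/// Not pure: the result depends on how often the function has been called.
fn impure_block_size(_depth: u16) -> u32 {
    let n = CALLS.fetch_add(1, Ordering::Relaxed);
    4096 + n
}

/// The impure function coerces to a plain function pointer without complaint.
fn as_pointer() -> fn(u16) -> u32 {
    impure_block_size
}
```

So the function pointer type rules out captured state, but not global state.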
All code has access to stdin, stdout, and the filesystem, so there are three easy ways for side effects to creep in. The same goes for anything that links to an extern "C" function (which is what stdin/stdout/fs use under the hood to perform their side effects).
If you want true determinism, compile your code to WebAssembly and make sure the WebAssembly module is instantiated from scratch every time (so static variables can't be persisted between calls) and make sure you don't give it access to non-deterministic functions (time, rng, fs, etc.).
Otherwise, if a dodgy callback can trigger UB in the rest of the program by crafting special outputs, you should mark your encode() function as unsafe to call and move on.
One approach not yet mentioned is to not prohibit side effects but to make them irrelevant: memoize the function. That is, call it at most once for any given input, and store that value, within whatever scope/context/structure cares about the consistency.
Of course, that means having storage for the memoization, and consulting it. (If the scope must be global, then perhaps a static [AtomicU32; 65536], with one slot per possible u16 depth?)
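A minimal sketch of the non-global variant, assuming a per-context cache is acceptable. `BlockSizes` and `flaky_policy` are hypothetical names; a `HashMap` is used here instead of the fixed array only to keep the example short.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};

/// Wraps a policy and remembers the first answer it gave for each depth.
struct BlockSizes {
    policy: fn(u16) -> u32,
    cache: HashMap<u16, u32>,
}

impl BlockSizes {
    fn new(policy: fn(u16) -> u32) -> Self {
        Self { policy, cache: HashMap::new() }
    }

    /// Calls the policy at most once per depth; later calls hit the cache.
    fn get(&mut self, depth: u16) -> u32 {
        let policy = self.policy;
        *self.cache.entry(depth).or_insert_with(|| policy(depth))
    }
}

// A deliberately misbehaving policy, used only to exercise the cache.
static CALLS: AtomicU32 = AtomicU32::new(0);

fn flaky_policy(_depth: u16) -> u32 {
    4096 + CALLS.fetch_add(1, Ordering::Relaxed)
}
```

Even with `flaky_policy`, every `get` for the same depth on the same `BlockSizes` value returns the same size, because the policy is only consulted on the first lookup.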
Wouldn't const check most of the requirements?
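One way to read this suggestion, as a rough sketch: a `const fn` that is evaluated in a const context cannot do I/O or touch mutable global state, so a table built from it at compile time is fixed for good. As far as I know, a `fn(u16) -> u32` parameter cannot require that the pointee is a `const fn`, so this only helps if the policy is known at compile time. `MAX_DEPTH`, `build_table`, `BLOCK_SIZES`, and `block_size_at` are assumptions made up for this example.

```rust
/// Application-defined policy, written as a `const fn`.
const fn block_size(depth: u16) -> u32 {
    if depth == 0 {
        64 * 1024
    } else {
        4 * 1024
    }
}

/// Hypothetical upper bound on the depth, chosen only for this sketch.
const MAX_DEPTH: usize = 8;

/// Evaluates the policy for every depth below `MAX_DEPTH`.
const fn build_table() -> [u32; MAX_DEPTH] {
    let mut table = [0u32; MAX_DEPTH];
    let mut i = 0;
    while i < MAX_DEPTH {
        table[i] = block_size(i as u16);
        i += 1;
    }
    table
}

/// Computed at compile time, so it cannot vary between calls at run time.
const BLOCK_SIZES: [u32; MAX_DEPTH] = build_table();

/// Run-time lookup; returns `None` for depths outside the table.
fn block_size_at(depth: u16) -> Option<u32> {
    BLOCK_SIZES.get(usize::from(depth)).copied()
}
```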
Another option could be to move this to compile-time logic. There are a bunch of ways to do this. In the simplest form, you could have a trait with associated constants and encode this information in there.
Instead of a function passed as an argument you would take some type T that implements the trait and just access the associated constants.
Whether this is a solution for you depends on how "dynamic" the function must be.
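The trait-based idea could look roughly like this. All names (`BlockPolicy`, `ROOT_BLOCK_SIZE`, `INNER_BLOCK_SIZE`, `DefaultPolicy`, `block_size_for`) are invented for the sketch, and it only covers the "one size for depth 0, another size otherwise" shape of policy.

```rust
/// Block sizes are associated constants, so they are fixed per type.
trait BlockPolicy {
    const ROOT_BLOCK_SIZE: u32;
    const INNER_BLOCK_SIZE: u32;
}

/// The common "64K if depth is 0, 4K otherwise" policy.
struct DefaultPolicy;

impl BlockPolicy for DefaultPolicy {
    const ROOT_BLOCK_SIZE: u32 = 64 * 1024;
    const INNER_BLOCK_SIZE: u32 = 4 * 1024;
}

/// The library owns the mapping logic; the application supplies only constants.
fn block_size_for<P: BlockPolicy>(depth: u16) -> u32 {
    if depth == 0 {
        P::ROOT_BLOCK_SIZE
    } else {
        P::INNER_BLOCK_SIZE
    }
}
```

A caller would write `block_size_for::<DefaultPolicy>(depth)`. Since the application never supplies executable code, there is nothing that could vary between calls.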