Forbidding side effects (as much as possible)

I have a function that needs a block size. The block size can depend on a depth, and I want the application to be able to supply its own simple logic for mapping a depth to a block size. The one hard requirement is that the block size must never change for a given depth.

In the simplest form, the depth-to-block-size function can just return a constant size. The most common block size logic we currently use is "64K if depth is 0, 4K otherwise", but more complicated variants are possible.

I take in a fn(u16) -> u32 function pointer (rather than an Fn) to discourage developers from making the block size depend on external variables captured in a closure.
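To illustrate what the function pointer type does and does not rule out, here is a minimal sketch. `CalcBS` is the alias from the snippet further down; `default_block_size` is a hypothetical name for the "64K if depth is 0, 4K otherwise" rule. Named functions and non-capturing closures coerce to `fn(u16) -> u32`, while a capturing closure is rejected at compile time.

```rust
type CalcBS = fn(u16) -> u32;

// Hypothetical name for the "64K if depth is 0, 4K otherwise" rule.
fn default_block_size(depth: u16) -> u32 {
    if depth == 0 { 64 * 1024 } else { 4 * 1024 }
}

fn main() {
    // A named function coerces to the function pointer type.
    let by_name: CalcBS = default_block_size;

    // So does a closure that captures nothing.
    let by_closure: CalcBS = |depth: u16| -> u32 {
        if depth == 0 { 64 * 1024 } else { 4 * 1024 }
    };

    // A capturing closure does not coerce; uncommenting this fails to compile
    // with a mismatched-types error (expected fn pointer, found closure):
    // let scale = 2u32;
    // let capturing: CalcBS = move |depth: u16| -> u32 { scale * u32::from(depth) };

    assert_eq!(by_name(0), by_closure(0));
    assert_eq!(by_name(7), by_closure(7));
    println!("depth 0 -> {}, depth 7 -> {}", by_name(0), by_name(7));
}
```

This only blocks captured variables. It says nothing about what the function body reads or writes.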

In the end, this constraint will come down to a documentation issue, where if the developer decides to randomize the size for each call it'll trigger amazingly undefined behaviors, and all bets are off. With that said: What Rust tools are available to make it more difficult for developers to break this invariant?

I'm wondering if one can accomplish something in the spirit of:

#[disallow_external_fn_variables]
pub type CalcBS = fn(u16) -> u32;

pub fn encode(.., get_block_size: CalcBS, ..) {
}

(Asking specifically about using callback functions (or equivalent). I could easily have the application pass a precomputed data structure for doing the mapping, but I'm wondering specifically about what tools there are to constrain functions from introducing side effects).


You can't do this.

Well, I guess you could make the function unsafe and say "if the function is not deterministic, it's UB", which makes it the caller's fault.


What you are referring to is often called a "pure" function. The idea being that a function's outputs can be derived entirely from its inputs, independent of any external state.

The problem you are going to run into is that programs have access to a lot of external state. For example, I could store an AtomicU32 in a static variable and every time the function gets called, I'll increment that counter and use it to determine the block size.
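A minimal sketch of that counter trick, to show it needs no closure capture at all. The names `CALLS` and `impure_block_size` are made up for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hidden global state, reachable from any plain function.
static CALLS: AtomicU32 = AtomicU32::new(0);

// Hypothetical callback: it has the signature `fn(u16) -> u32`,
// yet its result depends on how often it has been called, not on `depth`.
fn impure_block_size(_depth: u16) -> u32 {
    let n = CALLS.fetch_add(1, Ordering::Relaxed);
    4096 + n
}

fn main() {
    // Coerces to the function pointer type without complaint.
    let f: fn(u16) -> u32 = impure_block_size;

    let first = f(0);
    let second = f(0);

    // Same depth, different block size: the invariant is broken.
    assert_ne!(first, second);
    println!("depth 0 -> {first}, then {second}");
}
```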

All code has access to stdin, stdout, and the filesystem, so there are three easy ways for side-effects to creep in. Same with anything linking to an extern "C" function (this is what stdin/stdout/fs use under the hood to trigger side-effects).

If you want true determinism, compile your code to WebAssembly and make sure the WebAssembly module is instantiated from scratch every time (so static variables can't be persisted between calls) and make sure you don't give it access to non-deterministic functions (time, rng, fs, etc.).

Otherwise, if a dodgy callback can trigger UB in the rest of the program by crafting special outputs, you should mark your encode() function as unsafe to call and move on.
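A rough sketch of what marking it unsafe could look like. The signature here is an assumption (the real encode() takes other arguments too), and `default_block_size` is a hypothetical callback. The point is only that the determinism requirement moves into a `# Safety` contract that the caller has to acknowledge.

```rust
/// Hypothetical, simplified stand-in for the real `encode`.
///
/// # Safety
///
/// `get_block_size` must return the same value every time it is called
/// with the same depth. Violating this is the caller's responsibility.
pub unsafe fn encode(depths: &[u16], get_block_size: fn(u16) -> u32) -> Vec<u32> {
    // Placeholder body: just collect the block size for each depth.
    depths.iter().map(|&d| get_block_size(d)).collect()
}

// Hypothetical callback implementing "64K if depth is 0, 4K otherwise".
fn default_block_size(depth: u16) -> u32 {
    if depth == 0 { 64 * 1024 } else { 4 * 1024 }
}

fn main() {
    // SAFETY: `default_block_size` depends only on `depth`.
    let sizes = unsafe { encode(&[0, 1, 2], default_block_size) };
    assert_eq!(sizes, vec![65536, 4096, 4096]);
    println!("{sizes:?}");
}
```

The compiler still checks nothing about the callback itself; the `unsafe` block just forces the caller to read and accept the contract.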


One approach not yet mentioned is to not prohibit side effects but to make them irrelevant: memoize the function. That is, call it at most once for any given input, and store that value, within whatever scope/context/structure cares about the consistency.

Of course, that means having storage for the memoization, and consulting it. (If the scope must be global, then a static [AtomicU32; 65536], perhaps?)
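A minimal sketch of the memoizing idea, scoped to a struct rather than a global. `BlockSizes` and `flaky` are hypothetical names, and a `HashMap` is used here instead of the static array mentioned above. Even with a deliberately inconsistent callback, every depth keeps the first size it was given.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical wrapper that calls the callback at most once per depth.
pub struct BlockSizes {
    calc: fn(u16) -> u32,
    cache: HashMap<u16, u32>,
}

impl BlockSizes {
    pub fn new(calc: fn(u16) -> u32) -> Self {
        Self { calc, cache: HashMap::new() }
    }

    /// Returns the cached size for `depth`, computing it on first use only.
    pub fn get(&mut self, depth: u16) -> u32 {
        let calc = self.calc; // copy the fn pointer so the closure does not borrow `self`
        *self.cache.entry(depth).or_insert_with(|| calc(depth))
    }
}

// A deliberately misbehaving callback, for demonstration only.
static CALLS: AtomicU32 = AtomicU32::new(0);
fn flaky(_depth: u16) -> u32 {
    4096 + CALLS.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let mut sizes = BlockSizes::new(flaky);
    let first = sizes.get(3);
    // The callback is not consulted again for depth 3.
    assert_eq!(sizes.get(3), first);
    println!("depth 3 is pinned to {first}");
}
```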


Wouldn't const check most of the requirements?
Another option could be to move this to compile time logic. There are a bunch of ways to do this, in the simplest form you could have a trait with associated constants and encode this information in there.
Instead of a function passed as an argument you would take some type T that implements the trait and just access the associated constants.
Whether this is a solution for you depends on how "dynamic" the function must be.


Doesn't const fn forbid all side effects? It's also not possible to call any non-const fn from a const fn.

const doesn't help here, because a function pointer type can't be marked const:

error: an `fn` pointer type cannot be `const`
 --> src/lib.rs:2:12
  |
2 | type Foo = const fn() -> i32;
  |            -----^^^^^^^^^^^^
  |            |
  |            `const` because of this
  |            help: remove the `const` qualifier

What's the cap on depth? You could have a

const MAX_DEPTH: usize = 42;
trait Depth {
    const BLOCK_SIZE: [u32; MAX_DEPTH];
}

That works if the depth is relatively limited.
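Here is one way the table idea could be filled in and used, as a sketch. `Default64K`, `default_table` and `block_size_for` are hypothetical names, and `MAX_DEPTH = 42` is just the placeholder value from above. The table is built by a `const fn`, so it is fixed at compile time and cannot vary between calls.

```rust
const MAX_DEPTH: usize = 42;

trait Depth {
    const BLOCK_SIZE: [u32; MAX_DEPTH];
}

// Hypothetical policy type: "64K if depth is 0, 4K otherwise".
struct Default64K;

// Evaluated at compile time when used to initialize the associated constant.
const fn default_table() -> [u32; MAX_DEPTH] {
    let mut table = [4 * 1024; MAX_DEPTH];
    table[0] = 64 * 1024;
    table
}

impl Depth for Default64K {
    const BLOCK_SIZE: [u32; MAX_DEPTH] = default_table();
}

// Hypothetical stand-in for the lookup inside `encode`:
// the policy is a type parameter instead of a function argument.
fn block_size_for<D: Depth>(depth: u16) -> Option<u32> {
    D::BLOCK_SIZE.get(usize::from(depth)).copied()
}

fn main() {
    assert_eq!(block_size_for::<Default64K>(0), Some(64 * 1024));
    assert_eq!(block_size_for::<Default64K>(5), Some(4 * 1024));
    // Depths at or beyond MAX_DEPTH have no entry.
    assert_eq!(block_size_for::<Default64K>(42), None);
}
```

The trade-off is that the policy has to be known at compile time and depths beyond the cap need separate handling.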


Not sure if this helps in the particular case, but I was experimenting a bit with passing constant functions:

// Receiving a `u32` and returning a `u32`
trait ConstFn<const X: u32> {
    const Y: u32;
}

struct Double;
impl<const X: u32> ConstFn<X> for Double {
    const Y: u32 = 2 * X;
}

struct Triple;
impl<const X: u32> ConstFn<X> for Triple {
    const Y: u32 = 3 * X;
}

// Takes a constant `u32` and some sort of "constant function" that goes from `u32` to `u32`
fn foo<const X: u32, F: ConstFn<X>>() {
    println!("Received {} and calculated {}.", X, F::Y);
}

fn main() {
    foo::<100, Double>();
    foo::<5, Triple>();
}


Output:

Received 100 and calculated 200.
Received 5 and calculated 15.

I think this approach is limited though and doesn't work with dyn.
