Closures in API design – theoretical limitations and best practices?

Meta: Just to clarify once more, I'm interested both in the theoretical aspects of this question and in particular problems with code I'm writing [1] (I'll get back to that in a sec). I would like to discuss both aspects, and I don't feel like it's (always) appropriate to open two or three threads on the same topic just to highlight different aspects of that topic. I do understand if not everyone likes to discuss everything, and in that case, I would like to emphasize that it's not necessary to engage in a theoretical discussion if one doesn't want to. There is no need to emphasize that a discussion is "useless", "always leads to the same result", etc. For my part, even with more than a year of experience with Rust, I still consider myself a beginner, and I'm in the process of learning.

When I say that a language construct has an "unnecessary constraint" etc, then I do not (always) want to imply that the language should be changed. I merely share the observation that in certain contexts(!) a constraint seems more unwieldy than helpful, while knowing that in other cases it may be the opposite. I will try to emphasize this in the future to avoid misunderstandings. Sometimes, but not always, this can lead to bug reports or PRs. In the case discussed here, I did not want to imply that Rust's closures should transparently forward effects or that the language should be changed in any manner. I did, however, try to identify possible limitations that you may run across in everyday programming and that may be useful to keep in mind when writing idiomatic Rust.

In the future, I will also try to take more care to give context for my posts, even if that may be redundant eventually, and to note when there is a shift in interest as a discussion goes on. I feel like a forum like this should offer space for both, and there is no need to tell other people what to discuss or not to discuss as long as it's not clearly out of scope. For now, I'm interested both in the theoretical and the practical aspects.

I apologize for any misunderstandings in the past.


That said, I would like to add a few more things to this topic.

The most recent issue I ran into (regarding closures) is this one: multivariate_optimization::Solver. I already tried to reduce the code that exhibits effects (async in that case) to two methods:

and to make the new method indifferent (generic) in that matter. I'm not sure if that was or is a good design.

Either way – now when actually using that code, I noticed that creating a specimen can fail (in my particular case it will involve running a separate process, namely NEC2). So I would need to provide at least two more methods for extend_specimens and replace_worst_specimens (and maybe four, if I want to handle the fallible sync and async cases).

And if I care about Send and !Send futures, this will get worse.

So I thought about @kpreid's wording "loosely coupled" here:

In my case, I consider that the caller should be responsible for converting a specimen's parameters into the specimen, as shown here in the drafted documentation example:

let constructor = |params: Vec<f64>| {
    let cost = rastrigin(&params);
    BasicSpecimen { params, cost }
};
let into_specimens =
    |x: Vec<Vec<f64>>| x.into_par_iter().map(constructor).collect::<Vec<_>>();

// `into_specimens` may be replaced with another mechanism, e.g. an `async` one
// or, if no parallel computation is desired, one can simply use:
// let into_specimens = |x: Vec<Vec<f64>>| x.into_iter().map(constructor);

let mut solver = Solver::new(search_space);
solver.set_speed_factor(0.5);

let initial_params = solver.random_params(POPULATION);
solver.extend_specimens(into_specimens(initial_params));
/* … */

That is done in the into_specimens closure (which is actually a function as it doesn't capture any environment).

I would say that this approach is more "loosely coupled", but I'm not sure if I'm really happy with it, or what to do.
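A minimal sketch of how the loosely coupled variant could absorb fallibility on the caller's side, assuming hypothetical stand-in types (`Solver`, `BasicSpecimen`, `try_construct` and `add_all` here are illustrative only and not the real `multivariate_optimization` API). The solver only ever receives finished specimens, so it needs no `try_…` variant; the caller collects into a `Result` first:

```rust
// Hypothetical stand-ins; not the real `multivariate_optimization` API.
#[derive(Debug)]
struct BasicSpecimen {
    params: Vec<f64>,
    cost: f64,
}

struct Solver {
    specimens: Vec<BasicSpecimen>,
}

impl Solver {
    fn new() -> Self {
        Solver { specimens: Vec::new() }
    }

    /// Takes finished specimens only; it never calls back into user code.
    fn extend_specimens<I>(&mut self, iter: I)
    where
        I: IntoIterator<Item = BasicSpecimen>,
    {
        self.specimens.extend(iter);
    }
}

/// A fallible constructor (stand-in for e.g. running an external process).
fn try_construct(params: Vec<f64>) -> Result<BasicSpecimen, String> {
    if params.iter().any(|p| !p.is_finite()) {
        return Err("non-finite parameter".to_string());
    }
    let cost: f64 = params.iter().map(|p| p * p).sum();
    Ok(BasicSpecimen { params, cost })
}

/// The caller decides how failures are handled (here: stop at the first one
/// and leave the solver untouched).
fn add_all(solver: &mut Solver, params: Vec<Vec<f64>>) -> Result<(), String> {
    let specimens = params
        .into_iter()
        .map(try_construct)
        .collect::<Result<Vec<_>, _>>()?;
    solver.extend_specimens(specimens);
    Ok(())
}

fn main() {
    let mut solver = Solver::new();
    add_all(&mut solver, vec![vec![1.0, 2.0], vec![3.0]]).unwrap();
    assert_eq!(solver.specimens.len(), 2);
    assert_eq!(solver.specimens[0].params, vec![1.0, 2.0]);
    assert_eq!(solver.specimens[0].cost, 5.0);

    // A failing constructor never reaches the solver.
    assert!(add_all(&mut solver, vec![vec![f64::NAN]]).is_err());
    assert_eq!(solver.specimens.len(), 2);
}
```

The same shape would presumably work for an `async` conversion step, since the solver method itself stays synchronous and infallible.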


@steffahn I feel like duplicating all the methods like in ouroboros isn't a very nice way to go. (Though maybe macros could help out? Not sure if that's better though.) Maybe it's the current "best practice" though.

Looking at this from a language p.o.v.: Perhaps that's why "keyword generics" came up in the first place (even though they seem to be controversial, I guess)? What I also don't understand is if (or how) they are supposed to solve the effect of exceptions? That's not done via keywords in Rust but using Result or, in the more generic case with RFC #3058, with the yet unstable Try trait and the (already stable) std::ops::ControlFlow type.

In this context, I would also like to note that the solution in regard to async operations in closures passed to Iterator::map doesn't handle the Option case (as opposed to the Result case) very well, because TryStreamExt is only implemented for TryStream, which is only implemented for S where S: Stream<Item = Result<T, E>> + ?Sized. Thus the .ok_or() and .ok() workaround here:
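To illustrate the shape of that workaround without pulling in the futures crate, here is a synchronous analogue. The helper `try_map_all` is hypothetical and merely plays the role of a `Result`-only API; it is not how `TryStreamExt` is implemented. `Option`-based logic gets lifted with `.ok_or(())` and lowered again with `.ok()`:

```rust
/// Hypothetical helper that only understands `Result`, standing in for a
/// `Result`-only API.
fn try_map_all<T, U, E>(
    items: Vec<T>,
    mut f: impl FnMut(T) -> Result<U, E>,
) -> Result<Vec<U>, E> {
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        out.push(f(item)?);
    }
    Ok(out)
}

/// `Option`-based logic that we would like to plug in.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

fn first_chars(words: Vec<&str>) -> Option<Vec<char>> {
    // `ok_or(())` lifts `Option` into `Result`; `.ok()` lowers it again.
    try_map_all(words, |w| first_char(w).ok_or(())).ok()
}

fn main() {
    assert_eq!(first_chars(vec!["one", "two"]), Some(vec!['o', 't']));
    assert_eq!(first_chars(vec!["one", ""]), None);
}
```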


Theoretical questions aside (though I appreciate any answers on that matter as well), what do I do in my concrete case in regard to the multivariate_optimization crate? Try "loosely coupling", or should I rather duplicate the methods and provide try_…_async, try_…_async_send, etc. variants? Any other option?

Side note, slightly off-topic …

… but maybe of interest: For the fun of it, I experimented with Monads in the past, though I would like to emphasize that I do not see those as a solution for this problem (or effect handling in general). It's been more a thought experiment. I ran into a couple of problems, like here and there, and even hung the compiler (#113359). In either case, this is far from idiomatic Rust and I don't want to imply that there is any practical use for those experiments.


P.S./Meta:

I just noticed that I mixed up theory and my practical issue, disregarding your advice, in this post. :face_with_diagonal_mouth: Is that really such a big problem? (Not wanting to imply you said it was a big problem.) Should I have opened yet another (third) thread on this (i.e. the multivariate_optimization issue)? I don't think that would be a good way to go. :man_shrugging:


  1. Actually this has happened more than once already, but I don't remember all the cases and/or scenarios. Maybe it didn't happen that often yet. I don't know/remember. One other recent case was the issue regarding regex that I linked in my OP, but that's not the one I'm thinking of at this moment. ↩︎


One possible practical compromise is to offer two choices: one which is the above and one which is minimally flexible, just offering the Iterator::map()-style callback:

solver.create_specimens(constructor);

Then simple applications can have a simple interface, and ones which want some refinement can use the more complex, possibly fragile one. (Fragile in that it opens many questions of what happens if it is used differently than the intended pattern; possibly those have simple answers in the particular case.)
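A sketch of what offering both choices might look like. Everything here is assumed for illustration: the `Specimen` and `Solver` types are stand-ins, `random_params` is a deterministic placeholder, and `create_specimens` takes an extra `count` argument that the one-liner above does not show:

```rust
/// Hypothetical stand-ins; not the real `multivariate_optimization` types.
struct Specimen {
    params: Vec<f64>,
    cost: f64,
}

struct Solver {
    dim: usize,
    specimens: Vec<Specimen>,
}

impl Solver {
    fn new(dim: usize) -> Self {
        Solver { dim, specimens: Vec::new() }
    }

    /// Placeholder for real sampling; deterministic so the sketch is testable.
    fn random_params(&self, count: usize) -> Vec<Vec<f64>> {
        (0..count).map(|i| vec![i as f64; self.dim]).collect()
    }

    /// Flexible variant: the caller builds the specimens however it likes.
    fn extend_specimens(&mut self, iter: impl IntoIterator<Item = Specimen>) {
        self.specimens.extend(iter);
        // Keep the population sorted by cost, best first.
        self.specimens.sort_by(|a, b| a.cost.total_cmp(&b.cost));
    }

    /// Simple variant: a plain `Iterator::map()`-style callback.
    fn create_specimens(
        &mut self,
        count: usize,
        constructor: impl FnMut(Vec<f64>) -> Specimen,
    ) {
        let params = self.random_params(count);
        self.extend_specimens(params.into_iter().map(constructor));
    }
}

fn main() {
    let mut solver = Solver::new(2);
    solver.create_specimens(3, |params| {
        let cost: f64 = params.iter().map(|p| p * p).sum();
        Specimen { params, cost }
    });
    assert_eq!(solver.specimens.len(), 3);
    assert_eq!(solver.specimens[0].params, vec![0.0, 0.0]);
}
```

The simple method is implemented on top of the flexible one, so the two cannot drift apart in behavior.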


Disclaimer: With this post, I want to explore the generic solution further. I do not want to imply this is idiomatic Rust, though it might be an interesting approach where boxing is too expensive. It might also be helpful to understand the value and/or limitations of certain unstable features better.

Do we really need pinning and boxing here? :innocent:

I tried to apply a technique I previously proposed for regex to the 4th toy example in my OP:

#![feature(
    closure_lifetime_binder,
    try_trait_v2,
    type_alias_impl_trait
)]

use std::future::Future;
use std::ops::{FromResidual, Try};

const KEYS: [&str; 2] = ["one", "two"];

mod internal {
    use std::future::Future;

    pub trait Callback<A, O>
    where
        Self: FnMut(A) -> <Self as Callback<A, O>>::Future,
    {
        type Future: Future<Output = O>;
    }

    impl<'a, C, F, A, O> Callback<A, O> for C
    where
        C: FnMut(A) -> F,
        F: Future<Output = O>,
    {
        type Future = F;
    }
}

pub struct UsesCallback {
    output: Vec<String>,
}

impl UsesCallback {
    pub async fn try_new_async<F, R1, R2>(mut callback: F) -> R2
    where
        F: for<'a> internal::Callback<&'a str, R1>,
        R1: Try<Output = String>,
        R2: Try<Output = Self> + FromResidual<<R1 as Try>::Residual>,
    {
        let mut output = Vec::with_capacity(KEYS.len());
        for key in KEYS.iter().copied() {
            output.push(callback(key).await?);
        }
        R2::from_output(Self { output })
    }

    pub fn into_inner(self) -> Vec<String> {
        self.output
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    type R<'a> = impl 'a
        + Future<Output = Result<String, Box<dyn std::error::Error>>>;
    let uses_async_fallible_callback =
        UsesCallback::try_new_async::<
            _,
            _,
            Result<_, Box<dyn std::error::Error>>,
        >(for<'a> |s: &'a str| -> R<'a> {
            async move {
                let first_char_option: Option<char> = s.chars().next();

                // Using `await` is possible here:
                tokio::task::yield_now().await;

                // Returning an error is possible here:
                let first_char: char =
                    first_char_option.ok_or("empty string")?;

                Ok::<_, Box<dyn std::error::Error>>(format!(
                    "{s} begins with '{first_char}'"
                ))
            }
        })
        .await?;
    assert_eq!(
        uses_async_fallible_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

(Playground)
edit #1: renamed Callback::Output to Callback::Future to avoid confusion with Future::Output.
edit #2: defined internal::Callback more generic, which makes the code easier to read, in my opinion.

As we can see, using BoxFuture wasn't necessary after all! :astonished:

I wonder: Has this technique been used and/or described previously?

It does avoid boxing for the sole purpose of specifying the lifetime. But it comes at a high "syntactic price". I don't want to imply it's a good idea to do this (in most cases), but maybe it could be interesting to investigate this further, in order to get rid of BoxFuture (in the future :wink:) where it's simply used to do the lifetime handling (and is not really needed).

Technical question: Am I correct that this approach is Send/!Send agnostic?

The next step (if we truly want a generic solution here) would be to consider this:

I tried to be generic regarding fallibility as well by using Try and FromResidual (instead of Result) in combination with avoiding boxing the future. But the syntax complexity exploded in such a way that I couldn't solve it (yet). Update: Solution below:

#![feature(
    closure_lifetime_binder,
    try_trait_v2,
    type_alias_impl_trait
)]

use std::future::Future;
use std::ops::{FromResidual, Try};

const KEYS: [&str; 2] = ["one", "two"];

mod internal {
    use std::future::Future;

    pub trait Callback<A, O>
    where
        Self: FnMut(A) -> <Self as Callback<A, O>>::Future,
    {
        type Future: Future<Output = O>;
    }

    impl<'a, C, F, A, O> Callback<A, O> for C
    where
        C: FnMut(A) -> F,
        F: Future<Output = O>,
    {
        type Future = F;
    }
}

pub struct UsesCallback {
    output: Vec<String>,
}

impl UsesCallback {
    pub async fn try_new_async<F, R1, R2>(mut callback: F) -> R2
    where
        F: for<'a> internal::Callback<&'a str, R1>,
        R1: Try<Output = String>,
        R2: Try<Output = Self> + FromResidual<<R1 as Try>::Residual>,
    {
        let mut output = Vec::with_capacity(KEYS.len());
        for key in KEYS.iter().copied() {
            output.push(callback(key).await?);
        }
        R2::from_output(Self { output })
    }

    pub fn into_inner(self) -> Vec<String> {
        self.output
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    type R<'a> = impl 'a
        + Future<Output = Result<String, Box<dyn std::error::Error>>>;
    let uses_async_fallible_callback =
        UsesCallback::try_new_async::<
            _,
            _,
            Result<_, Box<dyn std::error::Error>>,
        >(for<'a> |s: &'a str| -> R<'a> {
            async move {
                let first_char_option: Option<char> = s.chars().next();

                // Using `await` is possible here:
                tokio::task::yield_now().await;

                // Returning an error is possible here:
                let first_char: char =
                    first_char_option.ok_or("empty string")?;

                Ok::<_, Box<dyn std::error::Error>>(format!(
                    "{s} begins with '{first_char}'"
                ))
            }
        })
        .await?;
    assert_eq!(
        uses_async_fallible_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

(Playground)
All Playgrounds tested successfully with 1.73.0-nightly (2023-07-29 32303b219d4dffa447aa).

For obvious reasons, type inference can't handle this anymore, so there is an extra type annotation needed in main to specify which Try type we expect as the Output of the Future returned by try_new_async.

Actually the possibilities in Koka inspired me to try to find this generic solution above (many thanks for pointing me to that interesting language). My idea was that the part that does not require continuations can be handled by Try/FromResidual. The continuations [1] need to be handled by Futures/async though. As shown above, it seems to be possible to combine these in Rust, but it gets pretty ugly.

If I understand right, then my most recent variant of try_new_async is generic in regard to the Futures used (i.e. the Future can be Send or !Send and its type doesn't need to be named) and it is also generic in regard to the ControlFlow mechanism to handle an early exit.


  1. Actually we don't have real (or "arbitrary") continuations in Rust as the unnameable Futures won't be Clone + Unpin. See also this post above by @kpreid. ↩︎

I’ve come across the observation that usage of BoxFuture is a workaround solution to get async callbacks on stable Rust, multiple times, with the implications that the boxing and type erasure of course ought to not be necessary. Your code demonstrates the need for unstable features to make this work quite well, otherwise the compiler tends to complain when the callback is anything but a simple async fn. See e.g. here.

It’s interesting to see your code though; I’ve never experimented with approaches to make passing a (possibly capturing) closure work with unstable features, as you do in annotating the closure signature using type-alias-impl-trait.


Indeed avoiding the type erasure should also make it handle auto traits.
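A small stable-Rust sketch of the auto-trait point, under the simplifying assumption that the callback takes an owned `String` (which sidesteps the lifetime problem discussed above). The names `collect_all` and `assert_send` are made up for this illustration. Because the callback's future type stays generic, the outer future is `Send` exactly when its parts are:

```rust
use std::future::Future;
use std::rc::Rc;

/// Generic over the callback's future type; nothing is boxed or erased.
async fn collect_all<F, Fut>(mut callback: F) -> Vec<String>
where
    F: FnMut(String) -> Fut,
    Fut: Future<Output = String>,
{
    let mut out = Vec::new();
    for key in ["one", "two"] {
        out.push(callback(key.to_owned()).await);
    }
    out
}

/// Compile-time check only: accepts any reference to a `Send` value.
fn assert_send<T: Send>(_: &T) {}

fn main() {
    // The callback's future is `Send`, so the outer future is `Send` too.
    let send_future = collect_all(|s: String| async move { s.to_uppercase() });
    assert_send(&send_future);

    // A callback capturing an `Rc` is accepted by the very same function;
    // only the resulting future is `!Send`.
    let shared = Rc::new(String::from("!"));
    let local_future = collect_all(move |s: String| {
        let shared = Rc::clone(&shared);
        async move { format!("{s}{shared}") }
    });
    // assert_send(&local_future); // would be rejected by the compiler
    drop(local_future);
}
```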


That looks pretty similar (or even almost identical?) to what I proposed in the above linked PR draft #1048 for regex: You make Fn(Mut/Once) a supertrait of your own trait with a blanket implementation. (Though you applied it to Futures already.)

Note that the linked PR draft does not require unstable features (you can use a coercing function to specify the closure's type). Maybe that could help getting rid of the TAIT (type-alias-impl-trait) in the above Playground too, but I haven't tested that.
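For readers unfamiliar with the "coercing function" trick on stable Rust, a minimal sketch (the function name `coerce` and the `&str -> &str` signature are chosen for illustration and are not taken from the PR draft). It is an identity function whose bound tells the compiler which higher-ranked signature the closure is meant to have:

```rust
/// Identity function whose bound spells out the intended closure signature,
/// including the link between the input and output lifetimes.
fn coerce<F>(f: F) -> F
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    f
}

fn main() {
    // Passing the closure through `coerce` lets the compiler infer the
    // higher-ranked signature from the bound.
    let trim = coerce(|s| s.trim());

    // The closure now works for borrows of any lifetime.
    let local = String::from("  hello ");
    assert_eq!(trim(local.as_str()), "hello");
    assert_eq!(trim("  world"), "world");
}
```

This only works here because the return type `&'a str` can be written down; it gives no direct way to name an unnameable future type.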

:smiley:

I don’t see how that helps much, as the coerce function relies on being able to provide a more concrete Fn…(…) -> … bound that expresses the true signature of the closure to help out type inference. Whereas with the case of returning a future, you can never properly write the returned Future type, so writing a good signature for coerce is just as problematic (unless it’s a nameable future type, which is the whole point of using the BoxFuture workaround; or, apparently, as you demonstrated, unless TAIT is used).

You are probably right. I didn't fully understand your reasoning yet but I noticed that a coercing function probably wouldn't do anything else than the bound in the already existing method.

FWIW, it's possible to get rid of the closure_lifetime_binder feature though. (Playground)


We can fix this by using the unstable feature try_trait_v2_residual (#91285) and by forcing the "fallibility wrapper" (e.g. Result, Option, etc.) to be identical to the one used by the Future::Output of the return type of the closure:

    pub async fn try_new_async<F, R>(
        mut callback: F,
    ) -> <R::Residual as Residual<Self>>::TryType
    where
        F: for<'a> internal::Callback<&'a str, R>,
        R: Try<Output = String>,
        R::Residual: Residual<Self>,
        <R::Residual as Residual<Self>>::TryType:
            Try<Output = Self> + FromResidual<<R as Try>::Residual>,
    {
        let mut output = Vec::with_capacity(KEYS.len());
        for key in KEYS.iter().copied() {
            output.push(callback(key).await?);
        }
        <R::Residual as Residual<Self>>::TryType::from_output(Self {
            output,
        })
    }

(Playground) edit: Also got rid of unnecessary turbofish type annotation in main.

But… well…

The goal of my OP was to better understand the theoretical limitations of Rust's closures in API design as well as finding best practices to work and live with these limitations. After having shown in my previous post how a callback-based API could theoretically be modeled with (unstable) Rust such that it allows for fallibility and async, I would like to try to describe the nature of callbacks a bit better. I will attempt to do this to get a better understanding of the limitations first, and maybe (hopefully?) figure out best practices to deal with it.

Callbacks seem to be used when some code A initially calls some code B, and B wants or needs to have power over the control flow regarding parts of A.

This isn't just about (side-)effects: As Rust is a non-functional language, all functions can have side-effects, either because they are FnMut or because they can resort to inner mutation (e.g. using Arc<Mutex<_>>). And every function may also have I/O effects. But the interesting effects are related to control flow, where we can observe the following ones:

  • Aborting: Using callbacks or not, this is always possible, e.g. by panicking when there is no catch_unwind or by using the unstable always_abort (or by panicking in a drop handler, etc).
  • Diverging (non-termination): As we're Turing-complete, this seems to also be always possible in Rust.
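  • Early-exit (exceptions): If we want to exit early (possibly beyond a single function body), we have two ways to do this in Rust: unwinding through a panic (which can be caught with catch_unwind), or returning a Result or some other Try type.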
  • Yielding (async, coroutines, generators, etc.): This gets modeled by returning a Future instead of the desired output value.
  • Arbitrary continuations: These don't exist in Rust. So nothing to worry about here.

Now there is potential friction when the code in A wants to use some of these effects while code of B is in the way on the stack.

  • Aborting and diverging don't cause any problems.
  • Early-exit can work in two ways. Using catch_unwind is usually not a viable option though. Instead, the API needs to wrap return values in a Result or some other Try type. Usually, a Result will be used, but this isn't generic! As shown in my previous post, the generic (but possibly syntactically horrible) way to express this would be to use the Try and Residual traits.
  • Yielding requires all involved functions and/or closures to return a Future, which is in many cases automatically achieved through the async keyword. Where we can't use an async fn but must use an async block expression, we can run into problems because the created Futures are unnameable except if they are boxed. This makes it hard (but not impossible, as also shown in the previous post) to capture a lifetime in the future's type. In most everyday code, we would use BoxFuture or LocalBoxFuture, which forces us to decide on whether we want to support (and demand) Futures that implement Send.
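As a stable-Rust middle ground for the early-exit case, a callback can return `std::ops::ControlFlow` and let each caller map it back to `Option` or `Result`. The following is only a sketch; `visit_keys`, `first_empty_index` and `total_len` are hypothetical names, and this is less general than the `Try`/`Residual` approach since the callback must name `ControlFlow` explicitly:

```rust
use std::ops::ControlFlow;

const KEYS: [&str; 3] = ["one", "", "three"];

/// Hypothetical callback-based API: visits every key and stops as soon as
/// the callback returns `ControlFlow::Break`.
fn visit_keys<B>(
    mut callback: impl FnMut(&str) -> ControlFlow<B>,
) -> ControlFlow<B> {
    for key in KEYS {
        // `?` forwards a `Break` to our caller (works on stable Rust).
        callback(key)?;
    }
    ControlFlow::Continue(())
}

/// Caller that wants `Option`-like early exit.
fn first_empty_index() -> Option<usize> {
    let mut index = 0;
    let flow = visit_keys(|key| {
        if key.is_empty() {
            return ControlFlow::Break(index);
        }
        index += 1;
        ControlFlow::Continue(())
    });
    match flow {
        ControlFlow::Break(found) => Some(found),
        ControlFlow::Continue(()) => None,
    }
}

/// Caller that wants `Result`-like early exit.
fn total_len(limit: usize) -> Result<usize, String> {
    let mut total = 0;
    let flow = visit_keys(|key| {
        if key.len() > limit {
            return ControlFlow::Break(format!("key {key:?} is too long"));
        }
        total += key.len();
        ControlFlow::Continue(())
    });
    match flow {
        ControlFlow::Break(error) => Err(error),
        ControlFlow::Continue(()) => Ok(total),
    }
}

fn main() {
    assert_eq!(first_empty_index(), Some(1));
    assert_eq!(total_len(10), Ok(8));
    assert!(total_len(3).is_err());
}
```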

So much for the theory. But what does that mean in practice?

Unfortunately [1], I think it has to be decided on a case-by-case basis. Where we don't really need the control flow management, we could try a "loosely coupled" approach by avoiding callbacks overall or only providing a callback-based API for the most simple cases (without fallibility or async support). This is basically what regex does and what @kpreid suggested here.

But sometimes the functionality that a crate provides actually is about the control flow. I.e. the main point of a crate is to implement algorithms that cause a certain complex control flow. In these cases, I see five options:

  1. Provide limited support for fallible and async: E.g. support none or only one of those; or only support Send but not !Send futures. This keeps API complexity low but also limits the capabilities of the API. Where necessary, bridges between sync and async (as already mentioned by @CAD97 here) can be employed. This is more tricky in regard to fallibility: Here we could use catch_unwind, but I feel like this is somewhat painful.
  2. Duplicate methods: For example, provide a foo, try_foo, foo_async, try_foo_async, foo_async_send, try_foo_async_send method, instead of just one foo method. I feel like in practice, we won't be generic in regard to the Try types and just use Result. This means that if we use some other type than Result, e.g. Option, we must work around that, as previously shown here.
  3. Offer copy&paste-able code: This is also what regex does here.
  4. Use macros: We could consider using macros instead of functions/methods to offer certain forms of control flow. It's not elegant but might be the least painful option in many cases.
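To make the macro option more concrete, here is a minimal sketch with a hypothetical `for_each_key!` macro. Because the macro expands to a plain loop inside the caller's function body, `?` and `return` act on the caller directly, with no closure boundary in between (the same would apply to `.await` inside an async function):

```rust
const KEYS: [&str; 2] = ["one", "two"];

/// Hypothetical macro: expands to a plain loop in the caller's body.
macro_rules! for_each_key {
    ($key:ident => $body:block) => {
        for $key in KEYS {
            $body
        }
    };
}

/// `Option`-style early exit via `?`.
fn first_chars() -> Option<String> {
    let mut out = String::new();
    for_each_key!(key => {
        // `?` exits `first_chars` directly; no closure is in the way.
        out.push(key.chars().next()?);
    });
    Some(out)
}

/// `Result`-style early exit via `return`.
fn longest_under(limit: usize) -> Result<usize, String> {
    let mut longest = 0;
    for_each_key!(key => {
        if key.len() > limit {
            return Err(format!("key {key:?} exceeds {limit} bytes"));
        }
        longest = longest.max(key.len());
    });
    Ok(longest)
}

fn main() {
    assert_eq!(first_chars(), Some("ot".to_string()));
    assert_eq!(longest_under(5), Ok(3));
    assert!(longest_under(2).is_err());
}
```

The price is that the "API" is now a syntactic template rather than a function, so it cannot be passed around or stored.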

Edit: I forgot [2]:

  5. Be generic (not considered as idiomatic in many cases, but I want to mention it at least): One could omit a foo method in the first place and only offer a (most generic) try_foo_async(_send) method. Again, bridging between sync and async is possible (though it requires an async executor, e.g. futures::executor::block_on even if you don't want to be async at all). Wrapping an infallible result T in a Result<T, Infallible> should be relatively easy. Most tricky, I believe, would be the Send/!Send property of futures: one could either only support one of these (likely the Send variant), duplicate the API here, or use unstable features (TAITs, which seem to be needed, as pointed out before) to work around that. But being generic will still reduce ergonomics in many cases. Edit #2: Also, using async-only isn't truly generic, as it will come with a runtime overhead for the (bridged) sync case (which may or may not be solved with keyword generics in the future).
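The `Result<T, Infallible>` wrapping mentioned above can be sketched for the synchronous case as follows. The function names `try_collect_keys` and `collect_keys` are hypothetical; the point is only that an infallible convenience method can be a thin wrapper over the fallible one:

```rust
use std::convert::Infallible;

const KEYS: [&str; 2] = ["one", "two"];

/// The general, fallible entry point.
fn try_collect_keys<E>(
    mut callback: impl FnMut(&str) -> Result<String, E>,
) -> Result<Vec<String>, E> {
    let mut output = Vec::with_capacity(KEYS.len());
    for key in KEYS {
        output.push(callback(key)?);
    }
    Ok(output)
}

/// Infallible convenience wrapper built on top of the fallible one.
fn collect_keys(mut callback: impl FnMut(&str) -> String) -> Vec<String> {
    let result = try_collect_keys(|key| Ok::<_, Infallible>(callback(key)));
    match result {
        Ok(output) => output,
        // `Infallible` has no values, so this arm can never run.
        Err(never) => match never {},
    }
}

fn main() {
    assert_eq!(collect_keys(|k| k.to_uppercase()), vec!["ONE", "TWO"]);

    let failed = try_collect_keys(|k| {
        if k == "two" { Err("boom") } else { Ok(k.to_owned()) }
    });
    assert_eq!(failed, Err("boom"));
}
```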

TL;DR

It depends.


  1. I wish this weren't the case and that there were an elegant way to generically handle all effects (including control flow) in Rust, but I'm not optimistic it will happen or even could happen without breaking a lot of things or making things even more awkward. I feel like the existing solutions in Rust are too complex already to be well-understood by the majority of programmers (or to be used by advanced programmers without creating syntactically horrible code). Adding things like Residual or keyword generics might make things even worse. Or not? I'm undecided. But consider how much headache Pin can cause already, e.g. in regard to projections. ↩︎

  2. How could I forget!? :sweat_smile: ↩︎


According to the announcement post of the Keyword Generics Initiative, fallibility is on their mind as well, and could result in new try or throws keywords. Unfortunately, keyword generics seem to be a long way off, if they ever come to pass.

const is another one, and one of the motivations for the keyword generics initiative. Interestingly, it is more a "subtraction" of capabilities offered by the base set of Rust, rather than an addition like async is. You might be interested in this blog post by Yoshua Wuyts, who is part of the initiative.

PS: I for one always enjoy reading these lengthy discussions veering into the theoretical. There is definitely no reason to respond negatively for sub-optimal examples of legitimate problems in the language...


For this point, if the async only exists for the callback and the callback doesn't use the asynchrony, the appropriate way to discharge the asynchrony would be .now_or_never(), which just polls a single time without even setting up the waker infrastructure. And perhaps .poll_once().now_or_never() if you want to avoid previously discarding an unfinished task if it happens to await.


Okay, thanks, that is interesting to know.

Speaking of keywords there is also the unsafe keyword which may be modeled as an effect too! The effect of "potential UB". :grin:

Content warning: modeling Rust's unsafe in Koka

Just for demonstration, we could model Rust's idea of "unsafe" as an effect danger in Koka:

effect danger 
  val danger-dummy : () // needed for syntactic reasons only

Then we could also implement an equivalent of Rust's unsafe { … } blocks like:

fun i-know-what-i-am-doing( action : () -> <danger|e> t ) : e t
  with val danger-dummy = ()
  action()

Finally we could define an unsafe fn like this (don't worry, I'll get back to Rust code in a sec):

fun foo( x : t ) : danger t
  x

Here, writing danger t instead of t corresponds to marking the function as unsafe in Rust.

And now we can write a main function in Koka, that looks like this:

fun main()
  with i-know-what-i-am-doing
  val u = [1, 2, 3];
  val v = map(u, fn(x) foo(2 * x))
  println(show-list(v, show))

Here, with i-know-what-i-am-doing corresponds to Rust's unsafe { … }. If we omit it, then the program won't compile.

We can see that Koka allows calling the unsafe/dangerous function in the closure passed to map. Thus map will "forward" the handling of unsafe/danger. It acts transparently in that matter.


Okay, so much about Koka, let's get back to Rust.


Interestingly, the previously cited announcement post regarding "keyword generics" speaks about unsafe Rust in a side note at the end:

We sometimes joke that Rust is actually 3-5 languages in a trenchcoat. Between const rust, fallible rust, async rust, unsafe rust - it can be easy for common APIs to only be available in one variant of the language, but not in others.

Do closure boundaries act transparently in regard to unsafe? Surprisingly (or unsurprisingly?) they do:

/// # Safety
///
/// * You need to know what you're doing.
pub unsafe fn foo<T>(x: T) -> T { x }

fn main() {
    // SAFETY: I know what I'm doing.
    let u: Vec<i32> = unsafe {
        let v = foo(vec![1, 2, 3]);
        v.into_iter().map(|x| {
            foo(2 * x) // no extra `unsafe` keyword needed here
        }).collect()
    };
    assert_eq!(u, vec![2, 4, 6]);
}

Edit: Admittedly, this only works because unsafe { … } simply operates on the syntactic level, and not because map or closures do something magic in regard to effects here.

Just to remember, we won't be able to use .await across such a boundary:

pub fn unasync<T>(x: T) -> T::Output
where
    T: std::future::Future,
{
    futures::future::FutureExt::now_or_never(x).unwrap()
}

/// This method is `async`.
pub async fn bar<T>(x: T) -> T { x }

fn main() {
    let a: Vec<i32> = unasync(async {
        let v = bar(vec![1, 2, 3]).await;
        v.into_iter().map(|x| {
            // This does not work:
            bar(2 * x).await
        }).collect()
    });
    assert_eq!(a, vec![2, 4, 6]);
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0728]: `await` is only allowed inside `async` functions and blocks
  --> src/main.rs:16:24
   |
14 |         v.into_iter().map(|x| {
   |                           --- this is not `async`
15 |             // This does not work:
16 |             bar(2 * x).await
   |                        ^^^^^ only allowed inside `async` functions and blocks

For more information about this error, try `rustc --explain E0728`.
error: could not compile `playground` (bin "playground") due to previous error

To solve this, we need something like block_on (or in this case something that uses now_or_never to "unasync" an inner async block) inside the closure. (Full Playground with both examples).
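A std-only sketch of that fix, assuming a hand-written `now_or_never` (a single poll with a do-nothing waker) in place of the futures crate's `FutureExt::now_or_never`. The helper names `NoopWake` and `doubled` are made up for this illustration. Each closure call discharges its own inner `async` block, so no `.await` has to cross the closure boundary:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for a single poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Std-only stand-in for `now_or_never`: polls exactly once and returns
/// `None` if the future is not ready yet.
fn now_or_never<F: Future>(future: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let future = pin!(future);
    match future.poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

/// This function is `async` but never actually yields.
async fn bar<T>(x: T) -> T {
    x
}

fn doubled(input: Vec<i32>) -> Vec<i32> {
    now_or_never(async move {
        let v = bar(input).await;
        v.into_iter()
            .map(|x| {
                // The inner `async` block is resolved inside the closure.
                now_or_never(async move { bar(2 * x).await })
                    .expect("`bar` never yields")
            })
            .collect::<Vec<i32>>()
    })
    .expect("the outer block never yields")
}

fn main() {
    assert_eq!(doubled(vec![1, 2, 3]), vec![2, 4, 6]);
    // A future that is never ready shows up as `None` instead of blocking.
    assert!(now_or_never(std::future::pending::<()>()).is_none());
}
```

This is only sound as a bridge when the callback is known not to suspend; a future that really awaits something would be dropped unfinished.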

So summarizing, unsafe could be considered to be an effect as well. This is merely a theoretical observation though, without any real-life impact: As shown in the above Playground, Rust acts differently regarding unsafe when compared to async or fallibility (Result<T, E>, Try, etc). But I thought it was curious nonetheless.

