Closures in API design – theoretical limitations and best practices?

Continuing the discussion from Fallible replace_all for regular expressions:

I'm unsure about when using closures in a Rust API is a good choice and when it isn't. I don't have a concrete problem right now, but I've repeatedly stumbled upon cases like this one regarding the regex crate, as well as some cases in my own crates (e.g. that one, plus some other proprietary ones), where using closures caused or would cause some limitations: in particular, the closure can neither return an error nor await anything, unless certain measures are taken, e.g. the closure wrapping its return value in a Result or BoxFuture.

I also expect to run into these issues again in the future.

To get away from a particular use case and to be able to discuss this issue in a more general fashion, let me consider the following toy example (Playground for this and all following examples):

const KEYS: [&str; 2] = ["one", "two"];

pub struct UsesCallback {
    output: Vec<String>,
}

impl UsesCallback {
    pub fn new<F: FnMut(&str) -> String>(callback: F) -> Self {
        let output = KEYS.iter().copied().map(callback).collect();
        Self { output }
    }
    pub fn into_inner(self) -> Vec<String> {
        self.output
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let uses_callback = UsesCallback::new(|s| {
        let first_char_option: Option<char> = s.chars().next();

        // We can't `await` here.
        //tokio::task::yield_now().await;

        // We can't return an error here:
        //let first_char: char = first_char_option.ok_or("empty string")?;

        // We must panic on error:
        let first_char: char = first_char_option.unwrap();

        format!("{s} begins with '{first_char}'")
    });
    assert_eq!(
        uses_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

Here, the UsesCallback::new function expects a closure that maps a &str to a String. It's not possible to use await or to return an error there.

We could allow async by providing a new_async method and using pinned, boxed futures:

use futures::future::BoxFuture;

impl UsesCallback {
    pub async fn new_async<F>(mut callback: F) -> Self
    where
        F: FnMut(&str) -> BoxFuture<String>,
    {
        let mut output = Vec::with_capacity(KEYS.len());
        for key in KEYS.iter().copied() {
            output.push(callback(key).await);
        }
        Self { output }
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let uses_async_callback = UsesCallback::new_async(|s| {
        Box::pin(async move {
            let first_char_option: Option<char> = s.chars().next();

            // Using `await` is possible here:
            tokio::task::yield_now().await;

            // We can't return an error here:
            //let first_char: char = first_char_option.ok_or("empty string")?;

            // We must panic on error:
            let first_char: char = first_char_option.unwrap();

            format!("{s} begins with '{first_char}'")
        })
    })
    .await;
    assert_eq!(
        uses_async_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

As we can see, failing (other than panicking) is still not possible here. If we need that, we could instead do:

impl UsesCallback {
    pub fn try_new<F, E>(mut callback: F) -> Result<Self, E>
    where
        F: FnMut(&str) -> Result<String, E>,
    {
        let mut output = Vec::with_capacity(KEYS.len());
        for key in KEYS.iter().copied() {
            output.push(callback(key)?);
        }
        Ok(Self { output })
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let uses_fallible_callback = UsesCallback::try_new(|s| {
        let first_char_option: Option<char> = s.chars().next();

        // We can't `await` here.
        //tokio::task::yield_now().await;

        // Returning an error is possible here:
        let first_char: char =
            first_char_option.ok_or("empty string")?;

        Ok::<_, Box<dyn std::error::Error>>(format!(
            "{s} begins with '{first_char}'"
        ))
    })?;
    assert_eq!(
        uses_fallible_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

Finally, if we want to be able to do both, we'd have to do something like:

impl UsesCallback {
    pub async fn try_new_async<F, E>(mut callback: F) -> Result<Self, E>
    where
        F: FnMut(&str) -> BoxFuture<Result<String, E>>,
    {
        let mut output = Vec::with_capacity(KEYS.len());
        for key in KEYS.iter().copied() {
            output.push(callback(key).await?);
        }
        Ok(Self { output })
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let uses_async_fallible_callback =
        UsesCallback::try_new_async(|s| {
            Box::pin(async move {
                let first_char_option: Option<char> = s.chars().next();

                // Using `await` is possible here:
                tokio::task::yield_now().await;

                // Returning an error is possible here:
                let first_char: char =
                    first_char_option.ok_or("empty string")?;

                Ok::<_, Box<dyn std::error::Error>>(format!(
                    "{s} begins with '{first_char}'"
                ))
            })
        })
        .await?;
    assert_eq!(
        uses_async_fallible_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

Is this still idiomatic?

Maybe it's better to avoid closures here entirely, and try something like the following?

use std::cell::RefCell;

pub struct UsesIterator {
    output: Vec<String>,
}

impl UsesIterator {
    pub fn into_inner(self) -> Vec<String> {
        self.output
    }
}
    
pub struct UsesIteratorBuilder {
    output: RefCell<Vec<String>>,
}

impl UsesIteratorBuilder {
    pub fn new() -> Self {
        Self {
            output: RefCell::new(Vec::with_capacity(KEYS.len())),
        }
    }
    pub fn iter(&self) -> impl Iterator<Item = &str> {
        KEYS.iter().copied()
    }
    pub fn push(&self, s: String) {
        let mut output = self.output.borrow_mut();
        if output.len() >= KEYS.len() {
            panic!("pushed too many");
        }
        output.push(s);
    }
    pub fn build(self) -> UsesIterator {
        let output = self.output.into_inner();
        if output.len() != KEYS.len() {
            panic!("pushed too few");
        }
        UsesIterator { output }
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let builder = UsesIteratorBuilder::new();
    for s in builder.iter() {
        let first_char_option: Option<char> = s.chars().next();

        // Using `await` is possible here:
        tokio::task::yield_now().await;

        // Returning an error is possible here:
        let first_char: char =
            first_char_option.ok_or("empty string")?;

        builder.push(format!("{s} begins with '{first_char}'"));
    }
    let uses_iterator = builder.build();
    assert_eq!(
        uses_iterator.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

(Playground for all examples in this post)

I'm not sure whether the last example could be written more cleanly, without using a RefCell.
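One idea: if the keys are exposed via an associated function instead of a method, iterating them no longer borrows the builder, so push can take &mut self and the RefCell disappears. A sketch (UsesIteratorBuilder2 is a hypothetical name; KEYS and UsesIterator as above):

pub struct UsesIteratorBuilder2 {
    output: Vec<String>,
}

impl UsesIteratorBuilder2 {
    pub fn new() -> Self {
        Self {
            output: Vec::with_capacity(KEYS.len()),
        }
    }
    // Associated function: iterating the keys doesn't borrow the builder.
    pub fn keys() -> impl Iterator<Item = &'static str> {
        KEYS.iter().copied()
    }
    pub fn push(&mut self, s: String) {
        if self.output.len() >= KEYS.len() {
            panic!("pushed too many");
        }
        self.output.push(s);
    }
    pub fn build(self) -> UsesIterator {
        if self.output.len() != KEYS.len() {
            panic!("pushed too few");
        }
        UsesIterator {
            output: self.output,
        }
    }
}

The loop then becomes for s in UsesIteratorBuilder2::keys() { … builder.push(…); } with a mut builder.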

Some questions:

  • What are your experiences with these problems?
  • Do you try to avoid closures for these (or other) reasons in your APIs?
  • Do you often end up duplicating code for async and non-async interfaces?
  • Do you often end up duplicating code for fallible and infallible interfaces?
  • Have you experienced trouble with using third party APIs that don't take the async or fallible cases into account?
  • Is it idiomatic to work around closures and try to program less functional-style and more procedural-style in order to not get into trouble with ? and async? E.g. by using Vec::with_capacity and a for loop that pushes items instead of using functional-style mapping? Should closures perhaps even be avoided in API design (unless you're sure they cover all use cases)?
  • What's current best practice in regard to considering async or fallible code when designing an API?
  • What are the future prospects in this matter?
3 Likes

My take: The problem here is not specific to closures; rather it is a general problem with what I'll call callbacks.

Suppose A and B are functions/types/modules/crates where A makes use of functions defined by B. Then a callback (for purposes of this discussion) is any time one of B's functions calls a function that was provided to it by A — the stack contains A's code, then B's code, then A's code (or some asynchronous analogue to this). The callback is not necessarily of a Fn or fn type — it can also be any trait that B obligates A to implement.

This comes up less often with trait callbacks because those often (but not always) have some more narrowly-defined purpose than “a function that does whatever A wants”, but it still can.


Your alternative UsesIterator solves these problems by not taking any callbacks — B never calls A's code (unless A opts in by saying UsesIteratorBuilder::iter(...).for_each(|key| ...)). This solution, when applicable, is a good one; it is more “loosely coupled” than any of the others. Of course, in your specific case, it comes with the downside of requiring a specific pattern of push, but that could be improved on by e.g. replacing .push(s) with .set(key, s) — now the builder is more flexible (values can be provided in any order!) and the contract can be explained as "all keys must be given values".
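A sketch of what I mean (hypothetical names, reusing the KEYS and UsesIterator types from above; values land in a map until build checks the contract):

use std::cell::RefCell;
use std::collections::HashMap;

pub struct SetBuilder {
    values: RefCell<HashMap<&'static str, String>>,
}

impl SetBuilder {
    pub fn new() -> Self {
        Self {
            values: RefCell::new(HashMap::new()),
        }
    }
    pub fn keys(&self) -> impl Iterator<Item = &'static str> {
        KEYS.iter().copied()
    }
    // Values may be set in any order; the last write for a key wins.
    pub fn set(&self, key: &'static str, s: String) {
        assert!(KEYS.contains(&key), "unknown key");
        self.values.borrow_mut().insert(key, s);
    }
    // Contract: every key must have been given a value.
    pub fn build(self) -> UsesIterator {
        let mut values = self.values.into_inner();
        let output = KEYS
            .iter()
            .map(|&k| values.remove(k).expect("missing key"))
            .collect();
        UsesIterator { output }
    }
}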

Something I see as closely related is the way input/output code can be inflexible. I've often seen situations (more in other languages than Rust, so far) where code is unsuitable for reuse because it bakes in IO — for example, any Rust code that uses std::{io, fs} generally couldn't be used in web-Wasm even if std weren't stubbed out, because the set of IO operations that is possible is wildly different (e.g. there is absolutely no blocking). And yet the algorithms that are performed (the protocol implementation) might still be useful. Therefore, it is useful to follow the “sans-IO” paradigm — write a library or module which contains the algorithms for implementing the protocol, but performs no IO operations and takes no callbacks that are expected to complete an IO operation before they return. Express the state of e.g. a protocol parser as a state machine, not as sequential functions. The sans-IO library can then be used in any paradigm (blocking, nonblocking, async, …) and on any platform (POSIX, web, …) without needing to be rewritten.
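To make that concrete, a minimal sketch (a made-up line-based protocol, not any particular library): the parser owns only a buffer; the caller does all the IO, feeding bytes in and pulling events out.

pub enum Event {
    Line(String),
}

#[derive(Default)]
pub struct LineParser {
    buf: Vec<u8>,
}

impl LineParser {
    // Feed bytes obtained from *any* source: a blocking read,
    // an async read, a Wasm callback, a test vector, ...
    pub fn feed(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }
    // Pull the next completed line, if one is buffered.
    pub fn next_event(&mut self) -> Option<Event> {
        let pos = self.buf.iter().position(|&b| b == b'\n')?;
        let line: Vec<u8> = self.buf.drain(..=pos).collect();
        let text = String::from_utf8_lossy(&line[..line.len() - 1]).into_owned();
        Some(Event::Line(text))
    }
}

The parser never blocks, never awaits, and never calls back into the application; whoever drives it decides how bytes arrive and when events are consumed.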


Some of these problems may be addressed by Rust's plans for “keyword generics” — being able to write a function that can be used as async or not. I'm somewhat skeptical that will work or be a good idea[1], and even when it is available I will probably be inclined to write callback-free code rather than generic code, whenever that is a reasonable option, because callback-free code is simpler and more robust — B doesn't have to think about the side-effects of calling A, or the proper function signature to offer to A, if it doesn't ever call A. These issues are not Rust-specific.

I'm not saying that you should never use callbacks; rather, that they should be used sparingly like all other sources of complexity and constraint.


  1. for example, because programmers may add such generics without thinking about the consequences, such as introducing async cancellability and migration between threads ↩︎

11 Likes

I wasn't aware of that, interesting!

I wonder if it's really a matter of side effects (still trying to understand this). After all, we're not in a functional programming language, so side effects are always possible:

fn twice<F: FnMut()>(mut f: F) {
    f();
    f();
}

fn main() {
    let mut i = 0;
    twice(|| {
        println!("Hello, world!"); // side effect
        i += 1; // side effect
    });
    assert_eq!(i, 2);
}

(Playground)

So I wonder: what is it that causes problems with callbacks? Is it implementing control flow in "user code" (such as running code twice, or aborting it, etc.)? Or is it something else? edit: Sorry, this is a bit confusing, considering that the function above does execute code twice.

I would like to know the correct term here, so I can even name this "side-effect-y thing" (that includes fallibility and async), whatever it is.

I feel like keyword generics are a concrete solution for a more generic problem (that might backfire when encountering other things than async, e.g. fallibility). But how generic does this problem get? Are there any other cases besides fallibility and async? Well, maybe async comes in two flavors already: Sendable futures and futures that are !Send. We encounter these flavors, for example, in the async_trait crate.
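The futures crate already encodes this split in its type aliases, which shows how the flavors become yet another axis of duplication. A sketch:

use futures::future::{BoxFuture, LocalBoxFuture};

// BoxFuture<'a, T>      = Pin<Box<dyn Future<Output = T> + Send + 'a>>
// LocalBoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>

// An API taking BoxFuture callbacks rejects !Send futures ...
fn takes_send_callback<F>(_callback: F)
where
    F: FnMut(&str) -> BoxFuture<'static, String>,
{
}

// ... so full generality needs a second, nearly identical variant.
fn takes_local_callback<F>(_callback: F)
where
    F: FnMut(&str) -> LocalBoxFuture<'static, String>,
{
}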

As shown above, "side effects" in general are covered by simple callbacks/closures already (at least in the case of FnMut closures or &mut self methods; but even for Fn closures or &self methods, it's possible to use interior mutability). So what is "not covered" by callbacks unless explicitly foreseen?

Would/could it make sense to create a crate that's covering the "common" case of async and Result in a single wrapper? Is it possible to do that even generically for any form of control flow? I feel like in practice, the syntax wouldn't be handy, and the resulting code would not be well readable. But I'm curious anyway.
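For instance, the most general constructor from above can in principle serve the plain case too. A sketch (reusing try_new_async, with Infallible as the error type and futures' block_on as the executor), which also shows why I doubt the ergonomics:

use std::convert::Infallible;
use futures::FutureExt;

fn plain_case_via_general_api() -> Vec<String> {
    let fut = UsesCallback::try_new_async(|s: &str| {
        // Neither async nor fallible logic is needed, but we must
        // wrap the value in a boxed future and a Result anyway.
        async move { Ok::<_, Infallible>(format!("got {s}")) }.boxed()
    });
    match futures::executor::block_on(fut) {
        Ok(uses_callback) => uses_callback.into_inner(),
        Err(e) => match e {}, // Infallible: statically unreachable
    }
}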

Yes, side effects are always possible no matter what (a closure or trait impl might contain a channel it can write to), if the code that performs them runs. But also, I had in mind (and perhaps should have said more explicitly) a very broad sense of "side effects" including things like unwinding (whether via early return or panic) and infinite looping. My point is that if B never calls A, then A can do none of those things — thus B only has to think about its own invariants — there is never a question of, say, B getting into a bad state because A panicked and unwound out of B's code (which B should handle as needed, but programmers make mistakes).

There are also side-effect-related considerations like: if B calls A several times, then A can observe the ordering of those calls and end up doing something different if the ordering changes. Thus, a change to the algorithm B uses internally might affect A's results in a way neither anticipated.
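A toy illustration with std's sort_by: a stateful comparator observes the number and order of B's calls, which depend entirely on the algorithm the standard library happens to use internally:

fn main() {
    // B = sort_by (library code), A = the comparator closure.
    let mut comparisons = Vec::new();
    let mut v = vec![3, 1, 2];
    v.sort_by(|a, b| {
        comparisons.push((*a, *b)); // A observes B's internal call order
        a.cmp(b)
    });
    // If the sorting algorithm changes, `comparisons` changes too,
    // even though the sorted result stays the same.
    println!("{comparisons:?}");
}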

All these issues go away if B never calls A — if any call to B's functions only runs B's code and then returns. It's a very powerful principle for keeping things simple, modular, and robust, when applicable.

1 Like

So given the very first example of infallible, non-async closures, A would be main, including the closure (callback) it provides, and B would be UsesCallback and its methods.

In the toy example, code both from A and B runs. In each iteration, we have them on the stack like: A, B, A.

Using the very first example (infallible, non-async closure in the OP), let's look at the three effects you mention:

  • unwinding via early return
  • unwinding via panicking
  • infinite looping

It has already been shown that unwinding via early return doesn't work, but unwinding via panicking does:

Here, .unwrap() clearly could panic.

How about infinite looping? That's possible too. Consider the following code that unwraps first_char_option without ever panicking or returning early:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let uses_callback = UsesCallback::new(|s| {
        let first_char_option: Option<char> = s.chars().next();

        // We can't `await` here.
        //tokio::task::yield_now().await;

        // We can't return an error here:
        //let first_char: char = first_char_option.ok_or("empty string")?;

        // But we can otherwise diverge:
        let first_char: char = loop {
            // this loop unwraps `first_char_option` but
            // neither returns early from the closure nor panics
            if let Some(c) = first_char_option {
                break c;
            }
        };
        
        format!("{s} begins with '{first_char}'")
    });
    assert_eq!(
        uses_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

(Playground)

So getting back to the list:

  • unwinding via early return :x:
  • unwinding via panicking :white_check_mark:
  • infinite looping :white_check_mark:

The only thing that doesn't work is unwinding via early return.

And getting back to my question:

I (think I) have shown that it's not "side effects" in general that we are talking about. [1] It's also not "unwinding", as unwinding via panicking is possible. It's also not "control flow" in general, as infinite looping is possible.

So what is it that's not possible with (non-async, infallible) callbacks?

I suspect it is very closely related to continuations in general, and partly related to exceptions in particular.

Continuations aren't possible if a callback doesn't return a Future. [2] And if the callback doesn't return a Result, then we can't throw an exception by early returning, except if we use the panicking mechanism.

Let's see how the last thing could work:

use std::fmt;

const KEYS: [&str; 3] = ["one", "", "two"]; // middle item is the empty string

#[derive(Debug)]
struct MyError(pub String);

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for MyError {}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = std::panic::catch_unwind(|| {
        UsesCallback::new(|s| {
            let first_char_option: Option<char> = s.chars().next();
    
            // We can't `await` here.
            //tokio::task::yield_now().await;
    
            // We can't directly do an early return:
            //let first_char: char = first_char_option.ok_or("empty string")?;
    
            // But we can work around that using the panicking system:
            let first_char: char = first_char_option.unwrap_or_else(|| {
                std::panic::panic_any(MyError("empty string".to_string()));
            });
            
            format!("{s} begins with '{first_char}'")
        })
    });
    let uses_callback = match result {
        Ok(value) => value,
        Err(err) => {
            let err: MyError = *err.downcast().unwrap();
            println!("Constructor reported an error: {err}");
            return Ok(());
        }
    };
    assert_eq!(
        uses_callback.into_inner(),
        vec!["one begins with 'o'", "two begins with 't'"]
    );
    Ok(())
}

Stderr output:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.97s
     Running `target/debug/playground`
thread 'main' panicked at 'Box<dyn Any>', src/main.rs:44:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

But despite the noisy stderr output, we get the following output:

Constructor reported an error: empty string

(Playground)

So what does that mean? My hypothesis is that callbacks do not support continuations unless the API design arranges for that. They also do not support exceptions unless the API design arranges for that or(!) you're willing to use the panicking mechanism (which should generally be avoided for a couple of reasons).

What do you think?


While I think the observations above aren't specific to Rust, there are two properties of Rust that play an important role there:

  • Rust doesn't have a sophisticated exception handling mechanism (i.e. using the panicking mechanism comes with some friction as also visible in the last Playground).
  • Not all functions in Rust support yielding (compare e.g. Lua, where calling coroutine.yield is always possible [3], as opposed to Rust, where await is only allowed in async functions or blocks).

Of course, there are reasons for these properties / design decisions (not criticizing them here, just observing them and their impact on issues involving callbacks).


  1. That would be the case in a purely functional programming language. ↩︎

  2. With the exception of using threading and channels, but this also requires that these channels are part of the callback API. ↩︎

  3. At least in pure Lua. If you have C frames on the stack, you can't yield unless the C functions handle that. ↩︎

Yes. I am saying: "Life is simpler if you avoid that design."

Infinite looping and panicking can be classified as edge cases because they are both blunt instruments — you usually don't want either one. If we exclude those, then the thing that is not possible is changing the control flow of B — asking it to stop early, to do something differently, or to async-yield — except as it explicitly permits.
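std's iterators are an example of B explicitly permitting this: try_for_each hands the callback a structured way to stop B early, via ControlFlow (or Result). A sketch:

use std::ops::ControlFlow;

fn first_even(data: &[u32]) -> Option<u32> {
    let flow = data.iter().try_for_each(|&x| {
        if x % 2 == 0 {
            ControlFlow::Break(x) // A asks B to stop early, and B allows it
        } else {
            ControlFlow::Continue(())
        }
    });
    match flow {
        ControlFlow::Break(x) => Some(x),
        ControlFlow::Continue(()) => None,
    }
}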

Yes, this is related. In a language which supports arbitrary continuations, the distinction I made previously between callbacks and iterators no longer exists; any callback (part of A) can choose to suspend the caller (part of B) indefinitely, transforming it into a continuation value that might or might not be invoked later.

This is a very theoretically elegant system, but within it, it's hard to comprehend in general how control flow might be twisted up (unless you have no callbacks!), so I don't believe it's appropriate for writing programs that need to be carefully made correct in the kinds of ways e.g. unsafe Rust must be, and in any case, all examples I'm aware of require GC to manage the suspended stack frames and their outgoing references.

1 Like

Ah, I guess in terms of functional programming, one could say: infinite looping and panicking are like returning the bottom value, which is always possible due to the halting problem. With this in mind, I see how you could say that the limitations of callbacks are related to changing the control flow. That makes more sense to me now!

Side note: I wonder if it's possible to do this in Rust when callbacks are async by using a custom executor and/or Context.

I feel like, to a weaker extent, this is already sometimes a problem in today's Rust. Panicking and catch_unwind allow control flow to be "twisted up" in (sometimes) unforeseen ways, which leads to soundness bugs because structs are left in a "weird" state (UnwindSafe).

So maybe non-async, infallible closures are in fact a "feature" :laughing: (as in ensuring a certain guarantee regarding control flow). This reminds me of the replace_with crate, which takes advantage of that "feature", if I understand this right.

Unfortunately, this feature in restricting control flow is most often unwanted (I think?).

Thanks a lot for the theoretical discussion so far. I think it helped me to understand some things better (at least I hope I understand them correctly).

I wonder what's the take-away for practical, everyday life programming with Rust. I feel like I should be cautious with callbacks and closures and try to avoid offering them as a sole API for certain functionality. It might always depend on the particular use case though.

Edit: Which probably is, more or less, what you said here:

In async, with the call chain ABA' — where A is polling B, B is an async fn, and A' is an async callback — A' can collaborate with A to suspend B and resume it whenever it wants, or cancel (drop) it. In fact, this is exactly what happens any time B uses an async sleep or IO operation.

However, this is still not arbitrary continuations, because arbitrary continuations allow you to resume B twice. This is both why they are extremely powerful and why they make it hard to reason about code. This is not possible in Rust — unless B is a type that implements Future + Unpin + Clone, in which case you could clone it and resume both clones! There may or may not be some general continuation pattern that still can't be expressed; I'm not sure.

Yes. This is a general property of language design: if you add more fundamental operations or permitted states, you remove properties that other people writing code, or reading code, might want to rely on. Every language occupies a unique set of tradeoffs in this space, and this is part of why there are so many programming languages; there can never be one “best” language.

Also, an unsolved hard problem here is writing language specifications (or even documentation) that define a useful and yet not overly strict set of things that the language won't do in future versions. This is important whenever the language contains callbacks or generics, and seeks to make it easy or feasible to write code that is correct under all circumstances (which Rust does; this is among other things the concept of “is this unsafe code in a safe function sound?”).

Historical example: Rust used to offer a guarantee that all values would be Dropped before they went out of scope; this would have allowed things like replace_with and scoped threads to be implemented purely in terms of droppable values rather than callbacks. However, it turned out to be infeasible to actually implement that guarantee (because of, among other cases, Rc/Arc cycles), and so it was removed (and std::mem::forget() added). If this inconsistency had been discovered after Rust 1.0 stability rather than before, it would have been a disaster for the correctness of code already written.

1 Like

Somewhat related: While I was reading TWIR 504, I found this post about async cancellation safety: Mutex without lock, Queue without push: cancel safety in lilos - Cliffle

This gives examples of several API design choices where async (and therefore cancellable) operations are broken up to have an async part and a sync (non-cancellable) part, specifically so that mistakes cannot occur because the sync part cannot be cancelled externally.
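tokio's mpsc channel follows the same pattern, for a mainstream example (a sketch; not from the linked post): the cancellable part is reserving a slot, while actually handing over the value is synchronous and therefore can't be interrupted halfway:

use tokio::sync::mpsc;

async fn send_carefully(
    tx: &mpsc::Sender<String>,
    value: String,
) -> Result<(), mpsc::error::SendError<()>> {
    // Async, cancellable part: wait for capacity.
    let permit = tx.reserve().await?;
    // Sync, non-cancellable part: the value cannot be lost here.
    permit.send(value);
    Ok(())
}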

4 Likes

When using a functional language with immutable data structures, I rarely run into problems with closures. And when using Rust I find the same is true when no mutation is happening. So in Rust I only try to avoid closures when there is mutation -- the ownership and lifetime issues are so much worse in that case.

Even with mutation happening, I haven't seen problems with function callbacks as opposed to closures, since the function pointer doesn't own or borrow anything.

1 Like

If you want to be really sure of this, you also have to be careful about A dropping.

Reminder that the restrictions imposed by closures can be helpful too!

scope in std::thread - Rust relies on the boundary imposed by the closure for soundness, whereas the old API without the closure was unsound.

2 Likes

The PL-theory term for this is effects, as in an effect system. Those are the keywords to search for. The less PL-theory way it gets described is the concept of "function coloring" (usually in a negative context). Withoutboats has a pretty good 2020 blog post on the problems of effects in Rust, and Koka is an interesting research language exploring what you can do with an effect vocabulary.

The biggest shift in my understanding of effect systems came when I grokked the difference between composition and combination of effects — Rust does perfectly fine at expressing the former but doesn't capture the latter, e.g. the difference between Async<Iterator<T>> or Iterator<Async<T>> (composing) and AsyncIterator<T> (combining). Unfortunately this is quite similar to monads in the undesirable way, in that it's a high level of generalization that is difficult to explain — mainly because if effects are the generic "problem," monads are the generic "solution."
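In code, the distinction looks roughly like this (a sketch using the futures crate):

use futures::stream::{self, Stream, StreamExt};
use std::future::Future;

// Composing: Iterator<Async<T>>. Stepping the iterator is sync;
// only the produced items are async.
fn iterator_of_futures() -> impl Iterator<Item = impl Future<Output = u8>> {
    (0u8..3).map(|i| async move { i * 2 })
}

// Combining: AsyncIterator<T> (a Stream). Obtaining the next item
// is itself an async operation.
fn stream_of_values() -> impl Stream<Item = u8> {
    stream::iter(0u8..3).then(|i| async move { i * 2 })
}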

jbe, while you're looking for the language to generalize over and propagate effects, kpreid's main point is that a majority of the time, a preferable solution is to just write the code in a way in which you don't need to. Porting back to the language of monads: instead of writing code generic over monadic containers (effects), write the code in a pure functional way and leave any monadic composition to the caller.

It comes down to a familiar split between a "library" (your code calls into the library) and a "framework" (the framework calls into your code), as well as the related "inversion of control" concept.


It's worth noting that you absolutely can bridge between sync and async, or between any other two effects; it just might not be all that pretty, and it can involve some not-insignificant overhead. Put the sync code onto a fresh thread (spawn_blocking), then bridge the gap with some sort of channel. The async side can await on one end, and the sync side can block on the other end, and then it's "just" a matter of organizing the protocol to avoid deadlocking with both sides waiting on the other.
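A sketch of that bridge (sync_library is a hypothetical stand-in for any callback-based sync API):

use tokio::sync::mpsc;
use tokio::task;

// Stand-in for some sync, callback-based third-party API.
fn sync_library(mut callback: impl FnMut(u32)) {
    for i in 0..3 {
        callback(i);
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel();
    // Run the sync side on a blocking-friendly thread; its callback
    // only performs a non-blocking channel send.
    let sync_side = task::spawn_blocking(move || {
        sync_library(|item| {
            let _ = tx.send(item);
        });
    });
    // The async side awaits items as they arrive; the channel closes
    // when the sync side finishes and drops `tx`.
    while let Some(item) = rx.recv().await {
        println!("got {item}");
    }
    sync_side.await.unwrap();
}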

Rust isn't a research language anymore; it's first and foremost a practical language. I fairly firmly believe that getting into generalizing over effects quickly veers into being too abstract to be of practical use for most purposes. It's like ring algebra: you can find powerful insight by classifying things by a highly abstract interface based on shape rather than purpose, but the benefit of that insight lies less in actually doing manipulation at that level of abstraction than in taking patterns from one instance and applying them to other instances, where what that actually accomplishes becomes concretely visible again.

A high-level vision of direction should aim to keep these patterns in mind for the purpose of maintaining a consistent environment where abstract patterns do meaningfully port between applications, but having concrete applications of a pattern is just as functional (if not more so) as directly providing the abstract pattern, and in fact produces a significantly more approachable API.

It's not just a matter of expressivity or such; it's also a matter of teachability and comprehensibility. The ultimate in generality is dynamic duck typing — if a thing has a shape, the code will use it — but nominal typing does wonders for code understanding and reliability.

See also matklad's discussion of Concrete Abstraction — there's no benefit to exposing a generic interface if you're never generic over that interface. It's perfectly fine and even desirable to repeat a pattern between multiple types/functions/etc even without an abstraction for that pattern.

Just like you should avoid premature optimization, you should avoid premature abstraction/generalization for much the same reasons.


Fully tangential side note: if you're more interested in theory than practice for a given topic, it's probably a good idea to put that in the topic title. This forum has a deliberate bias towards the practical application of Rust and solving concrete issues rather than abstract theory.

11 Likes

But like I said:

For example, consider the very common Iterator::map method. It takes an FnMut closure, which can have arbitrary side effects (but cannot alter the control flow of the map method). The aforementioned "feature" of disallowing an early return is not really something that's needed/wanted/nice here. Yet it's very, very idiomatic to use the map method. A lot of people recommend functional-style code. But you run into problems:

fn functional_style() -> Option<Vec<u8>> {
    Some((20u8..=25)
        .map(|i| i.checked_mul(10).unwrap() /* can't use `?` here */ )
        .collect()
    )
}

(Playground with comparison to procedural style code)

edit: @H2CO3 shows that there is no problem in this particular case, see the response below.

In other cases, this limitation is desired, e.g. with the replace_with crate or when we have scoped threads. But I feel like in the vast majority of cases (99% ?) where closures/callbacks are used in an API, we do not want these limitations (such as shown in the case of map).

That said, I don't want to say map should generalize over any sort of "effect" or be async, etc. But it makes me wonder whether the recommendations I stumble upon here in this forum to write some things functional-style are really a good idea. What if using map works fine, and later I need to refactor my code to do an early return in some cases? The same holds when I offer an API that expects a callback (which is what led me to ask the question in the other thread about regex not supporting fallible replacements — a practical issue [1], not a theoretical one).

I thought about the term "effects" too, but I thought that this also covers mutation? E.g. monads in Haskell are often used to describe I/O operations (or other, possibly more limited side effects). That's something any FnMut closure can do in Rust. We don't need monads for it.

So maybe "effects" is a too broad term here (unless we are in a functional programming language, which Rust is not)?

I wonder if I ran into exactly this when I attempted to get around the latter by deciding to simply return a Vec of Futures here (instead of having to use some sort of asynchronous iterator, which I didn't even know about… oh wait, that's what was previously called Stream in the futures crate, right?).

Well, I don't know if I want the language to generalize over and propagate effects. I would like to avoid having to duplicate code :sweat_smile:. Or having to use ugly solutions like that one. Especially when dealing with async (e.g. in the multivariate_optimization crate, which is mostly an experiment for now), I feel like I end up with a lot of overhead/duplication because of async. I would like to gain knowledge to not run into these issues all the time.

I don't understand this. Could you rephrase it? Note that while I am fascinated by functional programming, I don't have that much knowledge in or experience with it.

Yeah, I have been doing that in the past. You're saying: just ignore being async from time to time, and let the caller deal with it using threading? Maybe it could help me get around a lot of headaches involving async Rust. But… hmmmm… not sure if it's that nice. But I'll definitely keep that in mind as an escape hatch when things get too complex.

I don't think I'm "more" interested in the theory than in practice. [2] I am interested in both. The reason for writing this post is the practical issues I run into (repeatedly). These can be as simple as being unable to use the ? operator in a closure passed to .map, or they can end up being very complex, such as refactoring hundreds of lines of code because I figured out that something I wanted to do wouldn't work well with async. Nonetheless, I am also interested in the theory behind these problems because I like to extend my knowledge in that matter.

Why can't I be interested in both? I tried to express that by writing "theoretical limitations and best practices" in the title.

I know I rather tend to use generics or abstract data structures too often, and part of my own learning process is to abstain from trying to be too generic. It's good to be reminded time to time that sometimes solutions like

  • spawn_blocking instead of pure async,
  • Vec<T> instead of Box<[T]> instead of impl IntoIterator<Item = T>,
  • String instead of Cow<str> instead of Borrow<str>,
  • [T] instead of Index<T>,

cause a lot fewer problems, especially as sometimes traits in the std library seem to be implemented in a way that doesn't always work well, e.g. in regard to AsRef or in regard to Index.

I still enjoy exploring generics, mostly to learn more about Rust and its limitations (and possibly also about some of its design flaws, and how to work around them and/or to avoid them!).

Maybe bringing up generics on this forum sometimes has the same effect as including it in your source code. :see_no_evil:

I mean seriously: I hope it's okay I ask these questions. I don't think it should be responded to in such a negative way as it sometimes happens here. Isn't this forum big enough for both practical issues and theory? Or should I really stop asking these questions here? Where should I go then?


  1. Sure, the API can offer a "method" that consists of copy & pasting code from the docs to solve this. This might be okay in the case of regex::Regex::replace_all, as it's "just" 16 lines of redundant code to be copy & pasted (assuming you don't need any optimizations). But regex is just an example here. ↩︎

  2. Maybe I'm interested "more" in the theory than many other people. ↩︎

1 Like

You are holding it totally wrong.

It's not that there's an "undesirable restriction of control flow". The right question to ask is exactly the opposite: why should there be an additional feature to regulate control flow here? Iterators are already lazy, and the fallibility handling is built into the container. Your "ugly" functional example is perfectly expressible as

fn functional_style() -> Option<Vec<u8>> {
    (20u8..=25)
        .map(|i| i.checked_mul(10))
        .collect()
}

And this is the correct way to do it: whoever collects the iterator has the opportunity to simply stop calling next() if that's what s/he pleases, and this is exactly what Option and Result's FromIterator impls do. There's absolutely zero need to run additional circles in order to do exactly the same thing with different syntax.
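If it helps, the mechanism spelled out by hand looks roughly like this (a sketch of what the FromIterator impl does, not its actual source):

fn collect_options<I, T>(iter: I) -> Option<Vec<T>>
where
    I: IntoIterator<Item = Option<T>>,
{
    let mut out = Vec::new();
    for item in iter {
        // On the first None, stop pulling items and bail out.
        out.push(item?);
    }
    Some(out)
}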

TL;DR: learn the standard library, don't blame the language.

4 Likes

Interesting, I didn't know that. (Playground)

I suspect it's this impl in std:

impl<A, V> FromIterator<Option<A>> for Option<V>
where
    V: FromIterator<A>,

This might come in handy in a couple of cases. Thanks for that hint.

I guess it only works though, because collect is very generic. So here, generics (in std) help to avoid friction.


Maybe the example was a bad one then. But I would still run into problems if I attempted to do anything async, right? (edit: see update below) And in case of Regex::replace_all, the problem of lack of fallibility does exist.

Perhaps I still have no good "feeling" in regard to when a closure/callback does cause problems, and when it doesn't.


Update:

Regarding .map and async, there remains the question of the ordering of realizing the futures. I guess that's what FuturesOrdered and FuturesUnordered are for. So it is possible to write the example functional style if you use these. It requires a bit of Option/Result gymnastics though, it seems:

use futures::stream::{FuturesOrdered, TryStreamExt};
use tokio::task::yield_now;

async fn functional_style() -> Option<Vec<u8>> {
    FuturesOrdered::from_iter((20u8..=25).map(|i| async move {
        yield_now().await;
        i.checked_mul(10).ok_or(())
    }))
    .try_collect()
    .await
    .ok()
}

(Playground)

I think you in particular are experiencing some push-back because (insofar as my memory recalls accurately) many of your threads start out sounding like an ordinary “practical issue”, and then as the question gets refined, turn into something more like “I want to make the code I am writing as uncompromisingly generic as possible”. Many people do not consider that really a “practical” matter; such genericism often trades off with usability, comprehensibility of code and of errors, compilation time, etc.

It's also not taken well when a “practical” discussion turns into “oh well, looks like Rust's design is broken” — while of course there are in fact flaws, there are a lot of threads where once such an observation has been made, the person making it doesn't even want to talk about alternatives, and it gets tiresome, especially when the design in question is a tradeoff which provides some other benefit and not simply a mistake in hindsight.

I don't mean to say that you have done anything wrong or that there should not be a place for the discussions you want to have; rather, I want to highlight a possible source of tension and how the topics you are interested in can be perceived from other perspectives. Thoughtfully framing your posts (and separating the “theory” thread from the “practical” thread, as you did in this case) can help.

9 Likes

I often like to make my code generic where possible. I don't think that's bad per se. (Or at least it shouldn't be a bad thing to do.) And when I experience friction, I like to understand why there is friction or what its underlying cause is (and when and why to refrain from being generic).

During that process, I sometimes discover that there are some flaws in the language or in std. When I find them, I try to submit bug reports. I consider that to be pretty constructive. Though, of course, I may also make mistakes in that process (like anyone can do), both in regard to technical matters as well as organizational or communication errors.

But thank you for sharing the perception.

Well, I would appreciate if frustrations with other people aren't projected onto me. I like to discuss alternatives and learn.

I thought that having "theoretical limitations" in the subject was pretty clear. But maybe not clear enough.

Anyway, thanks for your response. I honestly appreciate your time and effort both in regard to the technical subject as well as the meta questions regarding the forum. :pray:

1 Like

I finally looked into that. Looks interesting, even if it seems nowhere near as production-ready as Rust. But it might help me to better understand certain things by exploring that language. Aside from looking like fun. Thank you.

ouroboros goes a step further and also has versions for Send vs. not-Send for the async cases. If you use it for your self-referencing structs, you'll earn a total of 9 constructors and 6 builder structs :exploding_head:

I don’t mean to imply this observation has anything to do with being truly idiomatic; maybe it’s more about being truly general.