Explanation for difference between `.filter(func)` and `.filter(|x| func(x))`?

In my mental model of Rust, .filter(some_func) and .filter(|x| some_func(x)) were identical, but I've now seen that this isn't always the case.

Could someone explain what's happening in the following example?

fn main() {
    let v: Vec<String> = vec!["foo".to_owned()];
    
    fn is_foo(s: &str) -> bool {
        s == "foo"
    }
    
    // compiles and runs fine
    v.iter().filter(|s| is_foo(s));
    
    // compiler error
    v.iter().filter(is_foo);
}

This fails to compile with the error:

10 |     v.iter().filter(is_foo);
   |              ^^^^^^
   |              |
   |              expected signature of `for<'r> fn(&'r &std::string::String) -> _`
   |              found signature of `for<'r> fn(&'r str) -> _`
2 Likes

v.iter() yields &String, and filter passes its predicate a reference to each item, so the function you pass must accept &&String (which is exactly what the error message says). The inferred type and structure of the lambda is effectively

v.iter().filter(|s: &&String| is_foo(s /* &&String coerces to &str at the call site */));
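For completeness, here is a minimal sketch of two ways to get the function-passing form to compile (is_foo_ref is just a name made up for this example): either give the function the &&String signature that filter actually passes, or keep the closure so its body acts as the coercion site.

fn main() {
    let v: Vec<String> = vec!["foo".to_owned()];

    // Option 1: match the signature filter actually wants (&&String).
    fn is_foo_ref(s: &&String) -> bool {
        s.as_str() == "foo"
    }
    assert_eq!(v.iter().filter(is_foo_ref).count(), 1);

    // Option 2: keep fn(&str) and let the closure body perform the coercion.
    fn is_foo(s: &str) -> bool {
        s == "foo"
    }
    assert_eq!(v.iter().filter(|s| is_foo(s)).count(), 1);
}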
4 Likes

In general, the difference is in coercions: Coercions - The Rustonomicon

Any time |x| foo(x) triggers a coercion somewhere, the two forms won't be equivalent.

This is certainly confusing, though, so Rust may, in the future, gain new coercions such that one could coerce a fn(&str) into a fn(&String) or similar.
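To illustrate where the coercion does and does not apply, here is a small sketch (takes_str is a name made up for this example): deref coercion happens at call sites, but a function item's type is fixed, so it can't stand in for a differently-typed function.

fn takes_str(s: &str) -> bool {
    !s.is_empty()
}

fn main() {
    let s = String::from("hi");

    // At a call site, &String coerces to &str, so this works:
    assert!(takes_str(&s));

    // A function item also coerces to a function pointer of its own signature:
    let f: fn(&str) -> bool = takes_str;
    assert!(f(&s));

    // But there is no coercion between differently-typed function pointers,
    // so the line below does not compile today:
    // let g: fn(&String) -> bool = takes_str;
}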

6 Likes

This feels like déjà vu: I had literally just gone on a bit of a rant about issues like this yesterday in Rust beginner notes & questions. The issue is more pervasive than just the closure-to-function refactoring! A wide range of trivial code changes make the compiler throw its hands up in the air unpredictably:

fn foo(x: u8) {
    println!("{}", x);
}

fn main() {
    let buf: &[u8] = b"foobar";

    // This is a pattern I often follow: I start with iterator-based code:
    buf.iter().for_each(|x| println!("{}", x));

    // But when the closures get a bit too complex I like to extract them into dedicated functions. 
    // However, this doesn't compile!
    buf.iter().for_each(foo);
    
    // No problems here though, despite no visible use of the deref operator. In a complex
    // expression this is a nightmare to trace, even with the help of IntelliJ's Rust plugin!
    buf.iter().map( |x| x + 1 ).for_each( foo );
    
    // This compiles.
    let _x: u8;
    let _x = buf.iter().next().unwrap();
    
    // But a *trivial* simplification I expect to work doesn't compile! Wat.
    let _y: u8 = buf.iter().next().unwrap();
}

I very strongly feel that this is just one example of a very serious issue with Rust's library design and everyone needs to stop and think carefully about the direction it's taking.

Layer upon layer of ad-hoc automagic leads to unpredictable, anti-human results that make things not just worse in the long run, but can absolutely destroy languages. JavaScript is basically unsalvageable now, and its usage persists only because of sheer momentum. It is essentially mathematically impossible to make a refactoring IDE for C++ with anywhere near the same capabilities as IntelliJ or Visual Studio because of design decisions like the ones I posted. Rust is already dangerously close to becoming forever impossible to process with tooling of this type, or any other type.

It's now getting difficult for humans to grok too, and for no good reason. I'm not talking about the borrow checker, or the necessary difficulties inherent to system programming, but "surprising" foot-guns like this that just make no sense to anyone used to either mathematics or any other sane programming language:

let a: OsString = OsString::from( "foo" );
let b: &str = "bar";
if let Some(_) = a.partial_cmp( b ) { /* Compiles fine! */ };
if let Some(_) = b.partial_cmp( a ) { /* LOL, no. */ };
9 Likes

While I generally agree with your point that subtle distinctions such as owned values vs. references can be very confusing to beginners, especially as we introduce more and more sugar to make them less annoying to experienced users, note that this specific example probably does not do what you think it does:

fn main() {
    let buf: &[u8] = b"foobar";

    /* ... */
    
    // This compiles.
    let _x: u8;
    let _x = buf.iter().next().unwrap();
    
    // But a *trivial* simplification I expect to work doesn't compile! Wat.
    let _y: u8 = buf.iter().next().unwrap();
}

Anytime you type "let", you are creating a completely new variable (a new binding), which is completely independent from previous ones and can in particular have a different type. For example, this is correct Rust code:

let x = 42;
let x = "Yeah";

At the point where the second "let x = ..." statement finishes execution, the original value that was named x can no longer be referred to. (It is not dropped until the end of the enclosing scope, but it is unreachable from then on.)
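A quick illustration of that point, using a type with a noisy destructor (Noisy is made up for this example):

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let x = Noisy("first");
    println!("created {}", x.0);
    let x = Noisy("second");
    println!("created {}", x.0);
    println!("end of main");
    // Prints:
    //   created first
    //   created second
    //   end of main
    //   dropping second
    //   dropping first
    // The shadowed "first" is unreachable after the second `let`,
    // but it is only dropped when the scope ends.
}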

The fact that such "shadowing" can occur without being flagged by a compiler warning is one particularly contentious aspect of Rust's syntax. Its proponents argue that it allows code to have a more dynamically typed feel, avoiding the need to declare multiple variables with different names for what is essentially the same information in different forms:

let number = "42";
let number = number.parse::<i32>().unwrap();

Speaking personally, I think it introduces more confusion than it's worth and prefer to avoid such constructions in my code. In the above, I would probably have called the first variable "number_str" or similar to clarify the distinction.

The code which you had in mind is probably this:

let _x: u8;
_x = buf.iter().next().unwrap();  // Notice that the "let" is gone

...which won't compile for the same reason that the "let _y: u8 = ..." version doesn't: you are trying to shove a reference into a place which is meant for a value, and it does not fit. It makes sense when you think about what happens at a lower level: in an unoptimized translation of the original Rust code, a u8 is an 8-bit integer, an &u8 is a pointer (32-bit or 64-bit depending on your computer), and you cannot store 32 or 64 bits of information in a memory location which only provides room for 8 bits!
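For what it's worth, here is a minimal sketch (mine, not from the original post) of what the compiler wants in order to get a u8 out of that iterator:

fn main() {
    let buf: &[u8] = b"foobar";

    // Dereference the &u8 that the iterator yields, copying the byte out...
    let _y: u8 = *buf.iter().next().unwrap();

    // ...or ask the iterator to hand out owned u8 values in the first place
    // (.copied() works for Copy types; .cloned() also works here).
    let _z: u8 = buf.iter().copied().next().unwrap();
}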

Now, there is a delicate balance to be reached between keeping the distinction between a value and a reference clear, and making references feel like owned values in normal use. C++ and Java are two languages which went as far as they could on the "mimic values" side of this trade-off, whereas Rust started more on the "clarify the difference" side and is gradually shifting towards the "mimic values" side as it gets more widely used and people get tired of typing stars everywhere. It probably will never go as far as Java did, because the distinction between values and references matters more in a non-GC language, but there is definitely room for more language fine-tuning in this area.

5 Likes

Shitting on PHP is a cheap shot. And you probably don't have to worry about rustc.

This behavior of temporary expressions is deliberate to limit scope of temporary borrows to prevent them from needlessly conflicting with later code. In most cases this is actually what you want, and you don't even notice it.

Non-lexical lifetimes will be landing soon and will relax borrow scopes in several related cases, so perhaps the lifetimes of temporary expressions will become more flexible in the future, too.

1 Like

Nobody is talking about lifetimes here. Check the examples from peter's post.
For example, if .filter(|x| func(x)) compiles, then .filter(func) should compile too.
The same holds for comparing OsString and normal strings: it should work in both directions.

These things make me worry that there is some fundamental flaw in Rust's design.
Maybe yes, maybe no, and maybe it does not matter much; I can live with occasional compiler errors if the result runs as it should.

2 Likes

I’d say Rust, right now, is a very nuanced language. There are lots of little things that change behavior (e.g. coercion sites, type inference walking “forwards” through code), and some features are incomplete and imperfect: the lack of NLL on stable (and even the nightly NLL impl is still conservative and will not accept some safe code, due to other missing pieces like DerefStable), mutable borrows moving vs. reborrowing, associated consts working (on nightly) sort of, but not in all the cases where they should (though they probably will at some future point), and others. Right now it’s a bit of a hodgepodge, basically, and an IDE today has very little chance of offering more than rudimentary autocomplete (and not in all cases) and navigation (again, not in all cases).

Let me also mention that writing combinator-heavy code, such as tokio + futures, leads to virtually incomprehensible and occasionally useless error messages. All you might be missing is a map_err(|_| ()) somewhere in the long chain, or not returning an IntoFuture type where one is needed, and the compiler will happily tell you that your gobbledygook doesn’t have some method because it doesn’t satisfy a trait that some piece of your type pie needed. I personally just skim these errors nowadays and try to deduce what the issue might be from experience, rather than trying to parse the error bit by bit.

The hope is that something like RLS or a rich compiler AST lib can enable IDEs to do a better job without reimplementing rustc themselves. I suspect a great IDE experience is a fair ways off.

The “code via compiler errors” workflow is a bit off-putting. I’ve gotten over that somewhat, but it’s still an uncomfortable feeling (and I feel that I have a decent handle on the language). That said, no language is perfect, and some comments here and in the other thread are more about the stdlib than the language itself. Personally I’d be more concerned about the language, as that’s the bedrock; libs can be changed and replaced more easily.

6 Likes

To play devil's advocate in response to my own criticisms, I don't believe the Rust language is all that flawed. The let shadowing explained by @HadrienG is definitely a language foot-gun, however: I already knew about this, having been bitten multiple times, yet I tripped over it again.

However, about 90% of the issues I've had during my initial learning curve were to do with high-level design choices in the std library, such as missing pieces, discoverability problems, overuse of abbreviations, lack of symmetry, underutilisation of traits, and a lack of "completeness", for want of a better word.

Some of this is constrained by language decisions, but a lot of it feels like a "lack of a coherent design". For an example of this ad-hoc design process, take a look at the design for generators:

"At this time the main intended use case of generators is an implementation primitive for async/await syntax, but generators will likely be extended to ergonomic implementations of iterators and other primitives in the future."

Someone has an itch: they want compiler-magic to implement C#-style async for Futures. Solution: generators. As soon as I saw that, I screamed a silent "Noooo! STOP!" in my head. There are so many things wrong with this I barely know where to start:

  • Iterators are mentioned as a goal, yet instead of fn next() -> Option<T>, the signature is incompatible right out of the gate.
  • Two different return types from one closure? Err... what? Compare with C#'s yield or async, both of which just wrap a single type, instead of switching types in a temporal sense.
  • Doesn't appear to be extensible to other more complex control flows.

Most importantly, I see no mention of the cutting-edge research, such as: Scoped Continuations. Note that in that article the samples were "hacked into" Java, a language that is just too weak to properly implement such an advanced concept.

Rust could be that strong-enough language!!!

Something like strongly typed compiler-generated state machines for flow control would absolutely shit all over any other language, because it is:

  • A superset of Monads. All the stuff the F# and Haskell guys rave about would be a pale imitation of Rust.
  • A better fit for procedural languages than Monads.
  • Extensible to "not just that one itch", but a huge range of abstractions, including warnings, "prompt to continue", ambiguity, progress notification, cancellation, cooperative threading, etc...
  • An excellent fit to the efficient state-machine Rust philosophy.

The opportunity is just too great to ignore.

Here's an acid test for the proposed generators:

  • Is there a proposal for a rich library of implementations that compose elegantly, like in the linked scoped continuations article? Can I compose the concept of cancellation and async? Can I compose ambiguity and cancellation?
  • Sure, it may be extensible enough to support forward-only iterators. Is it powerful enough a concept to allow an iterator to be bidirectional? E.g.: support not just next() but also next_back()?
  • The Iterator trait has dozens of default implementations, providing a rich zoo of functions "for free" on top of just next(). What are the equivalents for generators? Is it going to be as rich? Can it be as rich as Iterators?
6 Likes

You're only linking to the final outcome of the discussion about generators; there was a LOT of it, and all sorts of things were discussed. That is, you can't necessarily judge what all was discussed by the outcome of the discussion, if that makes any sense.

2 Likes

I think it is important to point out that Ron Pressler's scoped continuations are actually not the cutting-edge research that he claims they are. They are just a new variation on the old concept of user-mode threads, aka "green threading", aka "stackful coroutines", which has been with us for a long while. Go's goroutines are one modern and popular implementation.

That Pressler is not keen on pointing out this lineage is to be expected, as he is openly critical of programming language innovation, a bias which naturally leads him to ignore said innovation altogether. Here is just one example, among many others, of him ranting about it:

My academic interests do not include programming languages, which I view as the sub-discipline of computer science that has consistently over-promised and under-delivered more than any other (with the possible exception of AI). I am more interested in algorithms than abstractions, and programming language research is mostly concerned with the latter.


Now that I got this out of the way, let's phrase the problem in a fashion which I think relates better to your underlying point: how do asynchronous programming models compare to one another? First of all, we need a reasonable list of popular asynchronous programming models. I think this blog post from an author of Twisted Python, which is in general a good read, does a very good job at enumerating them:

  1. Callbacks
  2. Futures + Callbacks
  3. Explicit coroutines (where suspension points stand out)
  4. Implicit coroutines (aka "green threads").

Rust currently has 2, and wants to move towards 3 for the benefit of a more natural syntax. Ron Pressler is essentially advocating for a more user-visible variant of 4 in the context of Java.

Now, how do these explicit and implicit coroutines differ? In my view, there are two key differences.

The first one is that explicit coroutines make it easier to reason about asynchronous code, because suspension points are made clearly visible via "yield" or "await". For their part, implicit coroutines make it easier to integrate asynchrony into existing codebases, as suspension points do not need to be explicitly annotated, but the price to pay is that the resulting code is harder to follow. There is a strong analogy to be made with the difference between implicit exception propagation and explicit error propagation via checked exceptions or monadic errors; I will expand on it a bit later.

The second difference is that from an implementation point of view, explicit coroutines easily translate into state machines (e.g. futures + combinators), whereas implicit coroutines essentially force programming language runtimes to replicate the job of an operating system's thread management facilities in user mode.
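As a rough illustration of the first half of that claim, here is a hand-written sketch of the kind of state machine that an explicit coroutine with a single suspension point might compile down to (all names here are made up for illustration; this is not the real Future or Generator machinery):

// One state per region of code between suspension points.
enum Step<T> {
    Pending,
    Ready(T),
}

enum CountToThree {
    Start,
    WaitedOnce,
    Done,
}

impl CountToThree {
    fn resume(&mut self) -> Step<u32> {
        match self {
            CountToThree::Start => {
                *self = CountToThree::WaitedOnce;
                Step::Pending // the suspension point: hand control back to the caller
            }
            CountToThree::WaitedOnce => {
                *self = CountToThree::Done;
                Step::Ready(3)
            }
            CountToThree::Done => panic!("resumed after completion"),
        }
    }
}

fn main() {
    let mut task = CountToThree::Start;
    assert!(matches!(task.resume(), Step::Pending));
    assert!(matches!(task.resume(), Step::Ready(3)));
}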

It is easy to brush off the second point as something for compiler authors to worry about and users to ignore, but it actually has far-reaching implications. First of all, as implicit coroutines are much closer to OS threads, the performance benefit of using them is much less clear-cut: you still need to reserve sizeable chunks of RAM for stacks, for example. Second, since a programming language runtime cannot do all the things which an OS task manager can (e.g. grow a stack efficiently without invalidating pointer addresses, or catch all blocking syscalls), implicit coroutines are extremely difficult to implement correctly and efficiently in a programming language like Rust which exposes low-level system access. They are a much better fit for "managed" languages, where the programming language runtime takes a lot of low-level system control away from you in order to have more implementation tuning headroom.

This is why Rust, which used to have implicit coroutines a long time ago, eventually dropped them. They are just not a very good fit for a low-level systems programming language.

But the first point of ergonomic differences is interesting too. It is essentially the eternal implicit vs explicit debate in action. Compared to implicit exception propagation as implemented in C++, Rust's monadic errors are widely lauded for how easy they make it to reason about error handling and understand foreign APIs, and reviled for how much more complex they make error handling integration, propagation and composition. It is one of these cases where you cannot have it both ways and must choose sides.
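As a concrete (if trivial) illustration of the explicit side of that trade-off, every point where an error can propagate out of a Rust function is visible as a ?:

use std::num::ParseIntError;

fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    // The possible early return is visible right here, at the `?`.
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_and_double("21"), Ok(42));
    assert!(parse_and_double("oops").is_err());
}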

Hopefully that will have given you a bit more food for thought on the complex implications of this design choice, and why scoped continuations may not be as good of a fit for Rust as you think. In case you are interested in more discussion on this topic, I strongly suggest that you follow @steveklabnik's advice and go have a look at the long design discussions that already went on concerning generators and async/await. One good start would be to have a look at the comments on the Generator RFC PR, for example.


Now, onto some of your specific points:

Iterators are mentioned as a goal, yet instead of fn next() -> Option<T>, the signature is incompatible right out of the gate.

There is a reason why generators are an experimental feature which will require a new RFC for stabilization. They are known to be imperfect and have not yet reached their final design, the point was to get a prototype rolling in unstable form so that the Rust community can experiment further with them and form more precise opinions about what the final feature should look like.

Two different return types from one closure? Err… what? Compare with C#'s yield or async, both of which just wrap a single type, instead of switching types in a temporal sense.

I certainly agree that this specific part of the generator design makes little sense to me, and IIRC that has been pointed out before during the generators discussion as something we may want to remove from the final design.

Doesn’t appear to be extensible to other more complex control flows.

This is where compiler integration will play a role. For example, the experimental async-await macros expose how "natural" control flow could be transparently translated to future combinators in order to produce natural-looking asynchronous code that compiles down to highly efficient state machines.

Is there a proposal for a rich library of implementations that compose elegantly, like in the linked scoped continuations article? Can I compose the concept of cancellation and async? Can I compose ambiguity and cancellation?

I think I will need some examples of specific problems that you want to solve in order to understand your question better.

Sure, it may be extensible enough to support forward-only iterators. Is it powerful enough a concept to allow an iterator to be bidirectional? E.g.: support not just next() but also next_back()?

AFAIK, "full" bidirectional iterators are generally problematic in Rust's ownership and borrowing model because they allow an iterator to produce two &mut to a single value in a row (via a next_back/next cycle), which violates the language's "no mutable aliasing" guarantee. Such iterators can thus only be of the streaming kind, where a client is not allowed to hold two values from an iterator at once. And this in turn hampers useful functionality like collect().
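For contrast, here is a small sketch (mine, not from the post) of what Rust's existing DoubleEndedIterator does: next() and next_back() consume the same underlying range from opposite ends and meet in the middle, so no element is ever handed out twice.

fn main() {
    let v = vec![1, 2, 3, 4];
    let mut it = v.iter();

    // The two ends eat into the same range; elements are never revisited.
    assert_eq!(it.next(), Some(&1));
    assert_eq!(it.next_back(), Some(&4));
    assert_eq!(it.next(), Some(&2));
    assert_eq!(it.next_back(), Some(&3));
    assert_eq!(it.next(), None);
    assert_eq!(it.next_back(), None);
}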

The Iterator trait has dozens of default implementations, providing a rich zoo of functions “for free” on top of just next(). What are the equivalents for generators? Is it going to be as rich? Can it be as rich as Iterators?

If generators are, as currently planned, extended to be allowed to produce something which implements the Iterator trait, you will get all associated iterator functionality for free.

10 Likes

Is this true? After looking into it myself, it seems that a critical part of his proposal is the ability to name the suspension points, so that a single function may yield to one of many different suspension points. This is what makes it capable of being used to compose abstractions together in arbitrary ways. It seems that this is quite different from green threads, unless I am underestimating their power?

When I was reading about this, I saw the link to Oleg Kiselyov's paper on extensible effects. In that paper, I was blown away by one of the examples shown, which I'll try to "rustify" here. (Later, watching through Pressler's talk, it seemed clear to me at least intuitively that scoped continuations and extensible effects are very closely related, and that scoped continuations should be capable of doing this as well)

It shows a single effectful function with multiple effects being called in two different ways, producing two different types. tl;dr: Effects let the caller choose how abstractions are composed.

Notes on made up syntax:
  • A function may have one or more explicitly named effects.
  • Effects are like Java's checked exceptions. You cannot call an effectful function without it being reflected in your signature...
  • ...unless you use an "effect handler", which encapsulates the effect. (here we will tend to call these run, which is merely Haskell's convention)
  • To indicate effects, function signatures may end in needs E, needs (E1, E2), or needs (E, ..Others).
  • I'm afraid I'll have to handwave on the meaning of ..Others.
    (the paper relies on using an unordered heterogenous list of types)
  • effect Name<Generics> { ... } declares an effect.
    For the purposes of the example, you may assume that this merely does the following:
    • Declares Name<Generics> to live in the type namespace.
    • Defines associated functions (which must be called using UFCS)
  • The bodies of the run function, which I omitted, are where elves are hiding.
#[derive(Debug, PartialEq, Eq)]
struct Error;

// a function that uses two effects
fn inc_then_throw() needs (Counter, Throw<Error>) -> &'static str {
    Counter::increment();
    Throw::throw(Error)  // note: unconditional throw so there is no happy "&str route"
}

fn main() {
    // Call inc_then_throw with effect handlers in two different ways.
    //
    // Observe that the output type is different based on the order
    // of the handlers!  That is to say, the CALLER gets to choose how
    // these two abstractions are composed.

    // (I'm writing separate declarations only to make side-by-side comparison easier)
    let out1: Result<(&'static str, u32), Error>;
    let out2: (Result<&'static str, Error>, u32);

    out1 = Throw::run(|| Counter::run(inc_then_throw));
    out2 = Counter::run(|| Throw::run(inc_then_throw));

    assert_eq!(out1, Err(Error));
    assert_eq!(out2, (Err(Error), 1));
}

// ----------------------------

effect Counter {
    fn increment() needs Counter;

    /// Default effect handler for Counter.
    ///
    /// Encapsulates a call to a function that `needs Counter`,
    /// returning the function's return value along with how
    /// many times it called `increment`.
    fn run<F, R, ..Others>(func: F) -> (R, u32) needs ..Others
    where F: Fn() -> R needs (Counter, ..Others)
    { /* not shown */ }
}

// Effect for a Java-style checked exception.
effect Throw<E> {
    fn throw<T>(error: E) -> T needs Throw<E>;

    /// Default effect handler for Throw.
    ///
    /// Encapsulates a call to a function that `needs Throw<E>` in a way
    /// that no longer throws type E, by returning a `Result<_, E>`.
    ///
    /// More precisely:
    /// * If the function returns a value, it is wrapped in `Ok`.
    /// * If the function calls `throw` on a value of type `E`, it is wrapped in `Err`.
    /// * If the function calls `throw` on a different type, it continues to propagate up.
    ///    (this is reflected in `needs ..Others`)
    fn run<F, R, ..Others>(func: F) -> Result<R, E> needs ..Others
    where F: Fn() -> R needs (Throw<E>, ..Others)
    { /* not shown */ }
}
1 Like

I think this is something which could easily be introduced into an existing green thread implementation, if there were a willingness to do so. Consider: in every green thread implementation that supports nesting (that is, pretty much all modern ones), every nested green thread introduces a coroutine scope, a hidden global variable keeps track of the stack of active coroutine scopes, and by calling "pause()" or equivalent one can save the state of the innermost scope and jump to the language runtime's scheduler or a user-defined hook.

Now, to make this work like Pressler's scoped continuations, all that would be needed as far as I can see is to make this coroutine scope state more explicit, by allowing the programmer to give each scope a name (e.g. by passing scope IDs as function parameter or via a global hashmap) and to interact with every named scope (not just the innermost one). With this added capability and some minor implementation tweaks, the programmer would then be able to choose which scope the pause() function should be invoked on.

There is an analogy to be made with how most programming languages allow you to "break" out of the innermost for/while loop, but some of them also allow you to introduce loop labels which allow you to easily break out of nested loops. No fundamentally new capability is actually introduced in the implementation, it is just new syntax which allows the programmer to express concepts which were un-expressible before.
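In Rust terms, the loop-label analogy looks like this (a minimal sketch):

fn main() {
    let grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
    let mut found = None;

    // A label names the outer loop, so the inner loop can break out of it,
    // much like naming a coroutine scope would let you pick which scope to suspend.
    'outer: for (i, row) in grid.iter().enumerate() {
        for (j, &cell) in row.iter().enumerate() {
            if cell == 5 {
                found = Some((i, j));
                break 'outer;
            }
        }
    }

    assert_eq!(found, Some((1, 1)));
}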

2 Likes

I did some digging, and found the change: https://github.com/rust-lang/rfcs/pull/230, with detailed motivation, shipped in 1.0-alpha1

This is one of the things I find so fascinating about Rust, a LOT of these "big" discussions have not just been discussions, but have been actually tried in pre-1.0 Rust. That experience is so valuable! And it is just out there in the repo, available for other language designers to learn from!

3 Likes

Could you tone down the language you're using, please? This is a technical discussion, and as such, there's no place for hyperbole in it. To give some concrete examples:

Please don't tell us what we need to do. This is incredibly rude.

Whether rooted in fact or not, this is hyperbolic scare-mongering.

You could start by not letting us know what you screamed, and enumerating your concerns in a manner that invites discussion, as opposed to raising everyone's stress level.

Please don't say things like this. I don't claim to speak for any Rust communities, but I'm pretty happy to state that nobody here has any interest in making anything that shits on other languages.

To reiterate what I said at the beginning: this is a technical discussion. Far more will be achieved by calm, reasoned points than repeated exclamation marks, screaming (silent or otherwise), and swearing.

16 Likes

This conflates interface and implementation, when they are actually orthogonal. Explicit suspension points are not what enable the state machine transformation. All that is required is that suspension is treated like an effect, so that suspension never crosses a non-coroutine function in the call stack.

For example, Kotlin works this way. Coroutines are marked as suspend funs and are always transformed to state machines. The language has no await syntax; instead, calls to coroutines automatically become suspension points. Because suspend is an effect, you cannot call a coroutine from a non-suspend context: to obtain a Deferred<T> (either from outside or inside a coroutine), you pass a coroutine to a library function instead of calling it directly.

This arguably has benefits to comprehensibility that the explicit approach does not. For example, in C# you can call an async function from a non-async context, and regardless of the context this returns a Task<T>. If you don't need the result, it is quite easy to forget the Task.Wait or await, leading to surprising behavior.

It also offers the possibility of effect polymorphism: in this case, functions that can behave either synchronously or asynchronously. For example, you might want to write this:

cache.entry("key").or_insert_with(async || {
    let response = await_an_http_request();
    process(response)
})

If the function or_insert_with were polymorphic over the async effect, it would continue to function as today unless passed an async closure. In that case, it would itself become a state machine, with a suspension point at the closure call site. With implicit suspension points, this would require changes to the closure parameter's type but none to the function body; with explicit suspension points this would require the function body to use some syntax like await_if_async f().

Further, this is also a perfect example for composing scoped continuations! Potential failure via Result can also be treated as an effect. Functions that can fail, rather than becoming state machines, gain early returns for each call to another fallible function. Imagine if or_insert_with were made polymorphic over this effect, which I'll call try:

cache.entry("key").or_insert_with(try async || {
    let response = try_await_an_http_request();
    try_process(response)
})

Now, because or_insert_with is polymorphic over both the async and try effects, it not only becomes a state machine- it also early-returns if the closure returns a Result::Err. Rust currently uses explicit ? much like an explicit await, so this would ordinarily require a change to the function body of or_insert_with. With implicit failure alongside implicit coroutines, it could accept both infallible and fallible closures with no changes to its body.

For completeness: this is all assuming effect polymorphism would be done at compile time, like generics, and would thus allow for the state machine transformation as part of monomorphization. You could also imagine a trait-object-like approach that uses a green-threads-like implementation to handle it dynamically. I don't know that this would be anywhere near as practical, but it's nice to fit into your conceptual framework.

So while I don't know that we want or need to expose general scoped continuations or extensible effects in the language, the concepts work fine with efficient implementation strategies and also enable some useful functionality over "monadic" approaches.

6 Likes