Why can't we use FnOnce::Output?

I use a lot of iterators in my code. It would make my code an awful lot cleaner if functions returned impl Iterator<...> instead of a concrete type, because then I could use combinators like map.

However, I often construct iterators then consume them elsewhere - which means packing my iterator into a struct. And you can't put anonymous types into a struct. So I've ended up wasting a lot of lines of code manually implementing iterators. (To get a sense of scale, my current project have 54 iterator implementations!)

I'm excited for type alias impl trait. But I also notice that FnOnce also has an Output parameter:

pub trait FnOnce<Args> where Args: Tuple {
    type Output;
    // ...

But I can't use it. This doesn't work:

fn make_iter() -> impl Iterator<Item=()> { todo!() }

struct Foo {
    iter: make_iter::Output,
}

My question is this: Why not? Why was the compiler designed to reject this code? Obviously supporting this would make type inference more complex in some scenarios (since the compiler would need to detect & reject type inference loops). But TAIT also has this problem, and tait seems far more complicated.

Is this a good idea, or am I missing something obvious?

1 Like

Does something like this work for you? You can avoid naming a concrete type but still use the struct for all intents and purposes.

struct Foo<I> {
    iter: I,
}

fn make_iter() -> Foo<impl Iterator<Item=()>> {
    let iter = std::iter::once(());
    Foo { iter }
}

fn foo_next(foo: &mut Foo<impl Iterator<Item=()>>) -> Option<()> {
    foo.iter.next()
}

The direct trait is unstable, due to the lack of variadic generics (unideal argument arity handling) at least. And maybe some vague hopes of GATifying them or something.

So you have to use the special "sugar" on stable, which is indeed salt when you wish you didn't have to name the return type.

You can create a subtrait,[1] or we'll probably get return type notation (RTN) at some point.

(I haven't looked at the current state of RTN, but last I did thought a way to name functions etc in combination with the ability to use Output would be cleaner.)


  1. on mobile so no example for now; not sure how inference wrecking it would be for your use case ↩ī¸Ž

2 Likes

Unfortunately, RTN (as implemented and as defined in the RFC) doesn't work on function traits, only on associated functions. That's merely a future possibility. (I found this out after the call for testing, because every single place where I needed a Send bound on a future was on an async callback function, so I couldn't use it at all.)

4 Likes

You can use combinators on concrete types.

This is also not true. You just have to make them generic.

(& @drewtato)

Thanks, but that's far from ideal as it makes my new struct Foo<T: Iterator<...>> also impossible to name concretely. This makes the "unnameable type" problem bleed upwards and outwards and into more and more of my code.

Imagine a struct A, containing a struct B, containing a Struct C, which itself contains my impl Iterator. You end up needing to add generic parameters all over the place, in entirely unrelated parts of the program. (And I have 40k lines of rust code, so no thanks.). And then you get problems with type inference - since the larger object can only be constructed from a concrete instance of that impl Iterator. But the iterator is often (for example), constructed on the fly & cached. And things like that.

It can be made to work In trivial programs. But every time I've tried this, I've inevitably been burned by it. I always seem to find some circumstance somewhere in my code where I need a concrete name for one of my types and I can't provide one. When more of the types in my program are tainted like this, it becomes much more likely to cause problems. Even when it works today, it has caused innumerable problems when I try to refactor my code - either again, because I need a concrete name for my type somewhere, or because adding or removing a single field from a struct may now require hundreds of lines of code changes to add / remove all the generic parameters everywhere.

1 Like

Okay, I see your issue is organizational. The generics approach is definitely something you'd want to do when starting from scratch. It requires writing the majority of code against traits instead of structs. And you can erase the generics at key points with trait objects, but it sounds like you aren't willing to sacrifice performance.

Anyway, the original question was "Why was the compiler designed to reject this code?". The answer is that it is designed to reject anything that hasn't been implemented. So hopefully in the future, RTN or TAIT will be fully implemented, and one of these will compile:

struct Foo {
    iter: make_iter(..)
}

type MakeIter = impl Iterator<Item = ()>;
struct Foo {
    iter: MakeIter
}
1 Like

Okay, I see your issue is organizational. The generics approach is definitely something you'd want to do when starting from scratch. It requires writing the majority of code against traits instead of structs.

This is a total tangent, but I take some issue with this characterization, as it implies I'm at fault for how I've organized my code. I simply don't think "writing the majority of your code against traits" is ever a good idea. Not in rust, and not in any other language where interfaces / abstract superclasses are a thing.

There is a principle I learned the hard way working on a disaster of a Java project decades ago: Only create an abstraction over a concrete type when you have 2 or more implementers of that abstract interface. To do otherwise is absolute madness - and it results in a complete explosion of complexity.

The main advantage of this philosophy is you need less code - and less code means fewer bugs and - perhaps most importantly - your code is easier to refactor. But in rust, using traits unnecessarily also means you end up with generic parameters everywhere. And that makes your code harder to write, harder to read, and much harder to refactor. It also slows compilation. (Dealing with structs that have 4+ generic parameters is also exhausting. Ask me how I know.)

In rust I only abstract over my type with a trait when I have 2 or more implementers, and I actually need to be generic in some way over them. Its only then that you can figure out what methods should go in the trait anyway. When you can use them, concrete structs are just better in every way. For example, I consider this good, idiomatic rust:

struct Foo {
    // ...
}

impl Foo {
    fn new() -> Self;
    fn blah(&self) -> usize { todo!() }
    fn do_baz(&mut self) { todo!() }
}

// Later...
struct Bar(Foo);

impl Bar {
    fn new() -> Self {
        Self(Foo::new())
    }
    // ..
}

Whereas the code below is strictly worse in every way. Its harder to write, harder to read, harder to refactor and slower to compile. It has no additional functionality:

struct Foo {
    // ...
}

impl Foo {
    fn new() -> Self { todo!() }
}

trait FooStuff {
    fn blah(&self) -> usize;
    fn do_baz(&mut self);
}

impl FooStuff for Foo {
    fn blah(&self) -> usize { todo!() }
    fn do_baz(&mut self) { todo!() }
}

struct Bar<F>(F);

impl<F: FooStuff> Bar<F> {
    fn new() -> Self {
        // Oh, dear. This function can't be implemented any more.
    }
    // ...
}

And not only is it more complex, but we also can't easily write our Bar::new() function like we had above.

Anyway, the original question was "Why was the compiler designed to reject this code?". The answer is that it is designed to reject anything that hasn't been implemented.

"Because it just hasn't been implemented yet" isn't a very satisfactory answer. And it doesn't look like RTN would solve this problem either.

I got a better answer by asking in the totally wrong place - on the (closed) RFC comment thread from compiler-errors on github:

The answer to your question is that this RFC describes named existential types in general, which really don't have too much to do with naming fn_item::Output since they can appear and be constrained in many more positions than simply a function's bare return type. I can write type Foo = impl Sized; fn bar() -> Box<Foo> { ... } which isn't afaict expressible without some additional syntax to project bar::Output to the type argument of Box.

Also, ::Output is not as flexible as it might initially seem, since the output for a function is worst-case a higher-ranked type (e.g. the type foo::Output when foo is fn foo(x: &i32) -> &i32). There are also complications with name resolution since foo exists in the value namespace, and associated type lookup happens in the type namespace, among probably other complications I am forgetting. This is all somewhat related to why #3654 ended up taking the shape it did.

2 Likes

And for a concrete example of how the generics approach falls down, consider this code:

fn some_iter() -> impl Iterator<Item = usize> {
    vec![1,2,3].into_iter()
}

struct SomeStruct<I> {
    iter: I,
}

impl<I: Iterator<Item = usize>> SomeStruct<I> {
    fn new() {
        Self {
            // ERROR: Return type of some_iter() does not match I.
            iter: some_iter()
        }
    }
}

This code generates a compiler error. We can make it work by constructing the iterator out-of-band and then passing it into the constructor, but that only works if the caller knows how to construct the iterator, and the iterator always exists before we create SomeStruct. I can't abide either of these constraints in my code, because I often need to construct & store references to child iterators dynamically.

This approach to code organization requires spooky action at a distance. And it quite severely restricts how the struct can be used.

2 Likes

I believe that dyn is underutilized in a lot of Rust code.

2 Likes

To avoid adding generics up the call chain, it seems like naming the iterator type may be a workaround.

I don't recall the link, but there was an article about the merits of duck typing, in that you define the concrete type once (in your struct) and the rest of the code "informally" operates as of it were just impl Iterator. You don't get the compiler to help you if you assume more than the trait defines, but this works for the "only have 1 concrete type" case.

This doesn't work well for deeply nested modules if you have to repeat the concrete type... but if the outer wrappers only use functions on the inner type, then it can limit the exposure.

To clarify, I got the same feeling of "Oh, this is less blanket useful than I assumed and more niche useful" when I discovered references are short-lived constructs. It seems like impl Trait is more for short-lived things (called function, then use iterator, not storing it for later).
Update: having issues on mobile editor

If this is an issue for you, you may have other bigger problems. I sincerely hope my characterization does not come across as rude or patronizing, but in this regard I agree with @drewtato. If you have such deeply nested structs I'd want to know why you need this kind of deep hierarchy.

Currently Rust tends to push users with these kinds of problems towards a few idiomatic approaches:

  • Generics everywhere
  • One-off concrete structs that each impl Iterator<Item=...>
  • Box<dyn Iterator<Item=...>> (underrated imo)
  • Collect into Vec<T> more often, instead of impl Iterator<Item=...>
  • Immutable style, i.e. track your "state" in functions, not in structs
  • Prefer IntoIterator over Iterator, and/or expose "getters" for any underlying slice-like iterables

I do hope one day we will gain the ability to name these kinds of unnamable types, but until that happens you are mostly limited to the above approaches.

2 Likes

@josephg, I know this is not a direct answer to your question, but I'm curious whether the above would work for you, or if you've ruled it out because of the extra allocation. It seems like it would work with your current design.

This is annoying but has the workaround of returning SomeStruct<impl Iterator> instead of Self. The real issue is that you can't implement Default.

Sorry, I didn't mean that. I think your organization style is good in principle, it's just not well supported by Rust when return-position impl trait is also being used.

2 Likes

Rust pushes users into those approaches because of a hole in the type system.

I have no problem with generics when they're appropriate. I use them throughout my code. But this type simply isn't generic. There is only one type I: Iterator<..> that is ever constructed.

I've considered using Box<dyn Iterator<...>> but that has significant performance implications. Sometimes I do collect my child iterators into Vecs - but again, thats not ideal in many cases for the same reason.

It has been asked a few times why custom iterators come up so much in my code.

The program in question is diamond-types, a library for realtime collaborative text editing. We recently published an academic article on the algorithm, and that article includes benchmarks comparing to other contemporary (and at this point, very well optimised) CRDT libraries.

Performance matters. In figure 10 of the paper, the high peak memory usage for eg-walker happens as a direct result of collecting one of my iterators into a vec.

The library stores & manipulates text editing events. Humans often type in sequential runs, for example:

  • Insert "T", position 100
  • Insert "e", position 101
  • Insert "x", position 102
  • Insert "t", position 103
  • ... and so on.

Editing events also have IDs (which are also mostly sequential) and "parent versions" - which again, are almost always sequential in practice. Storing & processing individual editing events one by one is much slower than dealing in runs "all at once". So we store all this data in memory (and on disk) in a run-length encoded format:

  • Insert "Text", position 100-104
  • Delete items 50-59
  • ... etc

But even this is a simplification. We actually use a struct-of-arrays set of formats, where each "array" stores a different field. (We have one struct storing the editing positions, another storing the IDs, another storing parent versions, and so on).

I wrote a blog post going into this optimisation (and many others) in more detail on this a few years ago.

This is very efficient, but as you might guess, I've ended up with a lot of specialised traits (generically) describing items which can be losslessly combined & split. And specialised collection types which can store & process internally run-length encoded data efficiently. Eg RleVec or a custom order-statistic tree. And some specialised combinators, like rle zip.

As you can imagine, custom iterators get produced and consumed all over the place. And I basically can't use any of the combinator functions (map, filter, fold, etc) because the produced iterator isn't as versatile as I usually need it to be. Performance matters (and so does wasm bundle size) - so I avoid Box<dyn Iterator<..>> and .collect<Vec<_>>
where I can. The poor ergonomics around impl Iterator, and the lack of coroutines (aka generators) make my code significantly more complicated than it needs to be.

Traits for splitting & merging RLE items are fabulous in rust. But iterators are, unfortunately, much easier to write in javascript because it doesn't have rust's limitations.

I generally agree with this sentiment and I share your frustration, don't get me wrong. But to provide a few minor nits: these limitations are not 100% accidental (I'll elaborate more on this below) and the issue is mostly ergonomic in nature rather than type-theoretic (the behavior you want can be expressed in Rust, but with a lot of ugly verbosity)

There are two unofficial driving principles of Rust (and somewhat shared by C and C++) that I think explain why this isn't entirely accidental, and why iterators are so awkward for this use-case:

  1. You should only need to know a function's signature in order to call it
  2. Explicit [monomorphization] is better than implicit

These two principles exist mostly to ensure that compile times don't blow up in surprising ways. If you combine them, along with the fact that Rust allows non-boxed closures, you can see why the generics become "infectious" in your use-case. Javascript doesn't run afoul of the same issue because principles 1 and 2 aren't really applicable. In fact this isn't unique to JS, any dynamically typed languages can avoid this issue, and any languages with implicitly boxed closures can also avoid this issue.

2 Likes

(the behavior you want can be expressed in Rust, but with a lot of ugly verbosity)

I don't think so - at least, not in the general case when working with functions that return impl Iterator. Eg, I often find myself trying to do something like this:

fn double(slice: &[u32]) -> impl Iterator<Item = u32> {
    // Actual functions are far more complicated.
    slice.iter().map(|i| i * 2)
}

struct Foo<'a> {
    list: &'a [u32],
    current_iter: double::Output, // Invalid.
}}}

impl<'a> Iterator for Foo<'a> {
    type Item = u32;
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(item) = self.current_iter.next()) {
                return Some(item)
            } else {
                self.current_iter = double(self.list);
            }
        }
    }
}

Even with generics I haven't figured out a way to do this - because we need to construct new iterators at runtime. Now that I'm thinking about it out loud, I guess I could create a custom trait that also defines a way to create iterators from whatever their constructor arguments are. Then hope that the type system is powerful enough to infer all the type parameters I can't name (like the return value of that function). But even in the best case, I think I'd end up with generic parameters everywhere.

My current solution is to replace the double function above with a custom, hand written Iterator object, just so I can name the type.

It would be much, much easier for me to write all of this code if I had a way to name the iterator object returned by functions like double.

Javascript doesn't run afoul of the same issue because principles 1 and 2 aren't really applicable. In fact this isn't unique to JS, any dynamically typed languages can avoid this issue, and any languages with implicitly boxed closures can also avoid this issue.

True. But:

  1. Heap allocation (via an allocator) is often far more expensive than most people think. Runtime garbage collectors (like you'll find in V8) be significantly faster than an allocator like jemalloc for short lived objects, because it can allocate out of temporary arenas. I've seen several examples of simple javascript code outperforming simple rust code when the rust code in question is overly reliant on heap allocation. To compete with javascript performance-wise, rust needs to avoid allocations. Javascript does not.
  2. Unlike rust, Javascript supports coroutines (via function*). Here's an example of the core of the Eg-walker algorithm in rust, implemented as an iterator. (The rest of this file contains utility functions - so arguably this entire file and some nearby files are all part of the iterator). This code is horrible to read & write, because I needed to manually invert the iterator, and build a struct out of my function variables and so on.

And as you can see in the code, my struct has this:

pub(crate) struct TransformedOpsIterRaw<'a> {
    // ...
    ops: &'a RleVec<KVPair<ListOpMetrics>>,
    op_iter: Option<BufferedIter<OpMetricsIter<'a>>>,
}

Then the iterator implementation uses that:

impl<'a> Iterator for TransformedOpsIterRaw<'a> {
    type Item = TransformedResultRaw;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(op_iter) = self.op_iter.as_mut() {
            if let Some(pair) = op_iter.next() {
                let (remainder, result) = Self::next_from(self.aa, &mut self.tracker, self.op_ctx, pair);
                if let Some(r) = remainder {
                    op_iter.push_back(r);
                }
                return Some(result);
            } else { self.op_iter = None; }
        }

        // ... Otherwise fall through to other logic.
        todo!()

        // And somewhere in a mess of logic is this:

        let mut op_iter = BufferedIter::new(OpMetricsIter::new(self.ops, self.op_ctx, *span));
        self.op_iter = Some(op_iter);
    }
}