How can you inject functions and closures of different types into a fixed chain of iterator methods?
The broad outline looks something like this:
1. The user specifies an arbitrary number of files from which data should be read.
2. Data should be extracted from these files, but the user can specify, at run time, which kind of data are to be extracted.
3. These data should be combined into one continuous stream, comprising the data read from all the files.
4. The data should be grouped according to a strategy that is appropriate for the choice made in step 2.
5. These groups should be processed according to a strategy that matches the choice made in step 2.
```rust
let (read_strategy, group_strategy, processing_strategy) = get_strategies_from(cli_args);

cli_args.infiles.iter()
    .flat_map(read_strategy)
    .group_by(group_strategy).into_iter()
    .map(processing_strategy)
    .collect::<Vec<_>>();
```
I'm omitting details (mostly intermediate steps which update progress bars and gather statistics) which complicate my real code, but I hope that these are not directly pertinent here.
Below, I include some working, self-contained Python code which demonstrates what I'm trying to achieve.
The different read-strategies will, in general, extract different types, which pushes me towards dynamic dispatch. The strategies also depend on other data supplied at run time, so I find myself needing closures in order to capture these data, but I'm struggling to

- make polymorphic iterator-consuming/returning closures,
- declare the polymorphic types of these closures,
- get the compiler to accept their combination.
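The furthest I've got with declaring such closures is to erase both the closure's type and its returned iterator's type behind `Box<dyn ...>`. A minimal sketch of the kind of thing I mean (the names, the `Foo` type, and the closure bodies are placeholders, not my real code):

```rust
#[derive(Debug)]
struct Foo(u32);

// A read-strategy: a boxed closure which takes a filename and returns a
// boxed iterator, so every strategy shares one concrete type.
type ReadStrategy = Box<dyn Fn(&str) -> Box<dyn Iterator<Item = Foo>>>;

fn make_read_strategy(scale: u32) -> ReadStrategy {
    // The closure captures run-time data (`scale`) and returns a boxed iterator.
    Box::new(move |_filename: &str| {
        Box::new((0..3).map(move |n| Foo(n * scale))) as Box<dyn Iterator<Item = Foo>>
    })
}

fn main() {
    let read: ReadStrategy = make_read_strategy(10);
    let foos: Vec<Foo> = ["file1", "file2"]
        .into_iter()
        .flat_map(|name| read(name))
        .collect();
    println!("{:?}", foos); // two files' worth of Foos
}
```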
I'm trying to use `impl Iterator<Item = T>` but get lots of

```
error: `impl Trait` not allowed outside of function and inherent method return types
```
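To illustrate where I hit this: `impl Trait` is accepted as a bare function's return type, but not in the positions where I need a single named type for differently-chosen strategies, such as a type alias. The nearest workaround I'm aware of is boxing (again, the names here are hypothetical):

```rust
// Compiles: `impl Trait` in a function's return position is fine.
fn foos() -> impl Iterator<Item = u32> {
    0..3
}

// Does NOT compile: `impl Trait` in a type alias, which is the position I
// need so that run-time-selected strategies can share one declared type:
//
//     type Strategy = fn(&str) -> impl Iterator<Item = u32>;
//     // error: `impl Trait` not allowed outside of function and inherent
//     // method return types
//
// The workaround I know of: erase the iterator type behind a Box.
type Strategy = fn(&str) -> Box<dyn Iterator<Item = u32>>;

fn main() {
    let strategy: Strategy = |_file| Box::new(foos());
    let v: Vec<u32> = strategy("file1").collect();
    println!("{:?}", v);
}
```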
```python
from itertools import groupby, chain
from collections import namedtuple

# Python appears to have no standard flat_map
def flat_map(func, *iterable):
    return chain.from_iterable(map(func, *iterable))

# Imitate Rust's collect
collect = tuple

# ======================================================================
# Imagine we have an arbitrary number of files from which we want to read some
# data. We represent them in-memory as a dictionary of file-name => contents
input_files = dict(file1 = range(100),
                   file2 = range(100, 200),
                   file3 = range(200, 300))

# In real life, the user would supply these on the CLI
filenames = tuple(input_files)

# ======================================================================
# There are different types of information we might extract from the file.
# Let's represent these by two different types:
Foo = namedtuple('Foo', 'f')
Bar = namedtuple('Bar', 'b')

# Alternative strategies for extracting data from the file: either Foos or Bars
def extract_foos_from_file(filename): return map(Foo, input_files[filename])
def extract_bars_from_file(filename): return map(Bar, input_files[filename])

# Strategies for grouping the data together, depending on what we extracted
def group_foos(f): return f.f // 10
def group_bars(b): return b.b // 10

# Strategies for processing the information extracted from the files.
# groupby yields (key, group) pairs: only the group itself needs summing.
def process_foo_group(k_g): return sum(foo.f for foo in k_g[1])
def process_bar_group(k_g): return sum(bar.b for bar in k_g[1])

# ======================================================================
# The broad outline of how the data are processed remains the same, but the
# details pertaining to what data should be extracted, how they should be
# grouped, and how they should be processed can vary.
for extract, group, process in ((extract_foos_from_file, group_foos, process_foo_group),
                                (extract_bars_from_file, group_bars, process_bar_group)):
    # This sequence of operations (which I'm trying to express in Rust by
    # chaining iterator functions) is invariant: the difference lies in which
    # versions of `extract`, `group` and `process` are to be used.
    data      = flat_map(extract, filenames)
    grouped   = groupby(data, group)
    processed = map(process, grouped)
    result    = collect(processed)
    print(result)
```