What's the best approach to dynamic types chosen at runtime? Box, or enum, or something else?

culebron · November 5, 2022, 12:13pm

Oftentimes, I need to open a file with one of several reader structs. For instance, a user may want to read .csv, .csv.gz, or .csv.bz2. In GIS-related work, there are a dozen formats, and in real life I must support at least 5 of them (GeoPackage, CSV, Shapefile, FlatGeobuf, GeoJSON).

So, a function opening them must return a dynamic type or an enum wrapper.

Is there a better way?

Boxing seems shorter. But if I remember correctly, there are things you can't do with it.

// Assuming the flate2 and bzip2 crates for the decoders
use std::{fs::File, io::Read, path::Path};
use bzip2::read::BzDecoder;
use flate2::read::GzDecoder;

fn open(path: &Path) -> Result<Box<dyn Read + Send>, MyErrorType> {
    // Path::ends_with matches whole path components, so test the string form instead
    let name = path.to_string_lossy();
    let fp = File::open(path)?;
    let rd: Box<dyn Read + Send> = if name.ends_with(".csv.gz") {
        Box::new(GzDecoder::new(fp))
    } else if name.ends_with(".csv.bz2") {
        Box::new(BzDecoder::new(fp))
    } else if name.ends_with(".csv") {
        Box::new(fp)
    } else {
        return Err("unsupported file extension".into());
    };
    Ok(rd)
}

An enum seems more "grounded", but I suspect it will not have Copy or other necessary traits, for instance those needed to send it into a thread.

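// Assuming the flate2/bzip2 read decoders, which are generic over the reader they wrap
enum ReaderWrapper {
    Plain(File),
    Gz(GzDecoder<File>),
    Bz2(BzDecoder<File>),
}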

impl Read for ReaderWrapper {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        match self {
            // seems repetitive to me
            ReaderWrapper::Plain(f) => f.read(buf),
            ReaderWrapper::Gz(f) => f.read(buf),
            ReaderWrapper::Bz2(f) => f.read(buf),
        }
    }
}

fn open(path: &std::path::Path) -> Result<ReaderWrapper, MyErrorType> {
    // Path::ends_with matches whole path components, so test the string form instead
    let name = path.to_string_lossy();
    let fp = std::fs::File::open(path)?;
    let rd = if name.ends_with(".csv.gz") {
        ReaderWrapper::Gz(GzDecoder::new(fp))
    } else if name.ends_with(".csv.bz2") {
        ReaderWrapper::Bz2(BzDecoder::new(fp))
    } else if name.ends_with(".csv") {
        ReaderWrapper::Plain(fp)
    } else {
        return Err("unsupported file extension".into());
    };
    Ok(rd)
}

Maybe there's a macro to define this enum automatically, with all impls (like From<File>, From<GzDecoder>, etc.)?

type MyEnum = enum![File, GzDecoder, BzDecoder];

...
   let rd: MyEnum = if path.ends_with(".csv.gz") { GzDecoder::new(fp).into() }
       else if path.ends_with(".csv.bz2") { BzDecoder::new(fp).into() }
       else { fp.into() };
   // .into() calls From<T> for MyEnum, which the macro would generate automatically

This is such a frequent case, causes so much pain, and yet I didn't see a doc on this.

What surprises me is that Rust went further with an idea from dynamic languages like Python, but didn't take it all the way.

In the Python community, the usual advice is to check whether an object has a particular method (if hasattr(my_file_obj, "__iter__"): ...) rather than checking its class (if isinstance(my_file_obj, GzipFile): ...).

But it's not formalized in any way. Rust did formalize this in the form of traits. But it works only at compile time. You can't simply return an object known only by its trait, e.g. fn open(path: &std::path::Path) -> dyn Read { ... }.

VorfeedCanal · November 5, 2022, 1:10pm

It doesn't surprise me. You quite literally need to move mountains for that to happen, or at least do an equivalent amount of work. If you are interested in the gory details you can read this blog post, but the TL;DR version is: Rust hasn't implemented it not because it's bad or undesirable, but simply because it's such a huge amount of work that it's not at all clear whether there will ever be enough resources to do it.

Yes. Your best bet is to return Box<dyn Read>. That's what most languages that pretend they can return dyn Read are doing anyway. They just hide it.

Swift is the only non-esoteric exception which I know about. It shows that it's doable and maybe some day, years from now, Rust would add it… but not any time soon. It's just hard. Sorry.

H2CO3 · November 5, 2022, 1:35pm

Box isn't Copy either, and it's not even possible to declare a Box<dyn Trait + Copy>. It's basically exactly the opposite – you can trivially #[derive(Copy)] for an enum if all of its variants are copyable. Auto traits such as Send and Sync are automatically implemented for enums just like for any other composite type, when applicable. So I don't get what your concern is.
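As a rough illustration of the point about derived and auto traits, here is a minimal sketch using only the standard library. The names Compression, Source, assert_send, checks and copy_twice are made up for this example and do not come from the thread.

```rust
use std::fs::File;
use std::io::{BufReader, Cursor};

// Hypothetical enum whose variants carry no non-Copy data, so Copy can be derived.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Compression {
    None,
    Gzip,
    Bzip2,
}

// Hypothetical enum wrapping reader types. It cannot be Copy (a File is not),
// but it is Send automatically because every variant is Send.
#[allow(dead_code)]
enum Source {
    Disk(BufReader<File>),
    Memory(Cursor<Vec<u8>>),
}

// Compile-time check: a call only type-checks if T is Send.
fn assert_send<T: Send>() {}

fn checks() {
    assert_send::<Source>();
    // For a trait object, Send has to be named explicitly in the type.
    assert_send::<Box<dyn std::io::Read + Send>>();
}

fn copy_twice(c: Compression) -> (Compression, Compression) {
    (c, c) // using `c` twice is allowed because Compression is Copy
}
```

If Source gained a variant holding a non-Send type such as Rc<...>, the call assert_send::<Source>() would stop compiling, which is the "when applicable" part.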

I'm not sure I understand what you are getting at. Rust is a statically typed language. But even in a dynamically typed language, you really have to reason about types (or interfaces, when it comes to your example w.r.t. checking the methods of a type).

That's exactly what Box<dyn Read> is for. You must heap-allocate it because it's dynamically sized, so it can only exist behind indirection. This is not some sort of artificial restriction Rust imposes; it's a technical necessity.
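A small sketch of why the indirection is needed, assuming only standard-library readers as stand-ins (the function name sizes is made up): different concrete readers have different sizes, while the box has one fixed size regardless of what it points to.

```rust
use std::io::{Cursor, Empty, Read};
use std::mem::size_of;

// Returns the sizes of two concrete readers and of a boxed trait object.
// A by-value `dyn Read` return would need a caller-side slot whose size
// depends on which reader is picked at runtime.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Empty>(),
        size_of::<Cursor<Vec<u8>>>(),
        // Data pointer plus vtable pointer, independent of the concrete type.
        size_of::<Box<dyn Read>>(),
    )
}
```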

It's not like Python solves this problem, either. It just heap allocates everything.

Anyway, to answer your question in the title:

It really depends. Trait objects are good when you need to support behavior that's beyond your control, i.e. when there isn't a fixed set of types/behaviors, and/or they are not really related apart from the single trait you are relying on. On the other hand, enums are useful when there are few possible types, all known at compile time, and they are related in some reasonable way so that you can implement methods on the enum by matching on its variants.

arlecchino · November 5, 2022, 1:47pm

You are missing object inheritance.
This is something I also have to spend a lot of time working around in Rust, with additional converters and wrappers.
Try writing your own macro; it is not that difficult and it avoids dependency hell.
And for three or half a dozen cases, I would go with one of the approaches you have described and not make things more abstract than they need to be.
Maybe define your own type to make it more readable.
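One possible shape for such a hand-written macro is sketched below. It is only an illustration: reader_enum!, AnyReader, pick and slurp are hypothetical names, and standard-library readers stand in for File, GzDecoder and BzDecoder so that the example has no dependencies. The macro declares the enum, forwards Read to the active variant, and generates a From impl per wrapped type.

```rust
use std::io::{self, Cursor, Empty, Read};

// Hypothetical helper macro, not part of any crate.
macro_rules! reader_enum {
    ($name:ident { $($variant:ident($ty:ty)),+ $(,)? }) => {
        enum $name {
            $($variant($ty)),+
        }

        // Forward `read` to whichever variant is active.
        impl Read for $name {
            fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
                match self {
                    $($name::$variant(inner) => inner.read(buf)),+
                }
            }
        }

        // One From impl per wrapped type, so `.into()` works at the call site.
        $(
            impl From<$ty> for $name {
                fn from(inner: $ty) -> Self {
                    $name::$variant(inner)
                }
            }
        )+
    };
}

reader_enum!(AnyReader {
    Memory(Cursor<Vec<u8>>),
    Nothing(Empty),
});

// Choose a variant at runtime; the conversions come from the generated From impls.
fn pick(data: Option<Vec<u8>>) -> AnyReader {
    match data {
        Some(bytes) => Cursor::new(bytes).into(),
        None => io::empty().into(),
    }
}

// Consumer written against the Read impl of the enum.
fn slurp(mut rd: AnyReader) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    rd.read_to_end(&mut out)?;
    Ok(out)
}
```

Each wrapped type must appear only once in the list, otherwise the generated From impls would conflict.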

VorfeedCanal · November 5, 2022, 2:49pm

I wouldn't call it a technical necessity. C99 had support for dynamically-sized objects on the stack last century. It's just quite problematic on some platforms, and it's not entirely clear whether it's really worth all the complexity this would bring to the compiler. Especially if there is a desire to support use cases where it's not clear which type will be returned, which makes it impossible to allocate a buffer for it in the calling function.

Basically: it's doable but it's not clear if it's desirable. All these complications and the extra copying of data are not a good fit for embedded (where you may want to avoid Box), and they are not even a clear performance win on other platforms.

Memory allocation is not free, but these manipulations with types of unknown sizes are not free either.

H2CO3 · November 5, 2022, 2:52pm

I.e., it's a "technical necessity" by a reasonable definition. It could be made to work theoretically, but it's problematic, so it's not supported (yet – there are plans for the distant future for supporting by-value dynamically-sized objects).

I'm not trying to debate definitions pointlessly, nor do I want to assert that it's 100% impossible by the laws of the universe. I was merely trying to point out to OP that the status quo is not due to the Rust compiler being subpar or the language designers being evil; there are very good reasons why dyn Trait-by-value has not been implemented so far.

tczajka · November 5, 2022, 8:50pm

It's the old conundrum: polymorphism (traits) vs algebraic data types (enum).

If you have N types of things, each of which implements M different functionalities, you can either slice your code by type (polymorphism), or you can slice it by functionality (algebraic data types).

The best solution depends on which coordinate is more "open-ended".

If you want to be able to easily add more types, then polymorphism is better.
If you want to be able to easily add more functionality, then algebraic data types are better.

In this particular case it looks like there is a small, fixed set of functionality (Read), and there could potentially be many different types added (different compression schemes). So this points to the polymorphism approach: use Box<dyn Read>.
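A minimal sketch of that approach, under stated assumptions: Upper is a made-up adapter standing in for a real decoder such as GzDecoder, the ".csv.up" extension is invented to select it, and open_bytes and read_all_in_thread are hypothetical names. Supporting another type means adding one branch in open_bytes; the consumer stays unchanged.

```rust
use std::io::{self, Cursor, Read};
use std::thread;

// Stand-in for a decoder: wraps any reader and upper-cases ASCII bytes
// as they pass through. Purely illustrative.
struct Upper<R: Read>(R);

impl<R: Read> Read for Upper<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.0.read(buf)?;
        buf[..n].make_ascii_uppercase();
        Ok(n)
    }
}

// Pick a reader at runtime; callers only ever see Box<dyn Read + Send>.
fn open_bytes(name: &str, data: Vec<u8>) -> io::Result<Box<dyn Read + Send>> {
    let raw = Cursor::new(data);
    if name.ends_with(".csv.up") {
        Ok(Box::new(Upper(raw)))
    } else if name.ends_with(".csv") {
        Ok(Box::new(raw))
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "unsupported file extension",
        ))
    }
}

// The consumer is written once against the trait. It runs on another thread,
// which is why the Send bound is part of the boxed type.
fn read_all_in_thread(mut rd: Box<dyn Read + Send>) -> io::Result<String> {
    thread::spawn(move || -> io::Result<String> {
        let mut s = String::new();
        rd.read_to_string(&mut s)?;
        Ok(s)
    })
    .join()
    .expect("reader thread panicked")
}
```

With the enum approach the trade-off runs the other way: a new reader type touches the enum and every match on it, while a new operation is a single new method.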

H2CO3 · November 5, 2022, 10:11pm

Obligatory link to The AST Typing Problem.

CAD97 · November 6, 2022, 2:12am
