C-Style Sparse Enum With Default Variant?


#1

Is it possible to somehow add a ‘default’ variant to a C-style enum? What I mean is that I want something like:

    #[derive(PartialEq, Eq, Debug)]
    #[repr(u16)]
    pub enum Number {
        IfMatch = 1,
        UriHost = 3,
        ETag = 4,
        IfNoneMatch = 5,
        Unknown(u16)
    }

That way I could effectively use the value as a u16, but still use the named variants when they apply.

I can currently do an unsafe { mem::transmute(2) } and put any u16 into a Number, but I'm not sure whether that will cause problems beyond not printing proper debug info. (Right now any number that isn't declared in the enum is just printed as the first variant.)


#2

Your current approach isn't going to do what you want, and you shouldn't use transmute unless you understand exactly what is going on under the hood. You can't use an enum for this; use constants instead:

    type Number = u16;
    const IF_MATCH: Number = 1;
    const URI_HOST: Number = 3;
    // ...

Or, if you want stronger typing:

    #[derive(PartialEq, Eq, Debug)]
    struct Number(pub u16);
    const IF_MATCH: Number = Number(1);
    const URI_HOST: Number = Number(3);
    // ...

If you're willing to stick to a nightly compiler, you can also use associated consts to allow things like Number::IF_MATCH (associated consts have since been stabilized, in Rust 1.20), and you can silence the naming lint with #[allow(non_upper_case_globals)] if you want PascalCase names.
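A minimal sketch of the stronger-typed version with associated consts, assuming stable Rust 1.20 or later. The constant names and the hand-written Debug impl are illustrative choices, not part of the original answer; the Debug impl also addresses the "prints as the first variant" problem from the question by printing Unknown(n) for undeclared values.

```rust
use std::fmt;

// Newtype over the raw wire value: every u16 is a valid Number,
// so no unsafe code is needed to store undeclared values.
// PartialEq + Eq must be derived so the consts can be used as match patterns.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct Number(pub u16);

impl Number {
    // Illustrative names; values taken from the first post.
    pub const IF_MATCH: Number = Number(1);
    pub const URI_HOST: Number = Number(3);
    pub const ETAG: Number = Number(4);
    pub const IF_NONE_MATCH: Number = Number(5);

    /// Returns the symbolic name for known values, `None` otherwise.
    pub fn name(self) -> Option<&'static str> {
        match self {
            Number::IF_MATCH => Some("IfMatch"),
            Number::URI_HOST => Some("UriHost"),
            Number::ETAG => Some("ETag"),
            Number::IF_NONE_MATCH => Some("IfNoneMatch"),
            _ => None,
        }
    }
}

// Print the name when there is one, and `Unknown(n)` otherwise.
impl fmt::Debug for Number {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.name() {
            Some(name) => f.write_str(name),
            None => write!(f, "Unknown({})", self.0),
        }
    }
}

fn main() {
    let known = Number(3);
    let unknown = Number(2);
    println!("{:?} {:?}", known, unknown); // prints "UriHost Unknown(2)"
    assert_eq!(known, Number::URI_HOST);
}
```

Because the struct is just a u16 inside, it stays wire-compatible, and matching against the consts still reads much like matching enum variants (with a mandatory `_` arm).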


#3

Darn, I was hoping that wasn’t going to be the answer.

Thanks for the tip on associated consts. I'll have to see whether that's the way I want to go, or whether I just want to use a non-wire-compatible representation.
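One way to build the non-wire-compatible representation is to keep the data-carrying enum from the first post (without #[repr(u16)]) and convert to and from u16 at the wire boundary. This is a sketch under that assumption; the From impls are my own choice of conversion API.

```rust
// In-memory form: can hold any u16, but is larger than a u16,
// so it is converted at the wire boundary instead of transmuted.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Number {
    IfMatch,
    UriHost,
    ETag,
    IfNoneMatch,
    Unknown(u16),
}

// Wire value -> enum: undeclared values land in `Unknown`.
impl From<u16> for Number {
    fn from(raw: u16) -> Number {
        match raw {
            1 => Number::IfMatch,
            3 => Number::UriHost,
            4 => Number::ETag,
            5 => Number::IfNoneMatch,
            other => Number::Unknown(other),
        }
    }
}

// Enum -> wire value.
impl From<Number> for u16 {
    fn from(n: Number) -> u16 {
        match n {
            Number::IfMatch => 1,
            Number::UriHost => 3,
            Number::ETag => 4,
            Number::IfNoneMatch => 5,
            Number::Unknown(raw) => raw,
        }
    }
}

fn main() {
    let n = Number::from(2);
    println!("{:?}", n); // prints "Unknown(2)"
    assert_eq!(u16::from(Number::from(4)), 4);
}
```

One caveat of this design: nothing stops code from constructing Number::Unknown(3), which converts to the same wire value as Number::UriHost, so values built by hand are not always in canonical form.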

Out of curiosity, what problems will I run into by using transmute to get a C-style enum? I've been playing with it and it seems to work fine, except for how it's printed with Debug.


#4

Because it’s not a u16. To be honest, I don’t know what that does in this situation, but you can’t avoid the fact that Rust needs both a tag and space for the u16 value. Actually…

    > cargo script -e '#[repr(u16)] enum Number { IfMatch, Other(u16) } std::mem::size_of::<Number>()'
       Compiling expr v0.1.0 (file:///C:/Users/drk/AppData/Local/Cargo/script-cache/expr-187aaf8ba42cf82b)
    (warnings)
    4

Yeah, so you end up with a u16 for the tag, plus another u16 for the payload of the Unknown (here, Other) variant. That transmute probably only compiles because the untyped literal 2 falls back to i32, which is the same size (4 bytes) but not at all the same structure.
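The same measurement as a standalone program, next to the fieldless case for comparison. WithPayload and Fieldless are hypothetical names for the two shapes being compared.

```rust
use std::mem::size_of;

// Data-carrying enum: with #[repr(u16)] it is laid out as a u16 tag
// followed by the variant's fields, so `Other(u16)` needs 2 + 2 bytes.
#[allow(dead_code)]
#[repr(u16)]
enum WithPayload {
    IfMatch,
    Other(u16),
}

// Fieldless (C-style) enum: the value *is* the u16 tag.
#[allow(dead_code)]
#[repr(u16)]
enum Fieldless {
    IfMatch = 1,
    UriHost = 3,
}

fn main() {
    println!("{}", size_of::<WithPayload>()); // prints "4"
    println!("{}", size_of::<Fieldless>()); // prints "2"
}
```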

So, yeah; don’t touch unsafe unless you actually, really, seriously know what’s going on. I’ve seen quite a few people comprehensively blow their own legs off because they took a guess and it didn’t immediately explode. :slight_smile:


#5

Oh, sorry, I meant the example in my first post without its last line (Unknown(u16)); I should have been clearer. That line was something I was hoping Rust could optimize the way it does with Option and null pointers.
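A short illustration of why the Option trick does not carry over. Option<&T> and Option<Box<T>> are guaranteed to be the same size as the pointer because a reference or Box can never be null, so None reuses that invalid bit pattern. A plain u16 has no invalid bit patterns, so an enum that must hold every u16 in an Unknown(u16) variant has nowhere to hide its tag.

```rust
use std::mem::size_of;

fn main() {
    // Null is never a valid `&u16` or `Box<u16>`, so `None` is stored as null.
    assert_eq!(size_of::<Option<&u16>>(), size_of::<&u16>());
    assert_eq!(size_of::<Option<Box<u16>>>(), size_of::<Box<u16>>());

    // Every bit pattern of a u16 is a valid value, so the tag needs extra space.
    assert!(size_of::<Option<u16>>() > size_of::<u16>());
    println!("Option<u16> takes {} bytes", size_of::<Option<u16>>());
}
```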

If I have,

    #[repr(u16)]
    pub enum Number {
        ReservedOrUnknown = 0,
        IfMatch = 1,
        UriHost = 3,
        ETag = 4,
        IfNoneMatch = 5,
        Observe = 6,
    }

that is represented by a u16, right? My understanding is that that’s what #[repr(u16)] is supposed to do.


#6

Yes, it will use a u16 for the tag, but you still can't transmute arbitrary values into that type. Rust requires that an enum only ever hold one of its declared discriminants; any other value is undefined behavior, so the generated code may break in unpredictable ways.
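A safe alternative to the transmute, sketched under the assumption that a checked conversion is acceptable: casting the enum to u16 with `as` is always fine, and the other direction goes through a TryFrom impl that rejects undeclared values. Using the raw u16 as the error type is my own choice for illustration.

```rust
use std::convert::TryFrom;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(u16)]
pub enum Number {
    ReservedOrUnknown = 0,
    IfMatch = 1,
    UriHost = 3,
    ETag = 4,
    IfNoneMatch = 5,
    Observe = 6,
}

// Checked u16 -> Number; undeclared values come back as Err(raw).
impl TryFrom<u16> for Number {
    type Error = u16;

    fn try_from(raw: u16) -> Result<Number, u16> {
        match raw {
            0 => Ok(Number::ReservedOrUnknown),
            1 => Ok(Number::IfMatch),
            3 => Ok(Number::UriHost),
            4 => Ok(Number::ETag),
            5 => Ok(Number::IfNoneMatch),
            6 => Ok(Number::Observe),
            other => Err(other),
        }
    }
}

fn main() {
    // Number -> u16 is always safe with `as`.
    assert_eq!(Number::UriHost as u16, 3);
    // u16 -> Number is checked: 4 is declared, 2 is not.
    assert_eq!(Number::try_from(4u16), Ok(Number::ETag));
    assert_eq!(Number::try_from(2u16), Err(2u16));
}
```

The enum itself still takes exactly 2 bytes thanks to #[repr(u16)]; the cost of the check is one match at the point where data comes off the wire.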


#7

The need for an enumeration where a few values have names and are special, while all the other values are just numbers, is very common. I have already run into it a few times in my Rust code (and I think Ada enums let you handle it).

Here is a case I found a few days ago, a solution for Project Euler Problem 14 (https://projecteuler.net/problem=14):

    fn collatz_chain_len_direct(n: usize, cache: &mut [Option<u32>]) -> u32 {
        match n {
            1 => 1,
            n if n % 2 == 0 => collatz_chain_len(n / 2, cache) + 1,
            _ => collatz_chain_len(3 * n + 1, cache) + 1
        }
    }

    fn collatz_chain_len(n: usize, cache: &mut [Option<u32>]) -> u32 {
        match cache.get(n) {
            None => collatz_chain_len_direct(n, cache), // Caching not available.
            Some(&None) => { // Missing in cache.
                let len = collatz_chain_len_direct(n, cache);
                cache[n] = Some(len);
                len
            },
            Some(&Some(len)) => len
        }
    }

    fn main() {
        const LIMIT: usize = 1_000_000;
        const CACHE_SIZE: usize = LIMIT;
        let mut cache = vec![None; CACHE_SIZE];

        let max_index = (1 .. LIMIT)
                        .max_by_key(|&i| collatz_chain_len(i, &mut cache));
        println!("{:?}", max_index);
    }

“cache” is a vector of Option<u32> used for memoization, to avoid many recomputations; None represents a value that has not been computed yet.

If you replace the Option<u32> with a plain u32, in this case you can use 0 to represent None (because chain lengths are always >= 1):

    const NONE: u32 = 0;

    fn collatz_chain_len_direct(n: usize, cache: &mut [u32]) -> u32 {
        match n {
            1 => 1,
            n if n % 2 == 0 => collatz_chain_len(n / 2, cache) + 1,
            _ => collatz_chain_len(3 * n + 1, cache) + 1
        }
    }

    fn collatz_chain_len(n: usize, cache: &mut [u32]) -> u32 {
        match cache.get(n) {
            None => collatz_chain_len_direct(n, cache), // Caching not available.
            Some(&NONE) => { // Missing in cache.
                let len = collatz_chain_len_direct(n, cache);
                debug_assert!(len != NONE);
                cache[n] = len;
                len
            },
            Some(&len) => len
        }
    }

    fn main() {
        const LIMIT: usize = 1_000_000;
        const CACHE_SIZE: usize = LIMIT;
        let mut cache = vec![NONE; CACHE_SIZE];

        let max_index = (1 .. LIMIT)
                        .max_by_key(|&i| collatz_chain_len(i, &mut cache))
                        .unwrap();

        println!("{}", max_index);
    }

This version of the program is faster, and the difference is even clearer with LIMIT = 10 million (perhaps because the cache uses half the memory, which means fewer CPU cache misses). But it's less elegant and probably less safe too: here I added a debug_assert, but in similar programs, if you forget to add it, you may sometimes store the NONE value by mistake.
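The "half the memory" estimate can be checked directly. This sketch assumes the current rustc layout, where Option<u32> needs a separate tag (u32 has no spare bit pattern) and alignment rounds it up to 8 bytes.

```rust
use std::mem::size_of;

fn main() {
    const LIMIT: usize = 1_000_000;
    // Option<u32>: 4-byte value + tag, padded to 8 bytes by alignment.
    let option_bytes = LIMIT * size_of::<Option<u32>>();
    // Plain u32 with a sentinel value: 4 bytes per entry.
    let plain_bytes = LIMIT * size_of::<u32>();
    println!("Option<u32> cache: {} bytes", option_bytes); // prints "Option<u32> cache: 8000000 bytes"
    println!("u32 cache: {} bytes", plain_bytes); // prints "u32 cache: 4000000 bytes"
}
```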

In a well-designed systems language I'd like to have my cake and eat it too, as they say: a way to represent Option<u32> in just 32 bits with a special value chosen by me, keeping the code simpler and safer (with appropriate debug_asserts inserted by the compiler wherever values are stored), with no need for global constants like NONE, and still getting the faster code.
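For the specific sentinel 0, later Rust versions (1.28 and up) get close to this: std::num::NonZeroU32 can never be 0, and the standard library guarantees that Option<NonZeroU32> has the same size as u32, with None stored as 0. It does not let you choose an arbitrary sentinel, but 0 is exactly the one used above. A sketch of the memoized solution rewritten that way, assuming a 64-bit usize as the original code does (intermediate Collatz values exceed u32::MAX):

```rust
use std::num::NonZeroU32;

// Option<NonZeroU32> is 4 bytes: `None` is stored as 0, with no global NONE const.
fn collatz_chain_len_direct(n: usize, cache: &mut [Option<NonZeroU32>]) -> u32 {
    match n {
        1 => 1,
        n if n % 2 == 0 => collatz_chain_len(n / 2, cache) + 1,
        _ => collatz_chain_len(3 * n + 1, cache) + 1,
    }
}

fn collatz_chain_len(n: usize, cache: &mut [Option<NonZeroU32>]) -> u32 {
    match cache.get(n) {
        None => collatz_chain_len_direct(n, cache), // Caching not available.
        Some(&None) => {
            // Missing in cache.
            let len = collatz_chain_len_direct(n, cache);
            // Chain lengths are always >= 1; this check replaces the debug_assert.
            cache[n] = Some(NonZeroU32::new(len).expect("chain length is never 0"));
            len
        }
        Some(&Some(len)) => len.get(),
    }
}

fn main() {
    assert_eq!(std::mem::size_of::<Option<NonZeroU32>>(), std::mem::size_of::<u32>());

    const LIMIT: usize = 1_000_000;
    let mut cache: Vec<Option<NonZeroU32>> = vec![None; LIMIT];
    let max_index = (1..LIMIT)
        .max_by_key(|&i| collatz_chain_len(i, &mut cache))
        .unwrap();
    println!("{}", max_index);
}
```

Storing a 0 by mistake is now impossible without going through NonZeroU32::new, which returns None for 0, so the check lives in the type rather than in a debug_assert you have to remember.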

Since Rust is a systems language, the desire to manage enum tags manually comes up often. There are vague proposals to let users manage tags and contents manually, which would solve your problem. This topic was discussed recently in the “union structs” RFC, but I don't remember how ergonomic those proposals were.