Char::to_digit musings

Hi, this is the source of char::to_digit():

pub fn to_digit(self, radix: u32) -> Option<u32> {
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");

    // the code is split up here to improve execution speed for cases where
    // the `radix` is constant and 10 or smaller
    let val = if radix <= 10 {
        match self {
            '0'..='9' => self as u32 - '0' as u32,
            _ => return None,
        }
    } else {
        match self {
            '0'..='9' => self as u32 - '0' as u32,
            'a'..='z' => self as u32 - 'a' as u32 + 10,
            'A'..='Z' => self as u32 - 'A' as u32 + 10,
            _ => return None,
        }
    };

    if val < radix { Some(val) } else { None }
}

I'd like to know why it isn't more like this:

pub fn to_digit(self, radix: u8) -> Option<u8> {
    if radix > 36 { return None; }

    // the code is split up here to improve execution speed for cases where
    // the `radix` is constant and 10 or smaller
    let val = if radix <= 10 {
        match self {
            '0' ..= '9' => self as u8 - b'0',
            _ => return None,
        }
    } else {
        match self {
            '0' ..= '9' => self as u8 - b'0',
            'a' ..= 'z' => self as u8 - b'a' + 10,
            'A' ..= 'Z' => self as u8 - b'A' + 10,
            _ => return None,
        }
    };

    if val < radix { Some(val) } else { None }
}

That can be used as follows (here the proposed version is called to_digit2 to distinguish it from the standard method):

fn main() {
    println!("{:?}", '8'.to_digit(10));
    let a = [10, 20];
    if let Some(d) = '1'.to_digit2(10) {
        println!("{:?}", a[usize::from(d)]);
    }
}
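For anyone who wants to try the proposal without touching the standard library, here is a minimal sketch as an extension trait. The trait name `ToDigit2` is made up for illustration, and the two `match` blocks are merged into one, which gives the same results because the final `val < radix` check already rejects letters for small radices (the split in the original exists only for speed).

```rust
/// Hypothetical extension trait carrying the proposed `u8`-based method.
/// The name `to_digit2` keeps it from clashing with the inherent `char::to_digit`.
trait ToDigit2 {
    fn to_digit2(self, radix: u8) -> Option<u8>;
}

impl ToDigit2 for char {
    fn to_digit2(self, radix: u8) -> Option<u8> {
        // As proposed: an out-of-range radix yields None instead of a panic.
        if radix > 36 {
            return None;
        }
        let val = match self {
            // Each arm guarantees `self` is ASCII, so `self as u8` cannot truncate.
            '0'..='9' => self as u8 - b'0',
            'a'..='z' => self as u8 - b'a' + 10,
            'A'..='Z' => self as u8 - b'A' + 10,
            _ => return None,
        };
        if val < radix { Some(val) } else { None }
    }
}

fn main() {
    let a = [10, 20];
    if let Some(d) = '1'.to_digit2(10) {
        // `usize: From<u8>` exists, so no `as` cast is needed for indexing.
        println!("{:?}", a[usize::from(d)]); // prints 20
    }
    println!("{:?}", 'z'.to_digit2(36)); // prints Some(35)
    println!("{:?}", '8'.to_digit2(37)); // prints None
}
```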

There are a few differences:

  • Instead of panicking, it returns None if radix > 36.
  • It contains fewer "as" casts.
  • It returns an Option<u8>. This is handy because user code can convert a u8 safely and losslessly to usize, u32 and some other types with ::from(), without needing "as" casts.
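To make the last point concrete, here is a small sketch comparing the two return types when the digit is used as an index. The helper names `index_from_u32` and `index_from_u8` are invented for this example.

```rust
use std::convert::TryFrom; // needed before edition 2021, redundant after

/// Index from a `u32` digit, as returned by the standard `char::to_digit`.
fn index_from_u32(d: u32) -> usize {
    // `usize::from(d)` does not compile here: `usize` has no `From<u32>` impl.
    // The choices are a fallible conversion or an `as` cast.
    usize::try_from(d).expect("digit fits in usize")
}

/// Index from a `u8` digit, as the proposed signature would return.
fn index_from_u8(d: u8) -> usize {
    usize::from(d) // infallible, no cast
}

fn main() {
    let a = [10, 20];
    let d = '1'.to_digit(10).expect("'1' is a decimal digit");
    println!("{}", a[index_from_u32(d)]); // prints 20
    println!("{}", a[index_from_u8(1)]); // prints 20

    // A u8 also widens losslessly to other integer types.
    let as_u32 = u32::from(7u8);
    let as_i32 = i32::from(7u8);
    println!("{} {}", as_u32, as_i32); // prints 7 7
}
```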

What do you think?

Shouldn't to_digit2() be try_to_digit()?

(I've renamed to_digit2 to to_digit.) I don't see why you suggest a different name. But anyway, I was asking about the semantics/API.

My mistake. I was still half-asleep and thought that you had switched the signature from returning an int to returning an Option<int>. That refactoring transformation is usually accomplished by prefixing try_ to the name of the function that panics.
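As a sketch of that naming convention (the function names `digit_value` and `try_digit_value` are hypothetical, chosen only to illustrate it): the panicking function keeps the plain name, and the Option-returning one gets the try_ prefix. The standard library's `RefCell::borrow` and `RefCell::try_borrow` are a pair of this kind.

```rust
/// Fallible variant: returns None when `c` is not a digit in `radix`.
fn try_digit_value(c: char, radix: u32) -> Option<u32> {
    c.to_digit(radix)
}

/// Panicking variant: same computation, but failure is treated as a bug.
fn digit_value(c: char, radix: u32) -> u32 {
    try_digit_value(c, radix).expect("character is not a digit in this radix")
}

fn main() {
    println!("{:?}", try_digit_value('x', 10)); // prints None
    println!("{}", digit_value('7', 10)); // prints 7
}
```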

Returning None for a radix above 36 seems like a bad idea, because passing such a radix to to_digit is a mistake you should want to be warned of as soon as possible.

I'd guess it's very rare to use to_digit with a non-constant radix. 99% of the time it will be either 10 or 16, probably 99% of the rest of the time it will be 8, 12, 20, or some other constant value. In my ideal world I'd want to have a compile-time failure for writing char::to_digit(c, 37), but since Rust doesn't support that at the moment, panicking is the next best thing.
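On compilers that support const generics and `assert!` in constant evaluation, something close to that compile-time failure can be sketched for the constant-radix case. This is only an illustration: `RadixCheck` and `to_digit_checked` are invented names, not part of the standard library, and the lower bound of 2 is an extra assumption of this sketch.

```rust
/// Hypothetical helper whose associated constant validates RADIX at compile time.
struct RadixCheck<const RADIX: u32>;

impl<const RADIX: u32> RadixCheck<RADIX> {
    // Evaluated during compilation for each RADIX that is actually used.
    const OK: () = assert!(RADIX >= 2 && RADIX <= 36, "radix must be in 2..=36");
}

/// Hypothetical wrapper: the radix is a const generic instead of an argument.
fn to_digit_checked<const RADIX: u32>(c: char) -> Option<u32> {
    // Referencing the constant forces the check; a bad RADIX becomes a build error.
    let () = RadixCheck::<RADIX>::OK;
    c.to_digit(RADIX)
}

fn main() {
    println!("{:?}", to_digit_checked::<16>('f')); // prints Some(15)
    println!("{:?}", to_digit_checked::<10>('f')); // prints None
    // to_digit_checked::<37>('f'); // uncommenting this line fails the build
}
```

The trade-off is that the radix must be known at compile time, which matches the common cases of 10 and 16 but not a radix read from user input.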

You have a point about the return type. Converting the result to another integer type such as i32 is probably pretty common and is awkward. It's possible that the API designers were thinking of eventually supporting larger radices that could handle larger digits, such as 'ↂ' (U+2182 ROMAN NUMERAL TEN THOUSAND), which would not fit in a u8. Of course, it's also possible this is simply an oversight.
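A quick check of where things stand today, as a small sketch: the existing to_digit only recognizes ASCII digits and letters, so the largest value it can return is 35, while a digit worth ten thousand would indeed overflow a u8.

```rust
use std::convert::TryFrom; // needed before edition 2021, redundant after

fn main() {
    // `char::to_digit` only recognizes ASCII 0-9, a-z and A-Z.
    println!("{:?}", 'ↂ'.to_digit(36)); // prints None
    println!("{:?}", '٣'.to_digit(10)); // prints None (ARABIC-INDIC DIGIT THREE)
    println!("{:?}", 'Z'.to_digit(36)); // prints Some(35), the largest possible value

    // A digit worth 10,000 would not fit in a u8, but fits easily in a u32.
    println!("{}", u8::try_from(10_000u32).is_err()); // prints true
}
```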
