Unchecked cast u128 to u64 without unsafe?

I'm trying to do

let r: u128 = ...;
u64::try_from(r << 64).unwrap();

but without the unwrap, which would be slow. I don't care if the value does not fit in a u64; I need a C++-like static_cast for integer types only, but without unsafe code.

Actually, I need this:

    let a: u64 = ...;
    let b: u64 = ...;
    let r = a.into() as u128 * b.into() as u128;
    result[0] = u64::try_from(r & 0xFFFFFFFFFFFFFFFF0000000000000000).unwrap();
    result[1] = u64::try_from(r << 64).unwrap();

that is, I need the lower and higher u64 halves of the u128. Something like to_be_bytes, but for u64.

You mean >> not <<.

It's not slow. The compiler is smart enough to optimize it.

See the generated assembly:

why >>?

You also want 0x0000000000000000FFFFFFFFFFFFFFFF, not 0xFFFFFFFFFFFFFFFF0000000000000000.

Just like in decimal notation, high order bits are on the left side of the number, so you want to shift them right, not further left.
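For reference, the two corrections above (mask the low bits, shift the high bits right) could be put together roughly like this. It is only a sketch that keeps the try_from and unwrap style of the original post; the function name split_checked is made up for illustration and does not appear in the thread.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (name not from the thread): splits a u128 into
/// (high, low) u64 halves using checked conversions.
pub fn split_checked(r: u128) -> (u64, u64) {
    // The mask keeps only the low 64 bits, so the value always fits in a u64.
    let low = u64::try_from(r & 0x0000_0000_0000_0000_FFFF_FFFF_FFFF_FFFF).unwrap();
    // Shifting right by 64 moves the high 64 bits into the low position,
    // so this value always fits in a u64 as well.
    let high = u64::try_from(r >> 64).unwrap();
    (high, low)
}
```

Neither unwrap can fail here, because both operands are at most 64 bits wide after the mask or shift.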

Edit: I think having low-order digits on the left of the number would be more consistent with left-to-right writing systems, but ultimately these are Arabic numerals, and Arabic is written right-to-left, so maybe that's why we write numbers this way.

My implementation round-trips through u128::to_ne_bytes() and u64::from_ne_bytes().

fn u128_to_u64s(n: u128) -> [u64; 2] {
    let bytes = n.to_ne_bytes();
    let (mut high, mut low) = bytes.split_at(8);

    if cfg!(target_endian = "little") {
        std::mem::swap(&mut high, &mut low);
    }

    [
        u64::from_ne_bytes(high.try_into().unwrap()),
        u64::from_ne_bytes(low.try_into().unwrap()),
    ]
}

It looks pretty verbose and the unwrap() call sets off alarm bells, but the optimiser is able to see this code will never fail and converts it into a no-op.

playground::u128_to_u64s:
	movl	$56088, %edx
	xorl	%eax, %eax
	retq

I added a transmute() version, but the two compiled to exactly the same assembly so the compiler ended up jumping to the playground::u128_to_u64s symbol twice.

You can just use a as u64 and (a >> 64) as u64.
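Applied to the multiplication from the original post, that might look like the sketch below. The function name mul_wide and the (high, low) return order are assumptions made for illustration; the original post stores the halves in a result array instead.

```rust
/// Hypothetical helper (name not from the thread): full 64x64 -> 128-bit
/// product, returned as (high, low) u64 halves.
pub fn mul_wide(a: u64, b: u64) -> (u64, u64) {
    // Widening u64 -> u128 is lossless, so From works here. The product of
    // two u64 values always fits in a u128, so the multiplication cannot overflow.
    let r = u128::from(a) * u128::from(b);
    // `as u64` truncates: it keeps the low 64 bits and discards the rest.
    let low = r as u64;
    // After shifting right by 64, the former high half is in the low 64 bits.
    let high = (r >> 64) as u64;
    (high, low)
}
```

No mask is needed for the lower half, because the truncating cast already drops the upper 64 bits.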

You can just use to_le_bytes or to_be_bytes rather than to_ne_bytes, and avoid the platform-dependent code at no cost. The optimizer is smart enough:

pub fn split_le(n: u128) -> (u64, u64) {
    let bytes = n.to_le_bytes();
    let (low, high) = bytes.split_at(8);
    (
        u64::from_le_bytes(low.try_into().unwrap()),
        u64::from_le_bytes(high.try_into().unwrap()),
    )
}

pub fn split_be(n: u128) -> (u64, u64) {
    let bytes = n.to_be_bytes();
    let (high, low) = bytes.split_at(8);
    (
        u64::from_be_bytes(low.try_into().unwrap()),
        u64::from_be_bytes(high.try_into().unwrap()),
    )
}

Also you can do:

(n as u64, (n >> 64) as u64)

I dislike as, though, and prefer an explicit try_from plus unwrap as in the original post. But in this case as does exactly what you want for the lower half without having to use a mask.

All versions generate the same exact code.
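That the versions also return the same values can be checked with a few samples. The sketch below is only an illustration: the names split_bytes, split_shift and versions_agree are made up, and it compares results, not generated code (for that you still need to look at the assembly).

```rust
use std::convert::TryInto;

/// Byte-based split via little-endian bytes, returning (low, high).
pub fn split_bytes(n: u128) -> (u64, u64) {
    let bytes = n.to_le_bytes();
    // In little-endian order the first 8 bytes are the low half.
    let (low, high) = bytes.split_at(8);
    (
        u64::from_le_bytes(low.try_into().unwrap()),
        u64::from_le_bytes(high.try_into().unwrap()),
    )
}

/// Shift-based split, returning (low, high).
pub fn split_shift(n: u128) -> (u64, u64) {
    (n as u64, (n >> 64) as u64)
}

/// Hypothetical check: true if both versions agree on a handful of samples.
pub fn versions_agree() -> bool {
    let samples = [
        0u128,
        1,
        u64::MAX as u128,
        1u128 << 64,
        u128::MAX,
        0x0123_4567_89AB_CDEF_FEDC_BA98_7654_3210,
    ];
    samples.iter().all(|&n| split_bytes(n) == split_shift(n))
}
```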

Note that if you're tempted to branch on endianness, you probably want the le or be versions of the functions instead, perhaps like this:

#[inline(never)]
fn u128_to_u64s(n: u128) -> [u64; 2] {
    let bytes = n.to_be_bytes();
    let (high, low) = bytes.split_at(8);

    [
        u64::from_be_bytes(high.try_into().unwrap()),
        u64::from_be_bytes(low.try_into().unwrap()),
    ]
}

As that also compiles down to something trivial:

playground::u128_to_u64s:
	mov	rax, rsi
	mov	rdx, rdi
	ret

https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=47aac3b34ccfbee657362aac9d9a2916

(Though of course for the specific "I just want the high or low bits" use, shifting is best.)

EDIT: Doh, tczajka beat me.

Since the order a number is written corresponds to the order you say the parts of the numbers, I think the order makes sense as-is. You say 321 as “3 hundred 2enty 1”.

Also, the “Arabic” numeral system comes from India (and they write left-to-right AFAIK); it was only adopted by Arabic mathematicians and from there came to Europe, where the numerals were thus called “Arabic”.

Yeah, makes sense, English is big-endian. I would be tempted to argue this might be because of notation, but actually Roman numerals, which were used before, are also big-endian, even though Latin isn't right-to-left.

German uses a hybrid mixed-endian system! 123 in German is "one hundred three twenty". Even worse than big-endian!

No need to tell me about the German way of saying numbers 😜

(pay attention to the location)

Edit: Actually, I just noticed, I’ve set my location in this forum as well, just click on my name.


More accurately 123 in German is “one hundred three-and-twenty”. The “and” is somewhat relevant IMO.

I'd say it's very important. Not only does it make it more familiar to English speakers from blackbirds baked in a pie, but it also helps separate it from the French quatre-vingt ("four-twenties") meaning eighty.

I find it particularly fascinating that, apparently, in Latin both versions are possible. For example, 21 can be "viginti unus" or "unus et viginti".

Also I just noticed that English “-teen”s also kind of say the number backwards, assuming that the “teen” is just an alternate form of “ten”. (And this assumption is based on the fact that German “-teen” numbers are formed the same way, but they use literally the same word as the word for 10 as their suffix.)

I was taught that for a native Latin speaker the ordering of words would have been purely a matter of emphasis.

For example, "viginti unus" would emphasize the twenty, and "unus et viginti" would emphasize the one, as in "a troop of twenty and one", the one being the captain.

The order of words in a Latin or ancient Greek sentence has next to no importance, at least the way I was taught.

More fascinating to me is that in German, despite also having declensions, the order of words is important.

I suppose order of words within numbers and order of parts of a sentence are different things. Regarding order of words in a sentence, German is less strict than English but more strict than Latin. For example, adjectives must come before their corresponding nouns in German, while they don’t need to be anywhere close in Latin IIRC. However English requires a particular order of subject, verb, and object in a sentence, and also doesn’t allow anything unrelated in-between those, while German only prescribes the position of the verb in a sentence, while anything else can (usually) go pretty much anywhere (often with the effect of emphasizing different parts of the sentence); order between subject and object is only relevant in cases where declension is ambiguous.
