Split float into integer and fraction for interpolation

I'm looking for a fast and maybe idiomatic way to split a floating point number into an integer part and a fractional part. The integer part shall always be less than or equal to the floating point number, and the fractional part shall be within 0.0..1.0 (1.0 excluded).

This is typically needed for interpolating values of an array.

My ideas so far:
(Rust Playground)

use std::time::Instant;

#[inline(never)]
fn split_a(x: f64) -> (i64, f64) {
    let int = x.floor() as i64;
    (int, x - int as f64)
}

#[inline(never)]
fn split_b(x: f64) -> (i64, f64) {
    let int = x.floor();
    (int as i64, x - int)
}

#[inline(never)]
fn split_c(x: f64) -> (i64, f64) {
    (x.floor() as i64, x.rem_euclid(1.0))
}

#[inline(never)]
fn split_d(x: f64) -> (i64, f64) {
    let int = x as i64;
    (int, x - int as f64)
}

#[inline(never)]
fn split_e(x: f64) -> (i64, f64) {
    (x.trunc() as i64, x.fract())
}

#[inline(never)]
fn split_f(x: f64) -> (f64, f64) {
    let int = x.floor();
    (int, x - int)
}

#[inline(never)]
fn split_g(x: f64) -> (f64, f64) {
    (x.trunc(), x.fract())
}

fn main() {
    // trick the compiler into not pre-computing the results
    let zero = Instant::now().elapsed().as_secs() as f64;

    let test_values = [-1.0 + zero, -0.75 + zero, -0.5 + zero, -0.25 + zero, 0.0 + zero, 0.25 + zero, 0.5 + zero, 0.75 + zero, 1.0 + zero];

    println!("split_a: ok");
    for v in &test_values { print!("{:?} ", split_a(*v)); } println!();
    println!("split_b: ok");
    for v in &test_values { print!("{:?} ", split_b(*v)); } println!();
    println!("split_c: ok but -0.0 as fract for negative input");
    for v in &test_values { print!("{:?} ", split_c(*v)); } println!();
    println!("split_d: ok for non-negative");
    for v in &test_values { print!("{:?} ", split_d(*v)); } println!();
    println!("split_e: ok for non-negative");
    for v in &test_values { print!("{:?} ", split_e(*v)); } println!();
    println!("split_f: ok (integer as f64)");
    for v in &test_values { print!("{:?} ", split_f(*v)); } println!();
    println!("split_g: ok for non-negative (integer as f64)");
    for v in &test_values { print!("{:?} ", split_g(*v)); } println!();
}

output:

split_a: ok
(-1, 0.0) (-1, 0.25) (-1, 0.5) (-1, 0.75) (0, 0.0) (0, 0.25) (0, 0.5) (0, 0.75) (1, 0.0) 
split_b: ok
(-1, 0.0) (-1, 0.25) (-1, 0.5) (-1, 0.75) (0, 0.0) (0, 0.25) (0, 0.5) (0, 0.75) (1, 0.0) 
split_c: ok but -0.0 as fract for negative input
(-1, -0.0) (-1, 0.25) (-1, 0.5) (-1, 0.75) (0, 0.0) (0, 0.25) (0, 0.5) (0, 0.75) (1, 0.0) 
split_d: ok for non-negative
(-1, 0.0) (0, -0.75) (0, -0.5) (0, -0.25) (0, 0.0) (0, 0.25) (0, 0.5) (0, 0.75) (1, 0.0) 
split_e: ok for non-negative
(-1, 0.0) (0, -0.75) (0, -0.5) (0, -0.25) (0, 0.0) (0, 0.25) (0, 0.5) (0, 0.75) (1, 0.0) 
split_f: ok (integer as f64)
(-1.0, 0.0) (-1.0, 0.25) (-1.0, 0.5) (-1.0, 0.75) (0.0, 0.0) (0.0, 0.25) (0.0, 0.5) (0.0, 0.75) (1.0, 0.0) 
split_g: ok for non-negative (integer as f64)
(-1.0, 0.0) (-0.0, -0.75) (-0.0, -0.5) (-0.0, -0.25) (0.0, 0.0) (0.0, 0.25) (0.0, 0.5) (0.0, 0.75) (1.0, 0.0) 

So far I like split_b most. Any other ideas?

You can use godbolt to inspect the generated assembly.
It's not clear what you are looking for; the proposed functions yield different results for the same inputs. Why do you need to split an f64? Have you considered fixed-point numbers?


First of all, casting anything floating-point to i64 is probably not right, because the dynamic range of f64 exceeds that of i64.

Furthermore, since floating-point numbers can't even express non-integers beyond a certain point (around 2^53; the f64 significand has 53 bits), it probably doesn't make sense to ask for the integer and fractional part of such large numbers, and basing your logic on having a sensible value for the fractional part will probably break something.
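For illustration (relying on the fact that as casts from float to int saturate, which they have done since Rust 1.45):

fn main() {
    // f64 covers magnitudes far beyond i64; the cast saturates instead of wrapping
    assert_eq!(1e300_f64 as i64, i64::MAX);

    // 2^53 + 1 rounds back to 2^53: integers this large leave no room for a fraction
    let big = 2_f64.powi(53);
    assert_eq!(big + 1.0, big);
}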

That said, if you can ensure you will always invoke the function with values of sufficiently low magnitude, a signature and implementation that would satisfy your requirements would be:

fn split_int_frac(x: f64) -> (f64, f64) {
    let i = x.floor();
    let f = x - i;
    (i, f)
}

If you call this with very large numbers, it will return (x, 0.0).
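For illustration, with the last case showing the large-value behaviour:

assert_eq!(split_int_frac(2.75), (2.0, 0.75));
assert_eq!(split_int_frac(-2.75), (-3.0, 0.25));
assert_eq!(split_int_frac(1e20), (1e20, 0.0)); // too large to carry a fraction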


I had a look at the assembly but wasn't able to deduce the costs from the output. It looks like casting the f64 into an i64 is a very costly thing, but I think that's only true in debug builds. In a release build I was no longer able to recognize my code to learn from it.
I have several applications for this type of function. Some are only required to work in the positive range (which allows additional methods to be taken into consideration), others are required to cover the negative range as well.
I'm using fixed-point numbers where possible. But you make a good point: converting the input into a fixed-point number, splitting it, and converting it back is another option (if the precision allows for it). Hmm, there are even use cases where I can omit the last step and continue in integer space.

I'm aware of the limitations when it comes to converting floats into ints and vice versa. Should've mentioned that the input range lies within sane boundaries; sorry.

Your suggestion is my split_f(…). I often need the integer part to index into an array/slice but I'm not sure how expensive the conversion into a usize is.
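For reference, the indexing use looks roughly like this (a sketch only; lerp_at is a made-up name and it assumes pos is already within bounds):

/// Linear interpolation into `samples` at a fractional position.
/// Sketch only: assumes 0.0 <= pos <= (samples.len() - 2) as f64.
fn lerp_at(samples: &[f64], pos: f64) -> f64 {
    let int = pos.floor();  // same split as split_f
    let frac = pos - int;
    let idx = int as usize; // the conversion I'm unsure about
    samples[idx] + (samples[idx + 1] - samples[idx]) * frac
}

fn main() {
    let samples = [0.0, 10.0, 20.0, 30.0];
    assert_eq!(lerp_at(&samples, 1.25), 12.5);
}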

I think I'll have to do some benchmarks to see how big of an issue this really is.

edit: so this isn't totally helpful - the resolution is too coarse (I'd need a slower PC):

running 7 tests
test tests::bench_a ... bench:           2 ns/iter (+/- 0)
test tests::bench_b ... bench:           2 ns/iter (+/- 0)
test tests::bench_c ... bench:          10 ns/iter (+/- 0)
test tests::bench_d ... bench:           1 ns/iter (+/- 0)
test tests::bench_e ... bench:           2 ns/iter (+/- 0)
test tests::bench_f ... bench:           2 ns/iter (+/- 0)
test tests::bench_g ... bench:           2 ns/iter (+/- 0)

rem_euclid isn't even as slow as I expected it to be. It's interesting to see that a simple x as i64 cast is faster than x.floor(), but the former doesn't cover the negative range.
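The difference only shows up for negative inputs, because as truncates toward zero while floor rounds toward negative infinity:

fn main() {
    assert_eq!((-0.25_f64) as i64, 0);          // cast truncates toward zero
    assert_eq!((-0.25_f64).floor() as i64, -1); // floor rounds toward negative infinity
}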

I added some variants with fix-point intermediate values. This reduces the available range and precision but leads to good performance even for the negative range:

#[inline(never)]
fn split_h(x: f64) -> (i64, f64) {
    let fix = (x * 4294967296.0) as i64;
    (fix >> 32, (fix & 0xffff_ffff) as f64 / 4294967296.0)
}

#[inline(never)]
fn split_i(x: f64) -> (i64, f64) {
    let fix = (x * 4294967296.0) as i64;
    (fix >> 32, (fix & 0xffff_ffff) as f64 * (1.0 / 4294967296.0))
}

#[inline(never)]
fn split_j(x: f64) -> (i64, f64) {
    let fix = (x * 4294967296.0) as i64;
    (fix >> 32, (fix & 0xffff_ffff) as f64 * 4294967296_f64.recip())
}

#[inline(never)]
fn split_k(x: f64) -> (i64, f64) {
    const FACTOR: f64 = (1_u128 << 64) as f64;
    let fix = (x * FACTOR) as i128;
    ((fix >> 64) as i64, (fix & 0xffff_ffff_ffff_ffff) as f64 * FACTOR.recip())
}

#[inline(never)]
fn split_l(x: f64) -> (i64, f64) {
    const FACTOR: f64 = (1_u128 << 64) as f64;
    const DIV: f64 = 1.0 / FACTOR;
    let fix = (x * FACTOR) as i128;
    ((fix >> 64) as i64, (fix & 0xffff_ffff_ffff_ffff) as f64 * DIV)
}

split_h through split_j use a 32.32 (i32:u32) fixed-point intermediate while split_k and split_l use 64.64 (i64:u64), which is very slow.
(I wasn't sure how/if the division could get optimized by the compiler - hence the multiple versions)

running 12 tests
test tests::bench_a ... bench:           2 ns/iter (+/- 0)
test tests::bench_b ... bench:           2 ns/iter (+/- 0)
test tests::bench_c ... bench:           9 ns/iter (+/- 0)
test tests::bench_d ... bench:           1 ns/iter (+/- 0)
test tests::bench_e ... bench:           2 ns/iter (+/- 0)
test tests::bench_f ... bench:           2 ns/iter (+/- 0)
test tests::bench_g ... bench:           2 ns/iter (+/- 0)
test tests::bench_h ... bench:           1 ns/iter (+/- 0)
test tests::bench_i ... bench:           1 ns/iter (+/- 0)
test tests::bench_j ... bench:           1 ns/iter (+/- 0)
test tests::bench_k ... bench:          12 ns/iter (+/- 0)
test tests::bench_l ... bench:          12 ns/iter (+/- 0)

I believe this could be improved (for my use cases) if there was a way to convert a float into an int with wrapping arithmetic, but I didn't find a way to do that.
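The closest thing I'm aware of is f64::to_int_unchecked, which skips the saturating range checks - but out-of-range input is undefined behaviour there rather than wrapping, so it only helps when the range is guaranteed externally. A sketch (the name is just for illustration):

#[inline(never)]
fn split_h_unchecked(x: f64) -> (i64, f64) {
    // SAFETY: the caller must guarantee that x is finite and that
    // x * 2^32 fits into an i64; anything else is UB here - it does NOT wrap.
    let fix = unsafe { (x * 4294967296.0).to_int_unchecked::<i64>() };
    (fix >> 32, (fix & 0xffff_ffff) as f64 * (1.0 / 4294967296.0))
}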

(Edit: I messed up my benches :man_facepalming: - this method actually IS faster than all the others. I updated this post accordingly.)

I wrote a version that extracts the integer and the fractional part directly from the float's bit pattern.

This method wraps the integer side on overflow (instead of saturating) and handles NaN, Inf and subnormals in a deterministic way (it returns (0, 0.0) for them). One could adapt this method for other precisions, and the integer result can be used directly without converting it back into a float.

The only downside I found is its size in assembly which might prevent some inlining. Also it's ugly as hell and not idiomatic in any way :man_shrugging:

pub fn split_n(x: f64) -> (i64, f64) {
    const FRACT_SCALE: f64 = 1.0 / (65536.0 * 65536.0 * 65536.0 * 65536.0); // = 2^-64
    const STORED_MANTISSA_DIGITS: u32 = f64::MANTISSA_DIGITS - 1;
    const STORED_MANTISSA_MASK: u64 = (1 << STORED_MANTISSA_DIGITS) - 1;
    const MANTISSA_MSB: u64 = 1 << STORED_MANTISSA_DIGITS;

    const EXPONENT_BITS: u32 = 64 - 1 - STORED_MANTISSA_DIGITS;
    const EXPONENT_MASK: u32 = (1 << EXPONENT_BITS) - 1;
    const EXPONENT_BIAS: i32 = (1 << (EXPONENT_BITS - 1)) - 1;

    let bits = x.to_bits();
    let is_negative = (bits as i64) < 0;
    let exponent = ((bits >> STORED_MANTISSA_DIGITS) as u32 & EXPONENT_MASK) as i32;

    let mantissa = (bits & STORED_MANTISSA_MASK) | MANTISSA_MSB;
    let mantissa = if is_negative { -(mantissa as i64) } else { mantissa as i64 };

    let shl = exponent + (64 - f64::MANTISSA_DIGITS as i32 - EXPONENT_BIAS + 1);
    if shl <= 0 {
        let shr = -shl;
        if shr < 64 { // x >> 0..64
            let fraction = ((mantissa as u64) >> shr) as f64 * FRACT_SCALE;
            (0, fraction)
        } else { // x >> 64..
            (0, 0.0)
        }
    } else {
        if shl < 64 { // x << 1..64
            let int = mantissa >> (64 - shl);
            let fraction = ((mantissa as u64) << shl) as f64 * FRACT_SCALE;
            (int, fraction)
        } else if shl < 128 { // x << 64..128
            let int = mantissa << (shl - 64);
            (int, 0.0)
        } else { // x << 128..
            (0, 0.0)
        }
    }
}
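A couple of spot checks against the floor-based convention (integer part toward negative infinity, fraction in 0.0..1.0):

fn main() {
    assert_eq!(split_n(1.75), (1, 0.75));
    assert_eq!(split_n(-0.25), (-1, 0.75));
    assert_eq!(split_n(0.0), (0, 0.0));
    assert_eq!(split_n(f64::NAN), (0, 0.0)); // NaN/Inf/subnormals all map to (0, 0.0)
}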

Benches are good:

test tests::bench_a ... bench:           2 ns/iter (+/- 0)
test tests::bench_b ... bench:           2 ns/iter (+/- 0)
test tests::bench_c ... bench:          10 ns/iter (+/- 1)
test tests::bench_d ... bench:           1 ns/iter (+/- 0)
test tests::bench_e ... bench:           2 ns/iter (+/- 0)
test tests::bench_f ... bench:           2 ns/iter (+/- 0)
test tests::bench_g ... bench:           2 ns/iter (+/- 0)
test tests::bench_h ... bench:           1 ns/iter (+/- 0)
test tests::bench_i ... bench:           1 ns/iter (+/- 0)
test tests::bench_j ... bench:           1 ns/iter (+/- 0)
test tests::bench_k ... bench:          12 ns/iter (+/- 0)
test tests::bench_l ... bench:          12 ns/iter (+/- 0)
test tests::bench_n ... bench:           1 ns/iter (+/- 0)   <<-- new bit-fiddling method

Here's the assembly for completeness:

.LCPI16_0:
	.long	1127219200
	.long	1160773632
	.long	0
	.long	0

.LCPI16_1:
	.quad	0x4330000000000000
	.quad	0x4530000000000000

.LCPI16_2:
	.quad	0x3bf0000000000000

playground::split_n:
	movq	rax, xmm0
	mov	rdi, rax
	shr	rdi, 52
	mov	ecx, edi
	and	ecx, 2047
	movabs	r8, 4503599627370495
	mov	rdx, rax
	and	rdx, r8
	lea	rsi, [rdx + r8]
	test	rax, rax
	lea	rax, [rdx + r8 + 1]
	not	rsi
	cmovns	rsi, rax
	cmp	ecx, 1011
	ja	.LBB16_5
	cmp	ecx, 947
	jbe	.LBB16_2
	mov	cl, 51
	sub	cl, dil
	shr	rsi, cl
	movq	xmm1, rsi
	punpckldq	xmm1, xmmword ptr [rip + .LCPI16_0]
	subpd	xmm1, xmmword ptr [rip + .LCPI16_1]
	movapd	xmm0, xmm1
	unpckhpd	xmm0, xmm1
	addsd	xmm0, xmm1
	mulsd	xmm0, qword ptr [rip + .LCPI16_2]
	xor	eax, eax
	ret

.LBB16_5:
	lea	rdx, [rdi + 13]
	cmp	ecx, 1075
	jae	.LBB16_6
	mov	cl, 51
	sub	cl, dil
	mov	rax, rsi
	sar	rax, cl
	mov	ecx, edx
	shl	rsi, cl
	movq	xmm1, rsi
	punpckldq	xmm1, xmmword ptr [rip + .LCPI16_0]
	subpd	xmm1, xmmword ptr [rip + .LCPI16_1]
	movapd	xmm0, xmm1
	unpckhpd	xmm0, xmm1
	addsd	xmm0, xmm1
	mulsd	xmm0, qword ptr [rip + .LCPI16_2]
	ret

.LBB16_2:
	pxor	xmm0, xmm0
	xor	eax, eax
	ret

.LBB16_6:
	pxor	xmm0, xmm0
	cmp	ecx, 1138
	ja	.LBB16_7
	mov	ecx, edx
	shl	rsi, cl
	mov	rax, rsi
	ret

.LBB16_7:
	xor	eax, eax
	ret

The conversion of the fraction back into a float could be done with bit-fiddling as well, but without better resolution from the bencher I can't tell whether that actually improves performance.
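For reference, such a conversion could look roughly like this (a sketch only, truncating instead of rounding; u64_frac_to_f64 is just an illustrative name and treats the u64 as a 0.64 fixed-point fraction):

/// Interpret `frac` as a 0.64 fixed-point fraction (value = frac / 2^64)
/// and assemble the f64 bit pattern directly. Truncates instead of rounding.
fn u64_frac_to_f64(frac: u64) -> f64 {
    if frac == 0 {
        return 0.0;
    }
    let lz = frac.leading_zeros() as u64;
    let biased_exponent = 1022 - lz; // the value lies in [2^(-1 - lz), 2^(-lz))
    // drop the leading 1 (it becomes the implicit bit) and keep the next 52 bits
    let mantissa = ((frac << lz) << 1) >> 12;
    f64::from_bits((biased_exponent << 52) | mantissa)
}

fn main() {
    assert_eq!(u64_frac_to_f64(0), 0.0);
    assert_eq!(u64_frac_to_f64(1 << 63), 0.5);
    assert_eq!(u64_frac_to_f64(3 << 62), 0.75);
    assert_eq!(u64_frac_to_f64(1), 1.0 / 18446744073709551616.0); // 2^-64
}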

You should benchmark over a bunch of values (when I try to optimize a function I usually just keep adding orders of magnitude starting from 1000 until I get sizeable numbers).
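For example, something like this (a rough sketch using std::hint::black_box; the value spread and counts are arbitrary):

use std::hint::black_box;
use std::time::Instant;

fn split_b(x: f64) -> (i64, f64) {
    let int = x.floor();
    (int as i64, x - int)
}

fn main() {
    // a spread of positive and negative, fractional and integral inputs
    let values: Vec<f64> = (-1_000_000..1_000_000).map(|i| i as f64 * 0.37).collect();

    let start = Instant::now();
    let mut acc = 0.0_f64;
    for &v in &values {
        let (int, frac) = split_b(black_box(v));
        acc += int as f64 + frac;
    }
    // printing the accumulator keeps the loop from being optimized away
    println!("sum = {acc}, elapsed = {:?}", start.elapsed());
}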

It's the same as converting to any other integer type. usize is not special: it's the same as u16 on 16-bit platforms, u32 on 32-bit platforms, and u64 on 64-bit platforms. The only reason usize is a separate type is so that non-portable code is made explicit and doesn't break unexpectedly when you compile for a target of a different bit width. Other than that, it's the same kind of primitive integer type as its fixed-width relatives.

