Is compile-time split of &str possible?

I'm creating a macro that uses include_bytes! to read a file containing a list of numbers separated by spaces:

1 2 3 4 5

I want to create a slice of u64 from those numbers at compile time. I tried converting the bytes to a string and then calling split(" "), but split returns an iterator, and iterating over it is not const.

So, is it possible to convert the bytes from the file to a list of u64 at compile time?

Cross-posting police 🚓:


Yes, but you have to split it manually.


How do you parse a &str to u64 in a const context/function? FromStr is not const. Are there crates to do this?

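// Assumes `s` contains only ASCII digits and that the value fits in a u64.
const fn to_u64(s: &str) -> u64 {
    let bytes = s.as_bytes();
    let mut res = 0;
    let mut i = 0;
    // `for` loops are not allowed in a const fn, so iterate with `while`.
    while i < bytes.len() {
        res = 10 * res + (bytes[i] - b'0') as u64;
        i += 1;
    }
    res
}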
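
The function above assumes every byte is an ASCII digit and that the result fits in a u64. If either assumption fails, the arithmetic overflows, which is an error during const evaluation. Here is a minimal sketch of a stricter variant. The name parse_u64_checked is made up for illustration, and it uses an explicit match because, as far as I know, the ? operator is not yet usable in a const fn on stable.

```rust
/// Parses an ASCII decimal string into a `u64`, usable in const contexts.
/// Returns `None` for an empty string, a non-digit byte, or overflow.
/// (Hypothetical helper, not part of std.)
const fn parse_u64_checked(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.is_empty() {
        return None;
    }
    let mut res: u64 = 0;
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if !b.is_ascii_digit() {
            return None;
        }
        // Explicit `match` instead of `?`.
        res = match res.checked_mul(10) {
            Some(v) => v,
            None => return None,
        };
        res = match res.checked_add((b - b'0') as u64) {
            Some(v) => v,
            None => return None,
        };
        i += 1;
    }
    Some(res)
}

// Evaluated by the compiler: the build fails if either check is false.
const _: () = assert!(matches!(parse_u64_checked("42"), Some(42)));
const _: () = assert!(parse_u64_checked("4x").is_none());
```

Returning an Option leaves it to the caller to decide whether bad input should be a compile error (for example by matching on the result in a const and panicking) or a fallback value.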

What do you mean by split manually?

When implementing manual splitting I struggle with two aspects. (I) Indexing a slice with a range in a const context (e.g. &bytes[a..b]) is not stable, so we would need nightly. (II) I'm unable to get a return type that is not an array, because Vec does not implement const indexing at all and slice.to_vec() is not const either (which would have been the easiest solution, I think). This is inconvenient, because we don't know beforehand how many elements the split will produce, so we can't give the array the right length. The following solution only supports parsing a string containing exactly five elements, which is not very useful.

Here is what I came up with (I took the parsing function from @alice). Maybe someone can help me fix the issue of having to return an array instead of a Vec:

const fn to_u64(bytes: &[u8], start: usize, end: usize) -> u64 {
    let mut res: u64 = 0;
    let mut i = start;
    
    while i < end {
        res = 10 * res + (bytes[i] - b'0') as u64;
        
        i += 1;    
    }
    
    res
}

const fn split_parse(bytes: &[u8]) -> [u64; 5] {
    let mut res = [0; 5];
    let mut idx_start = 0;
    let mut idx_end = 0;
    let mut idx_number = 0;
    
    while idx_end < bytes.len() {
        if bytes[idx_end] == b' ' {
            res[idx_number] = to_u64(bytes, idx_start, idx_end);
            
            idx_start = idx_end + 1;
            idx_number += 1;
        }
        
        idx_end += 1;
    }

    res[idx_number] = to_u64(bytes, idx_start, idx_end);

    res
}

fn main() {
    assert_eq!(split_parse(b"1 2 3 4 5"), [1, 2, 3, 4, 5]);
    assert_eq!(split_parse(b"1 2 30 4 66"), [1, 2, 30, 4, 66]);
}


const fn to_u64(bytes: &[u8], start: usize, end: usize) -> u64 {
    let mut res: u64 = 0;
    let mut i = start;
    
    while i < end {
        res = 10 * res + (bytes[i] - b'0') as u64;
        
        i += 1;    
    }
    
    res
}

const fn split_parse<const LEN: usize>(bytes: &[u8]) -> [u64; LEN] {
    let mut res = [0; LEN];
    
    let mut idx_start = 0;
    let mut idx_curr = 0;
    let mut i = 0;

    while i < LEN {
        while idx_curr < bytes.len() && bytes[idx_curr] != b' ' {
            idx_curr += 1;
        }
        res[i] = to_u64(bytes, idx_start, idx_curr);
        idx_curr += 1;
        idx_start = idx_curr;
        i += 1;
    }

    res
}

fn main() {
    assert_eq!(split_parse(b"1 2 3 4 5"), [1, 2, 3, 4, 5]);
    assert_eq!(split_parse(b"1 2 30 4 66"), [1, 2, 30, 4, 66]);
}
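
One caveat with the versions above: they expect exactly one space between numbers and no trailing newline, and a file read with include_bytes! may well end in one. Below is a rough sketch of a more tolerant parser that treats any run of ASCII whitespace as a separator and rejects non-digit bytes. The name parse_tokens is an assumption of mine, and LEN is still passed explicitly here.

```rust
/// Parses up to `LEN` whitespace-separated decimal numbers.
/// Any run of ASCII whitespace (spaces, tabs, newlines) is a separator.
/// If the input holds fewer than `LEN` numbers, the rest stay 0.
/// (Hypothetical helper for illustration.)
const fn parse_tokens<const LEN: usize>(bytes: &[u8]) -> [u64; LEN] {
    let mut res = [0u64; LEN];
    let mut i = 0; // position in `bytes`
    let mut n = 0; // index of the next output element
    while n < LEN {
        // Skip leading separators.
        while i < bytes.len() && bytes[i].is_ascii_whitespace() {
            i += 1;
        }
        // Accumulate digits until the next separator or the end of input.
        let mut value: u64 = 0;
        while i < bytes.len() && !bytes[i].is_ascii_whitespace() {
            // In a const context a failed assert becomes a compile error.
            assert!(bytes[i].is_ascii_digit(), "expected an ASCII digit");
            value = 10 * value + (bytes[i] - b'0') as u64;
            i += 1;
        }
        res[n] = value;
        n += 1;
    }
    res
}

// Double space, newline in the middle, and a trailing newline.
static DATA_INTS: [u64; 5] = parse_tokens(b"1  2\n30 4 66\n");
```

The assert turns malformed input into a build failure instead of a silently wrong number, which seems like the more useful behaviour for data baked into the binary.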

You can also do this:

const fn split_len(bytes: &[u8]) -> usize {
    let mut len = 1;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b' ' {
            len += 1;
        }
        i += 1;
    }
    len
}

const DATA: &[u8] = b"1 2 3 4 5";
const DATA_LEN: usize = split_len(DATA);
static DATA_INTS: [u64; DATA_LEN] = split_parse(DATA);

fn main() {
    println!("{:?}", DATA_INTS);
}
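
Since the original question mentioned a macro and a slice, here is one way the count-then-parse trick could be wrapped up. This is only a sketch: the macro name u64_list is invented, the helper functions are condensed copies of the ones posted in this thread, and the example feeds it a byte literal rather than a real file.

```rust
// Helpers condensed from the posts in this thread.
const fn to_u64(bytes: &[u8], start: usize, end: usize) -> u64 {
    let mut res: u64 = 0;
    let mut i = start;
    while i < end {
        res = 10 * res + (bytes[i] - b'0') as u64;
        i += 1;
    }
    res
}

// Number of space-separated fields: one more than the number of spaces.
const fn split_len(bytes: &[u8]) -> usize {
    let mut len = 1;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b' ' {
            len += 1;
        }
        i += 1;
    }
    len
}

const fn split_parse<const LEN: usize>(bytes: &[u8]) -> [u64; LEN] {
    let mut res = [0; LEN];
    let mut start = 0;
    let mut curr = 0;
    let mut i = 0;
    while i < LEN {
        while curr < bytes.len() && bytes[curr] != b' ' {
            curr += 1;
        }
        res[i] = to_u64(bytes, start, curr);
        curr += 1;
        start = curr;
        i += 1;
    }
    res
}

/// Hypothetical macro: expands to a `&'static [u64]` computed at compile time.
macro_rules! u64_list {
    ($bytes:expr) => {{
        const BYTES: &[u8] = $bytes;
        const LEN: usize = split_len(BYTES);
        // The array lives in the constant; only a slice is handed out,
        // so the caller never has to name the length.
        const PARSED: &[u64] = &split_parse::<LEN>(BYTES);
        PARSED
    }};
}

// With a real file this would be something like
// `u64_list!(include_bytes!("numbers.txt"))` (hypothetical path; the file
// must be single-space separated with no trailing newline for these helpers).
static NUMBERS: &[u64] = u64_list!(b"1 2 30 4 66");
```

Handing out a slice instead of an array hides the length from the call site, which is about as close to "returning a Vec" as this approach gets without const heap allocation.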



That is very cool, thanks for sharing. Constant evaluation in Rust has become quite powerful in recent years. I'd really love to see the allocator APIs become const, though; that would be really helpful IMO.

I might be misunderstanding what you meant, but I don't see how the allocator APIs could be used at compile time, since the heap does not exist until run time. Or do you mean that you would like to temporarily allocate something on the compiler's heap, for the purposes of a compile-time computation?

I don't think the proposal I linked above would use the compiler's heap technically. Instead, miri would create its own heap-like structure to allocate into. From the heap allocations in constants proposal:

Instead the miri-engine runs const eval specific code for producing an allocation that "counts as heap" during const eval, but if it ends up in the final constant, it becomes an unnamed static. If it is leaked without any leftover references to it, the value simply disappears after const eval is finished. If the value is deallocated, the call to dealloc is intercepted and the miri engine removes the allocation. Pointers to dead allocations will cause a const eval error if they end up in the final constant.

