Translating C code to Rust code

Hi.

As I'm new to Rust, I have some basic questions and hope you'll be patient with me.

I'm trying to create a SPOA (Stream Processing Offload Agent) to use HAProxy's offloading capabilities, such as running some WASM workloads.

The SPOP (Stream Processing Offload Protocol) is based on the peers protocol, which uses encoded integers and bitfields.

Variable-length integers (varints) are handled via shift operators in C, which, as far as I have read in various answers, have no similar equivalent in Rust.

Don't get me wrong: I don't expect you to do the work for me, but I'd like help understanding how to translate C code paradigms into Rust paradigms. :smile:

Now let's look at a concrete example.

// from https://doc.rust-lang.org/book/ch03-02-data-types.html
// uint64_t => u64 in rust
// char ** => &String in rust
// char * => u8 in rust
// possible Rust function declaration
// fn encode_varint (i: u64, buf: &String, end: u8) -> i8 { .. }

static inline int encode_varint(uint64_t i, char **buf, char *end)
{
	unsigned char *p = (unsigned char *)*buf;
	int r;

	if (p >= (unsigned char *)end)
		return -1;

	if (i < 240) {
		*p++ = i;
		*buf = (char *)p;
		return 1;
	}

	*p++ = (unsigned char)i | 240;
	i = (i - 240) >> 4; // <= How can this be done in rust?

	while (i >= 128) {
		if (p >= (unsigned char *)end)
			return -1;
		*p++ = (unsigned char)i | 128; // <= How can this be done in rust?
		i = (i - 128) >> 7;                      // <= How can this be done in rust?
	}

	if (p >= (unsigned char *)end)
		return -1;
	*p++ = (unsigned char)i;

	r    = ((char *)p - *buf);
	*buf = (char *)p;
	return r;
}

Thank you for any help


Shift operators exist in Rust and work almost identically (not counting precedence and the fact that in Rust, signed overflow is not UB).
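To make that concrete, here is a sketch of how the three lines marked "How can this be done in rust?" translate. The helper name `first_two_steps` is hypothetical and only exists to show the individual operations; `i as u8` keeps the low 8 bits, just like the C cast `(unsigned char)i`.

```rust
/// Hypothetical helper mirroring the three commented C lines. Assumes `i` is
/// large enough (>= 240, and >= 128 after the first step) to reach the loop;
/// otherwise the subtractions underflow. Returns (first byte, second byte,
/// value left to encode).
fn first_two_steps(mut i: u64) -> (u8, u8, u64) {
    // C: *p++ = (unsigned char)i | 240;
    // `as u8` truncates like the C cast, and `as` binds tighter than `|`.
    let first = i as u8 | 0xf0;
    // C: i = (i - 240) >> 4;
    i = (i - 0xf0) >> 4;
    // C: *p++ = (unsigned char)i | 128;
    let second = i as u8 | 0x80;
    // C: i = (i - 128) >> 7;
    i = (i - 0x80) >> 7;
    (first, second, i)
}

fn main() {
    let (first, second, rest) = first_two_steps(123_456);
    println!("{first} {second} {rest}"); // prints "240 149 59"
}
```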

// char ** => &String in rust
// char * => u8 in rust

I don't know where you are getting these from, but they're not even approximately accurate. The C function mutates the passed-in pointer (hence a pointer passed by pointer), so it would at least need to be a &mut String. That's actually not necessary in Rust, though, since you can simply return anything by value, including a String (unless you want to keep a single buffer during the whole encoding procedure). But the output of this encoding is, in general, not going to be valid UTF-8, so you should be using a Vec<u8> rather than a String anyway.

And a char * is definitely not a u8; it's more like a *const u8 or *mut u8 (or *const i8 / *mut i8, depending on the signedness of C's char).
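If you do need the C behaviour of writing into a fixed-size, caller-provided buffer and failing when it is too small, a mutable slice `&mut [u8]` can take the place of the `buf`/`end` pointer pair, with `Option` replacing the `-1` sentinel. A minimal sketch; the name `encode_varint_into` is hypothetical:

```rust
/// Writes the varint encoding of `i` to the front of `out`.
/// Returns the number of bytes written, or `None` if `out` is too short
/// (where the C version returns -1). Like the C code, a failed call may
/// leave a partially written prefix behind.
fn encode_varint_into(mut i: u64, out: &mut [u8]) -> Option<usize> {
    let mut n: usize = 0; // bytes written so far, like `p - *buf` in C

    if i < 0xf0 {
        *out.get_mut(n)? = i as u8; // get_mut returns None past the end
        return Some(1);
    }

    *out.get_mut(n)? = i as u8 | 0xf0;
    n += 1;
    i = (i - 0xf0) >> 4;

    while i >= 0x80 {
        *out.get_mut(n)? = i as u8 | 0x80;
        n += 1;
        i = (i - 0x80) >> 7;
    }

    *out.get_mut(n)? = i as u8;
    Some(n + 1)
}

fn main() {
    let mut buf = [0u8; 4];
    match encode_varint_into(123_456, &mut buf) {
        Some(n) => println!("wrote {n} bytes: {:?}", &buf[..n]),
        None => println!("buffer too small"),
    }
    // Only two bytes of room: the C version would return -1 here.
    println!("{:?}", encode_varint_into(123_456, &mut buf[..2]));
}
```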

All in all, here's a 16-line idiomatic re-implementation (with more suggestive constants):

fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }
    
    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;
    
    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }
    
    buf.push(i as u8);
}

This never fails, so the negative return value has no equivalent, and the non-negative return values' equivalent is the difference between the length of the buffer before and after encoding.
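As a usage sketch, the function above can append several values to one reusable buffer, with the C return value recovered from the length difference. The by-value wrapper `encode_varint_to_vec` is a hypothetical convenience for when no buffer needs to be reused. The bytes it produces for 239, 240 and 123456 match the buffers in the debug output further down the thread.

```rust
/// The encoder from the answer above, unchanged.
fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }
    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;
    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }
    buf.push(i as u8);
}

/// Hypothetical wrapper that returns a fresh Vec by value.
fn encode_varint_to_vec(i: u64) -> Vec<u8> {
    let mut buf = Vec::new();
    encode_varint(i, &mut buf);
    buf
}

fn main() {
    // One buffer reused for several values.
    let mut buf = Vec::new();
    for value in [239, 240, 1337, 123_456] {
        let before = buf.len();
        encode_varint(value, &mut buf);
        // Equivalent of the C return value r: bytes written by this call.
        let written = buf.len() - before;
        println!("{value} -> {:?} ({written} bytes)", &buf[before..]);
    }
    println!("{:?}", encode_varint_to_vec(123_456));
}
```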


See here for a list of arithmetic and bitwise operators in Rust.


Maybe it should be added that adding/subtracting unsigned numbers is an error on overflow in Rust (by default, a plain + or - panics in debug builds and wraps in release builds), while in C unsigned arithmetic always wraps, right?
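A small illustration of that difference (the numbers are arbitrary). The explicit methods on u64 let you state which behaviour you want instead of relying on the build profile:

```rust
fn main() {
    let a: u64 = 5;
    let b: u64 = 240;

    // In C, `a - b` on uint64_t wraps modulo 2^64. In Rust, a plain `a - b`
    // panics in debug builds and wraps in release builds (default settings).
    let wrapped = a.wrapping_sub(b); // wraps like C
    let checked = a.checked_sub(b); // None on overflow
    let saturated = a.saturating_sub(b); // clamps at 0
    println!("{wrapped} {checked:?} {saturated}");

    // Shifting by >= the bit width is UB in C. In Rust, `<<` panics in debug
    // builds; checked_shl reports it, wrapping_shl masks the shift amount.
    println!("{:?} {}", 1u64.checked_shl(64), 1u64.wrapping_shl(65));
}
```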

Wow, thank you for your time and solution; this is a really good starting point for me.
With such great help, I will now dig deeper into Rust.


I have now tried to implement decode_varint, and it looks quite good for a rookie :slight_smile:, thanks to the help of @H2CO3.

Now I'm struggling with a small issue: I don't get the full decoded value back.

My implementation:

fn decode_varint(i: &mut u64, buf: &mut Vec<u8>) {
    if i < &mut 0xf0 {
        *i = buf.pop().unwrap() as u64;
        return;
    }
    
    let mut r = 4;
    loop {
        *i = buf.pop().unwrap() as u64;
        *i += *i << r;
        r +=7;
        //dbg!(r,i);
        if i <= &mut 0x80 {
            *i = buf.pop().unwrap() as u64;
            break;
        }
    }
}

Here's the output, where you can see that the returned value is not what it should be:

[src/main.rs:40] &buf = [
    239,
]
// first run with small value works
[src/main.rs:44] "239" = "239"
[src/main.rs:44] ret = 239

[src/main.rs:48] &buf = [
    240,
    0,
]
// now I get 0 instead of 240
[src/main.rs:52] "240" = "240"
[src/main.rs:52] ret = 0

[src/main.rs:59] "1337" = "1337"
[src/main.rs:59] ret = 68
[src/main.rs:64] &buf = [
    240,
    149,
    59,
]
[src/main.rs:67] "123456" = "123456"
[src/main.rs:67] ret = 59

Here's the C code:

static inline int decode_varint(char **buf, char *end, uint64_t *i)
{
	unsigned char *p = (unsigned char *)*buf;
	int r;

	if (p >= (unsigned char *)end)
		return -1;

	*i = *p++;
	if (*i < 240) {
		*buf = (char *)p;
		return 1;
	}

	r = 4;
	do {
		if (p >= (unsigned char *)end)
			return -1;
		*i += (uint64_t)*p << r;
		r  += 7;
	} while (*p++ >= 128);

	r    = ((char *)p - *buf);
	*buf = (char *)p;
	return r;
}

pop() removes from the end of the vector, not from the beginning, so your decoder sees the bytes in reverse order. Furthermore, you are comparing the already-decoded partial result to 128, whereas you should be comparing the next byte. Your partial result is never going to be smaller than 128, because it started at a number ≥ 240 and only has shifted bytes added to it, so that's clearly nonsensical.
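One way to keep the byte-at-a-time structure while reading from the front is to take an iterator over the bytes instead of calling pop(), and to test each byte rather than the partial result. A sketch; the name `decode_varint_iter` and the `Option` return type are my own choices, and the overflow checks go beyond what the C code does:

```rust
/// Decodes one varint from the front of `bytes`, consuming only the bytes it
/// needs. Returns `None` if the input ends early or the value overflows u64.
fn decode_varint_iter(bytes: &mut impl Iterator<Item = u8>) -> Option<u64> {
    let mut x = u64::from(bytes.next()?);
    if x < 0xf0 {
        return Some(x); // single-byte value
    }

    let mut r: u32 = 4;
    loop {
        let byte = bytes.next()?;
        // Add the whole byte (continuation bit included) shifted by r, as the
        // C code does; reject shifts that would push set bits out of the u64.
        let part = u64::from(byte).checked_shl(r)?;
        if part >> r != u64::from(byte) {
            return None;
        }
        x = x.checked_add(part)?;
        r += 7;

        // Look at the byte just read, not at the partial result.
        if byte < 0x80 {
            return Some(x);
        }
    }
}

fn main() {
    // 239 and 1337 back to back, in the order the encoder wrote them.
    let buf = vec![239u8, 249, 68];
    let mut bytes = buf.iter().copied();
    while let Some(v) = decode_varint_iter(&mut bytes) {
        println!("{v}");
    }
}
```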

You also seem to be doing a lot of stuff that wasn't in the original code; why?

By the way, in case I wasn't clear in my previous post: don't write Rust by imitating C. Rust is a different, more sophisticated language with its own idioms, and empirically, most C code is ugly and not very well-written anyway, the above piece being no exception.

Accordingly, you don't need any of that funky pointer dance. Why don't you just return the parsed number by value? (I don't get why the C code doesn't do that, either.)

Here's a simpler and correct re-implementation that also checks for the end of the buffer instead of just assuming that it's large enough:

fn decode_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let (&head, mut rest) = buf.split_first()?;
    let mut x = u64::from(head);
    
    if x < 0xf0 {
        return Some((x, rest));
    }

    let mut r = 4;
    loop {
        let (&byte, tail) = rest.split_first()?;
        rest = tail;
        
        x += u64::from(byte) << r;
        r += 7;
        
        if byte <= 0x80 {
            break;
        }
    }
    
    Some((x, rest))
}

Playground


Note: this should be x = x.checked_add(u64::from(byte) << r)?;, so that you correctly return None on overlong integers rather than panicking.

Should be byte < 0x80.


@H2CO3 you are my hero :slight_smile:

Agreed, but I will need some time to understand Rust's idioms, and examples like these help me understand them.

Because I wanted to stay close to a working example.

Thank you again for your help

Good catch. @alex1 here's an updated Playground with these bugs fixed.
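The updated Playground itself isn't preserved here. As a sketch, this is the slice-based decoder from above with both fixes applied. It additionally checks the shift, because `u64::from(byte) << r` panics in debug builds once r reaches 64 or more and silently drops high bits before that, so `checked_add` alone does not catch every overlong input. The input [240, 128, 0] (the encoding of 2288) shows the `<=` bug: with `byte <= 0x80`, the trailing 0 byte would be left unconsumed.

```rust
/// Decodes one varint from the front of `buf` and returns it with the rest of
/// the slice, or `None` if `buf` ends too early or the value overflows u64.
fn decode_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let (&head, mut rest) = buf.split_first()?;
    let mut x = u64::from(head);

    if x < 0xf0 {
        return Some((x, rest));
    }

    let mut r: u32 = 4;
    loop {
        let (&byte, tail) = rest.split_first()?;
        rest = tail;

        // Fix 1: checked arithmetic; the shift is checked as well, and a
        // shift that drops set bits counts as overflow.
        let part = u64::from(byte).checked_shl(r)?;
        if part >> r != u64::from(byte) {
            return None;
        }
        x = x.checked_add(part)?;
        r += 7;

        // Fix 2: 0x80 itself is a continuation byte, so stop only below it.
        if byte < 0x80 {
            break;
        }
    }

    Some((x, rest))
}

/// Encoder from earlier in the thread, included to make the example self-contained.
fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }
    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;
    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }
    buf.push(i as u8);
}

fn main() {
    let mut buf = Vec::new();
    for v in [0, 239, 240, 1337, 2288, 123_456, u64::MAX] {
        encode_varint(v, &mut buf);
    }
    // Decode the values back one after another from the shared buffer.
    let mut rest = &buf[..];
    while let Some((v, tail)) = decode_varint(rest) {
        println!("{v}");
        rest = tail;
    }
}
```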

