Transfer C code to Rust code

alex1 · December 7, 2022, 1:35pm

Hi.

As I'm new to Rust, I have some basic questions and hope that you will have some patience with me.

I'm trying to create a SPOA (Stream Processing Offload Agent) to handle HAProxy offloading possibilities, like running some WASM workload.

The SPOP (Stream Processing Offload Protocol) is based on the peers protocol, which uses encoded integers and bitfields.

Variable-length integers (varints) are handled via shift operators in C, which, as far as I have read in different answers, are not available in a similar form in Rust.

Don't get me wrong: I don't expect you to do any work for me, but to help me understand how to translate a C code paradigm into a Rust code paradigm.

Now let's make some concrete examples.

github.com

haproxy/haproxy/blob/master/include/haproxy/intops.h#L413

	case 0x00000000000408f0ULL ... 0x00000000020408efULL: return 4;
	case 0x00000000020408f0ULL ... 0x00000001020408efULL: return 5;
	case 0x00000001020408f0ULL ... 0x00000081020408efULL: return 6;
	case 0x00000081020408f0ULL ... 0x00004081020408efULL: return 7;
	case 0x00004081020408f0ULL ... 0x00204081020408efULL: return 8;
	case 0x00204081020408f0ULL ... 0x10204081020408efULL: return 9;
	default: return 10;
	}
}

/* Encode the integer <i> into a varint (variable-length integer). The encoded
 * value is copied in <*buf>. Here is the encoding format:
 *
 *        0 <= X < 240        : 1 byte  (7.875 bits)  [ XXXX XXXX ]
 *      240 <= X < 2288       : 2 bytes (11 bits)     [ 1111 XXXX ] [ 0XXX XXXX ]
 *     2288 <= X < 264432     : 3 bytes (18 bits)     [ 1111 XXXX ] [ 1XXX XXXX ]   [ 0XXX XXXX ]
 *   264432 <= X < 33818864   : 4 bytes (25 bits)     [ 1111 XXXX ] [ 1XXX XXXX ]*2 [ 0XXX XXXX ]
 * 33818864 <= X < 4328786160 : 5 bytes (32 bits)     [ 1111 XXXX ] [ 1XXX XXXX ]*3 [ 0XXX XXXX ]
 * ...
 *
 * On success, it returns the number of written bytes and <*buf> is moved after

// from https://doc.rust-lang.org/book/ch03-02-data-types.html
// uint64_t => u64 in rust
// char ** => &String in rust
// char * => u8 in rust
// possible Rust function declaration
// fn encode_varint (i: u64, buf: &String, end: u8) -> i8 { .. }

static inline int encode_varint(uint64_t i, char **buf, char *end)
{
	unsigned char *p = (unsigned char *)*buf;
	int r;

	if (p >= (unsigned char *)end)
		return -1;

	if (i < 240) {
		*p++ = i;
		*buf = (char *)p;
		return 1;
	}

	*p++ = (unsigned char)i | 240;
	i = (i - 240) >> 4; // <= How can this be done in rust?

	while (i >= 128) {
		if (p >= (unsigned char *)end)
			return -1;
		*p++ = (unsigned char)i | 128; // <= How can this be done in rust?
		i = (i - 128) >> 7;                      // <= How can this be done in rust?
	}

	if (p >= (unsigned char *)end)
		return -1;
	*p++ = (unsigned char)i;

	r    = ((char *)p - *buf);
	*buf = (char *)p;
	return r;
}

Thank you for any help

H2CO3 · December 7, 2022, 1:43pm

Shift operators exist in Rust and work almost identically (not counting precedence and the fact that in Rust, signed overflow is not UB).
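To make that concrete, here is a minimal sketch of the shift, mask, and cast operations the C code relies on, written in Rust. The function names (split_low7, low_nibble_with_marker, is_even) are made up for this illustration and do not come from HAProxy or any library.

```rust
/// Hypothetical helper: returns the low 7 bits of `x` and the remaining
/// bits shifted down by 7.
fn split_low7(x: u64) -> (u8, u64) {
    let low = (x & 0x7f) as u8; // mask first, then narrow to one byte
    let high = x >> 7;          // logical shift, because u64 is unsigned
    (low, high)
}

/// Hypothetical helper mirroring `(unsigned char)i | 240` from the C code.
/// `as` binds tighter than `|`, so this is `(i as u8) | 0xf0`.
/// The cast truncates to the low 8 bits, like the C cast does.
fn low_nibble_with_marker(i: u64) -> u8 {
    i as u8 | 0xf0
}

/// One precedence difference: Rust parses this as `(x & 1) == 0`,
/// whereas C parses `x & 1 == 0` as `x & (1 == 0)`.
fn is_even(x: u64) -> bool {
    x & 1 == 0
}
```

So the C expressions marked "How can this be done in rust?" carry over almost character for character; the main change is that narrowing conversions are written with an explicit as cast.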

// char ** => &String in rust
// char * => u8 in rust

I don't know where you are getting these from, but it's not even approximately accurate. The C function is mutating the passed-in pointer (hence, a pointer passed by pointer), so it would at least need to be a &mut String, but that's actually not necessary in Rust, as you can just return anything by value, including a String (unless you want to keep a single buffer during the whole encoding procedure). But what you get from this encoding is definitely not going to be valid UTF-8, so you should be using a Vec<u8> rather than a String anyway.

And a char * is definitely not a u8, it's more like a *const u8 or *mut u8 (or i8, depending on the signedness of C's char).

All in all, here's a 16-line idiomatic re-implementation (with more suggestive constants):

fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }
    
    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;
    
    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }
    
    buf.push(i as u8);
}

This never fails, so the negative return value has no equivalent, and the non-negative return values' equivalent is the difference between the length of the buffer before and after encoding.
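As a sketch of that last point, the wrapper below recovers the C-style "number of written bytes" by comparing the buffer length before and after the call. The name encode_varint_counted is an assumption made for this example; encode_varint is repeated from the post above so the block stands on its own.

```rust
/// The encoder from the post above, repeated so this block is self-contained.
fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }

    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;

    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }

    buf.push(i as u8);
}

/// Hypothetical wrapper: appends the varint to `buf` and returns how many
/// bytes were appended, which plays the role of the C return value.
fn encode_varint_counted(i: u64, buf: &mut Vec<u8>) -> usize {
    let before = buf.len();
    encode_varint(i, buf);
    buf.len() - before
}
```

Because a Vec<u8> grows on demand, there is no "buffer full" case to report, which is why the -1 branch of the C function disappears.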

steffahn · December 7, 2022, 1:49pm

See here for a list of arithmetic and bitwise operators in Rust.

jbe · December 7, 2022, 1:51pm

Maybe I should add that adding/subtracting unsigned numbers is an error on overflow in Rust (which may wrap or cause a panic, depending on the build configuration), while in C it wraps by default, right?
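A minimal sketch of how to make the overflow behaviour explicit instead of relying on build settings, using the standard integer methods checked_sub, wrapping_sub, and saturating_sub. The function name sub_variants is made up for this example.

```rust
/// Hypothetical helper that shows three explicit ways to subtract
/// unsigned integers, regardless of debug or release mode.
fn sub_variants(a: u64, b: u64) -> (Option<u64>, u64, u64) {
    (
        a.checked_sub(b),    // None if the result would be negative
        a.wrapping_sub(b),   // wraps around modulo 2^64, like C unsigned math
        a.saturating_sub(b), // clamps at 0
    )
}
```

In the encode_varint shown earlier, the subtractions i - 0xf0 and i - 0x80 only run after checks that i is at least 0xf0 or 0x80 respectively, so they cannot underflow.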

alex1 · December 7, 2022, 2:45pm

Wow, thank you for your time and solution; this is a really good starting point for me.
I will now dig deeper into Rust with such great help.

alex1 · December 8, 2022, 1:31pm

I have now tried to implement decode_varint, and it looks quite good for a rookie, thanks to the help of @H2CO3.

I'm now struggling with a small issue: I don't get the full decoded value back.

My implementation:

fn decode_varint(i: &mut u64, buf: &mut Vec<u8>) {
    if i < &mut 0xf0 {
        *i = buf.pop().unwrap() as u64;
        return;
    }
    
    let mut r = 4;
    loop {
        *i = buf.pop().unwrap() as u64;
        *i += *i << r;
        r +=7;
        //dbg!(r,i);
        if i <= &mut 0x80 {
            *i = buf.pop().unwrap() as u64;
            break;
        }
    }
}

Here is the output, where you can see that the ret value is not what it should be.

[src/main.rs:40] &buf = [
    239,
]
// first run with small value works
[src/main.rs:44] "239" = "239"
[src/main.rs:44] ret = 239

[src/main.rs:48] &buf = [
    240,
    0,
]
// now I get 0 instead of 240
[src/main.rs:52] "240" = "240"
[src/main.rs:52] ret = 0

[src/main.rs:59] "1337" = "1337"
[src/main.rs:59] ret = 68
[src/main.rs:64] &buf = [
    240,
    149,
    59,
]
[src/main.rs:67] "123456" = "123456"
[src/main.rs:67] ret = 59

That's the C code.

github.com

haproxy/haproxy/blob/master/include/haproxy/intops.h#L457

/* Decode a varint from <*buf> and save the decoded value in <*i>. See
 * 'spoe_encode_varint' for details about varint.
 * On success, it returns the number of read bytes and <*buf> is moved after the
 * varint. Otherwise, it returns -1. */
static inline int decode_varint(char **buf, char *end, uint64_t *i)
{
	unsigned char *p = (unsigned char *)*buf;
	int r;

	if (p >= (unsigned char *)end)
		return -1;

	*i = *p++;
	if (*i < 240) {
		*buf = (char *)p;
		return 1;
	}

	r = 4;
	do {
		if (p >= (unsigned char *)end)
			return -1;
		*i += (uint64_t)*p << r;
		r  += 7;
	} while (*p++ >= 128);

	r    = ((char *)p - *buf);
	*buf = (char *)p;
	return r;
}

H2CO3 · December 8, 2022, 2:08pm

pop() removes from the end of the vector, not from the beginning. Thus the order of bytes your decoder sees is wrong. Furthermore, you are comparing the already-decoded partial result to 128, whereas you should be comparing the next byte. Your partial result is never going to be smaller than 128, because it started at a number ≥ 240, and only got shifted to the left, so that's clearly nonsensical.
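To illustrate the first point, here is a small sketch contrasting Vec::pop with slice::split_first. The function names take_from_end and take_from_front are made up for this example.

```rust
/// Hypothetical helper: `pop` hands back the LAST element of the vector.
fn take_from_end(mut bytes: Vec<u8>) -> Option<u8> {
    bytes.pop()
}

/// Hypothetical helper: `split_first` hands back the FIRST element plus the
/// remaining bytes, without modifying or copying the underlying buffer.
fn take_from_front(bytes: &[u8]) -> Option<(u8, &[u8])> {
    let (&first, rest) = bytes.split_first()?;
    Some((first, rest))
}
```

For the buffer [240, 149, 59] from the output above, pop returns 59, which matches the wrong ret = 59 result, while a decoder needs to start with 240.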

You also seem to be doing a lot of stuff that wasn't in the original code; why?

By the way, in case I wasn't clear in my previous post: don't write Rust by imitating C. Rust is a different, more sophisticated language with its own idioms, and empirically, most C code is ugly and not very well-written anyway, the above piece being no exception.

Accordingly, you don't need any of that funky pointer dance. Why don't you just return the parsed number by value? (I don't get why the C code doesn't do that, either.)

Here's a simpler and correct re-implementation that also checks for the end of the buffer instead of just assuming that it's large enough:

fn decode_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let (&head, mut rest) = buf.split_first()?;
    let mut x = u64::from(head);
    
    if x < 0xf0 {
        return Some((x, rest));
    }

    let mut r = 4;
    loop {
        let (&byte, tail) = rest.split_first()?;
        rest = tail;
        
        x += u64::from(byte) << r;
        r += 7;
        
        if byte <= 0x80 {
            break;
        }
    }
    
    Some((x, rest))
}

Playground

afetisov · December 8, 2022, 2:24pm

Note: x += u64::from(byte) << r; should be x = x.checked_add(u64::from(byte) << r)?; so that you correctly return None on overly long integers rather than panic.

The loop condition byte <= 0x80 should be byte < 0x80.

alex1 · December 8, 2022, 2:25pm

@H2CO3 you are my hero

Well, I agree, but I will need some time to understand the idioms of Rust, and examples like these help me understand them.

Because I wanted to stay close to a working example.

Thank you again for your help

H2CO3 · December 8, 2022, 2:39pm

Good catch. @alex1 here's an updated Playground with these bugs fixed.
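The Playground contents are not reproduced in this thread, so the following is only a sketch of what a decoder with both fixes applied might look like, not necessarily the linked code. It goes one step beyond the two corrections by also guarding the shift itself (an assumption on my part: with enough continuation bytes, r reaches 64 and a plain << would overflow). encode_varint is repeated from earlier in the thread so the round trip can be checked.

```rust
/// The encoder from earlier in the thread, repeated for the round trip.
fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }
    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;
    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }
    buf.push(i as u8);
}

/// Decodes one varint from the front of `buf`. Returns the value and the
/// unread remainder, or None if the input is truncated or does not fit
/// into a u64.
fn decode_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let (&head, mut rest) = buf.split_first()?;
    let mut x = u64::from(head);

    if x < 0xf0 {
        return Some((x, rest));
    }

    let mut r: u32 = 4;
    loop {
        let (&byte, tail) = rest.split_first()?;
        rest = tail;

        // checked_shl only rejects shift amounts >= 64, so also verify
        // that no bits of `byte` were shifted off the top.
        let term = u64::from(byte).checked_shl(r)?;
        if term >> r != u64::from(byte) {
            return None;
        }
        x = x.checked_add(term)?;
        r += 7;

        // The last byte of a varint has its high bit clear.
        if byte < 0x80 {
            break;
        }
    }

    Some((x, rest))
}
```

Returning the remaining slice replaces the C idiom of advancing *buf, and the number of bytes consumed is buf.len() minus the length of the returned remainder.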

system · March 8, 2023, 2:40pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
