As I'm new to Rust, I have some basic questions and hope that you have some patience with me.
I am trying to create a SPOA (Stream Processing Offload Agent) to handle HAProxy offloading possibilities, like running some WASM workload.
The SPOP ( Stream Processing Offload Protocol ) is based on the peers protocol which uses Encoded Integer and Bitfield.
The variable-length integers (varints) are handled via shift operators in C, and as far as I have read in different answers, nothing quite similar is available in Rust.
Don't get me wrong, I don't expect you to do any work for me, but to help me understand how to transfer a C code paradigm to a Rust code paradigm.
Now let's make some concrete examples.
// from https://doc.rust-lang.org/book/ch03-02-data-types.html
// uint64_t => u64 in rust
// char ** => &String in rust
// char * => u8 in rust
// possible Rust function declaration
// fn encode_varint (i: u64, buf: &String, end: u8) -> i8 { .. }
static inline int encode_varint(uint64_t i, char **buf, char *end)
{
unsigned char *p = (unsigned char *)*buf;
int r;
if (p >= (unsigned char *)end)
return -1;
if (i < 240) {
*p++ = i;
*buf = (char *)p;
return 1;
}
*p++ = (unsigned char)i | 240;
i = (i - 240) >> 4; // <= How can this be done in rust?
while (i >= 128) {
if (p >= (unsigned char *)end)
return -1;
*p++ = (unsigned char)i | 128; // <= How can this be done in rust?
i = (i - 128) >> 7; // <= How can this be done in rust?
}
if (p >= (unsigned char *)end)
return -1;
*p++ = (unsigned char)i;
r = ((char *)p - *buf);
*buf = (char *)p;
return r;
}
Shift operators exist in Rust and work almost identically (not counting precedence and the fact that in Rust, signed overflow is not UB).
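To make that concrete, here is a minimal sketch of the marked C lines written in Rust. The helper names `first_byte` and `after_first_byte` are made up for illustration. The only visible difference from C is that narrowing to a byte needs an explicit `as u8` cast.

```rust
/// Low 8 bits of `i` with the top four bits set,
/// like `(unsigned char)i | 240` in C.
fn first_byte(i: u64) -> u8 {
    // `as u8` truncates to the low 8 bits; the cast binds tighter than `|`.
    i as u8 | 0xf0
}

/// Remaining value after the first byte, like `(i - 240) >> 4` in C.
/// Assumes `i >= 240`; a smaller value would underflow the subtraction.
fn after_first_byte(i: u64) -> u64 {
    (i - 0xf0) >> 4
}
```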
// char ** => &String in rust
// char * => u8 in rust
I don't know where you are getting these from, but it's not even approximately accurate. The C function mutates the passed-in pointer (hence a pointer passed by pointer), so it would at least need to be a &mut String. That's not actually necessary in Rust, though, as you can just return anything by value, including a String (unless you want to keep a single buffer during the whole encoding procedure). But what you get from this encoding is definitely not going to be valid UTF-8, so you should be using a Vec<u8> rather than a String anyway.
And a char * is definitely not a u8, it's more like a *const u8 or *mut u8 (or i8, depending on the signedness of C's char).
All in all, here's a 16-line idiomatic re-implementation (with more suggestive constants):
fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
if i < 0xf0 {
buf.push(i as u8);
return;
}
buf.push(i as u8 | 0xf0);
i = (i - 0xf0) >> 4;
while i >= 0x80 {
buf.push(i as u8 | 0x80);
i = (i - 0x80) >> 7;
}
buf.push(i as u8);
}
This never fails, so the negative return value has no equivalent, and the non-negative return values' equivalent is the difference between the length of the buffer before and after encoding.
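If you still want the byte count that the C function returned, a small wrapper can compute it from the buffer length. This is only a sketch: `encode_varint_counted` is a hypothetical name, and `encode_varint` is repeated so the snippet stands on its own.

```rust
fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }
    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;
    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }
    buf.push(i as u8);
}

/// Hypothetical wrapper: returns how many bytes the encoding appended,
/// which corresponds to the non-negative return value of the C function.
fn encode_varint_counted(i: u64, buf: &mut Vec<u8>) -> usize {
    let before = buf.len();
    encode_varint(i, buf);
    buf.len() - before
}
```

Because a Vec<u8> grows on demand, there is no counterpart to the C function's -1 case here.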
Maybe I should add that adding or subtracting unsigned numbers past their range counts as an error in Rust (by default it panics in debug builds and wraps in release builds), while in C unsigned arithmetic simply wraps, right?
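When the behaviour on overflow matters, the standard integer methods let you state it explicitly instead of depending on the build profile. A small sketch using the varint offset 240 (the function names are invented for this example):

```rust
/// Subtracts the varint offset 240 and reports underflow as `None`
/// instead of panicking or wrapping.
fn sub_offset_checked(i: u64) -> Option<u64> {
    i.checked_sub(0xf0)
}

/// Same subtraction with explicit wraparound, as C does for unsigned types.
fn sub_offset_wrapping(i: u64) -> u64 {
    i.wrapping_sub(0xf0)
}
```

In the encoder this case cannot occur, because each subtraction is guarded by a comparison (`i < 0xf0` returns early, and the loop only runs while `i >= 0x80`).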
fn decode_varint(i: &mut u64, buf: &mut Vec<u8>) {
if i < &mut 0xf0 {
*i = buf.pop().unwrap() as u64;
return;
}
let mut r = 4;
loop {
*i = buf.pop().unwrap() as u64;
*i += *i << r;
r +=7;
//dbg!(r,i);
if i <= &mut 0x80 {
*i = buf.pop().unwrap() as u64;
break;
}
}
}
That's the output where you can see that the ret value is not what it should be.
[src/main.rs:40] &buf = [
239,
]
// first run with small value works
[src/main.rs:44] "239" = "239"
[src/main.rs:44] ret = 239
[src/main.rs:48] &buf = [
240,
0,
]
// now I get 0 instead of 240
[src/main.rs:52] "240" = "240"
[src/main.rs:52] ret = 0
[src/main.rs:59] "1337" = "1337"
[src/main.rs:59] ret = 68
[src/main.rs:64] &buf = [
240,
149,
59,
]
[src/main.rs:67] "123456" = "123456"
[src/main.rs:67] ret = 59
That's the C code.
static inline int decode_varint(char **buf, char *end, uint64_t *i)
{
unsigned char *p = (unsigned char *)*buf;
int r;
if (p >= (unsigned char *)end)
return -1;
*i = *p++;
if (*i < 240) {
*buf = (char *)p;
return 1;
}
r = 4;
do {
if (p >= (unsigned char *)end)
return -1;
*i += (uint64_t)*p << r;
r += 7;
} while (*p++ >= 128);
r = ((char *)p - *buf);
*buf = (char *)p;
return r;
}
pop() removes from the end of the vector, not from the beginning. Thus the order of bytes your decoder sees is wrong. Furthermore, you are comparing the already-decoded partial result to 128, whereas you should be comparing the next byte. Your partial result is never going to be smaller than 128, because it started at a number ≥ 240, and only got shifted to the left, so that's clearly nonsensical.
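The ordering problem can be seen in isolation with a short sketch. Both helpers are hypothetical and only collect the bytes in the order in which a decoder would see them:

```rust
/// Takes bytes the way `Vec::pop` does: from the back.
fn take_from_back(mut buf: Vec<u8>) -> Vec<u8> {
    let mut seen = Vec::new();
    while let Some(b) = buf.pop() {
        seen.push(b);
    }
    seen
}

/// Takes bytes in wire order by peeling the front off a slice.
fn take_from_front(mut buf: &[u8]) -> Vec<u8> {
    let mut seen = Vec::new();
    while let Some((&b, rest)) = buf.split_first() {
        seen.push(b);
        buf = rest;
    }
    seen
}
```

For the encoded bytes of 123456, `take_from_back` sees 59 first, while `take_from_front` sees 240 first, which is the byte the decoder needs to start with.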
You also seem to be doing a lot of stuff that wasn't in the original code; why?
By the way, in case I wasn't clear in my previous post: don't write Rust by imitating C. Rust is a different, more sophisticated language with its own idioms, and empirically, most C code is ugly and not very well-written anyway, the above piece being no exception.
Accordingly, you don't need any of that funky pointer dance. Why don't you just return the parsed number by value? (I don't get why the C code doesn't do that, either.)
Here's a simpler and correct re-implementation that also checks for the end of the buffer instead of just assuming that it's large enough:
fn decode_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
let (&head, mut rest) = buf.split_first()?;
let mut x = u64::from(head);
if x < 0xf0 {
return Some((x, rest));
}
let mut r = 4;
loop {
let (&byte, tail) = rest.split_first()?;
rest = tail;
x += u64::from(byte) << r;
r += 7;
// The C loop continues while the byte is >= 128, so stop only below 0x80.
if byte < 0x80 {
break;
}
}
Some((x, rest))
}
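A quick way to gain confidence in the pair is a round trip: encode a value, decode it again, and compare. The sketch below repeats both functions so it is self-contained; `round_trip` is a hypothetical helper, and the decoder's loop stops once it sees a byte below 0x80, matching the C condition `while (*p++ >= 128)`.

```rust
fn encode_varint(mut i: u64, buf: &mut Vec<u8>) {
    if i < 0xf0 {
        buf.push(i as u8);
        return;
    }
    buf.push(i as u8 | 0xf0);
    i = (i - 0xf0) >> 4;
    while i >= 0x80 {
        buf.push(i as u8 | 0x80);
        i = (i - 0x80) >> 7;
    }
    buf.push(i as u8);
}

fn decode_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let (&head, mut rest) = buf.split_first()?;
    let mut x = u64::from(head);
    if x < 0xf0 {
        return Some((x, rest));
    }
    let mut r = 4;
    loop {
        // Running out of bytes mid-value yields `None`.
        let (&byte, tail) = rest.split_first()?;
        rest = tail;
        x += u64::from(byte) << r;
        r += 7;
        // A byte below 0x80 is the last byte of the value.
        if byte < 0x80 {
            break;
        }
    }
    Some((x, rest))
}

/// Hypothetical helper: encodes `value`, decodes it again, and returns
/// the decoded number if the whole buffer was consumed.
fn round_trip(value: u64) -> Option<u64> {
    let mut buf = Vec::new();
    encode_varint(value, &mut buf);
    let (decoded, rest) = decode_varint(&buf)?;
    if rest.is_empty() { Some(decoded) } else { None }
}
```

The value 2288 is a useful test case because it encodes to 240, 128, 0, so a continuation byte that is exactly 0x80 gets exercised.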