Moves, & Lifetimes - confused and stuck

I am working on a parser for a binary stream read from an SPI flash memory chip. Due to the data size (20-40 megabyte data items) and some other crypto constraints (signature and encryption), I cannot create vectors on the heap.

Thus, I need to do all of this work with REFERENCES to slices so that I can achieve "zero copy", and that is where I am getting stuck with lifetimes (again). It's not getting into or through my thick head... and why is it that I need TWO lifetimes, one for the IMPL and one for the struct?

What is in the playground is a greatly reduced snippet of code from the larger application.

/// This is a buffer that is read from / written to an SPI flash device.
struct Buffer<'a> {
    cursor : usize,
    databuf : &'a mut [u8]
}

/// The data in the flash is a "TAG-LENGTH-VALUE" type (recursively)
/// Very similar to a RIFF file found in Microsoft Multi-Media files
struct MyChunk<'a> {
    /// in memory, there is a 32-bit little-endian tag that identifies the data type.
    tag :u32,   
    /// followed by a data length in bytes
    length : u32, 
    /// followed by a variable length data buffer.
    payload : Buffer<'a> 
    // The goal is to have a generic "chunk" object parser.
}

impl<'a> Buffer<'a> {
    
    /// Given a SLICE - return a new buffer 
    pub fn new( buf:  &'a mut [u8] ) -> Self {
        Self {
            cursor : 0,
            databuf : buf,
        }
    }
    
    /// From the existing buffer, at the cursor - create a new buffer/slice of LENGTH length.
    pub fn from_slice( self, length : usize ) -> Self {
        // Dimensions of the slice we are creating.
        let lhs : usize = self.cursor;
        let rhs : usize = lhs + length;
        Self {
            cursor : 0,
            databuf : &mut self.databuf[ lhs .. rhs ]
        }
    }
    
    /// One simple example of a u32 value parser; there are many others in the larger library.
    pub fn rd_u32_le(mut self) -> u32 {
        let b0 : u32  = self.databuf[ self.cursor + 0 ] as u32;
        let mut b1 : u32  = self.databuf[ self.cursor + 1 ] as u32;
        let mut b2 : u32  = self.databuf[ self.cursor + 2 ] as u32;
        let mut b3 : u32  = self.databuf[ self.cursor + 3 ] as u32;
        self.cursor = self.cursor + 4;
        
        // this one does little endian; others do big endian.
        b3 = b3 << 24;
        b2 = b2 << 16;
        b1 = b1 << 8;
        // return the bytes combined as a u32.
        let result : u32 = (b0 | b1 | b2 | b3) + 0;
        result
    }
}

/// This represents a CHUNK of data in the binary image.
/// It begins with a U32 tag - identifying the data following.
/// Next is a U32 length (in bytes) of the data
/// We then take a slice of the buffer and return that as the payload of the object.
impl<'a> MyChunk<'a> {

    /// Given a ByteBuffer, read an object tag and object length.
    /// Return an Object with the "value" as a buffer within the original buffer.
    pub fn parse_from_bb2( bb2 : Buffer<'a> ) -> Self {
        let tag : u32 = bb2.rd_u32_le();
        let length : u32 = bb2.rd_u32_le();
        Self {
            tag : tag,
            length : length,
            payload : bb2.from_slice( length as usize )
        }
    }
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0382]: use of moved value: `bb2`
  --> src/lib.rs:68:28
   |
66 |     pub fn parse_from_bb2( bb2 : Buffer<'a> ) -> Self {
   |                            --- move occurs because `bb2` has type `Buffer<'_>`, which does not implement the `Copy` trait
67 |         let tag : u32 = bb2.rd_u32_le();
   |                             ----------- `bb2` moved due to this method call
68 |         let length : u32 = bb2.rd_u32_le();
   |                            ^^^ value used here after move
   |
note: `Buffer::<'a>::rd_u32_le` takes ownership of the receiver `self`, which moves `bb2`
  --> src/lib.rs:41:26
   |
41 |     pub fn rd_u32_le(mut self) -> u32 {
   |                          ^^^^

error[E0382]: use of moved value: `bb2`
  --> src/lib.rs:72:23
   |
66 |     pub fn parse_from_bb2( bb2 : Buffer<'a> ) -> Self {
   |                            --- move occurs because `bb2` has type `Buffer<'_>`, which does not implement the `Copy` trait
67 |         let tag : u32 = bb2.rd_u32_le();
68 |         let length : u32 = bb2.rd_u32_le();
   |                                ----------- `bb2` moved due to this method call
...
72 |             payload : bb2.from_slice( length as usize )
   |                       ^^^ value used here after move

For more information about this error, try `rustc --explain E0382`.
error: could not compile `playground` (lib) due to 2 previous errors

The Buffer (self) is being consumed by rd_u32_le, so it can't be used more than once. This error goes away when using a &mut self reference here:

pub fn rd_u32_le(&mut self) -> u32 {

and also making bb2 mutable here:

pub fn parse_from_bb2( mut bb2 : Buffer<'a> ) -> Self {

I didn't look critically at the code; I just fixed this one problem.
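For anyone following along, here is a minimal sketch of the reduced example with just those two changes applied, plus a made-up 11-byte test image in `main`, so it compiles as-is. `from_slice` still takes `self` by value; that is fine here because it is the last use of `bb2`.

```rust
/// This is a buffer that is read from / written to an SPI flash device.
struct Buffer<'a> {
    cursor: usize,
    databuf: &'a mut [u8],
}

/// TAG-LENGTH-VALUE chunk whose payload borrows from the original buffer.
struct MyChunk<'a> {
    tag: u32,
    length: u32,
    payload: Buffer<'a>,
}

impl<'a> Buffer<'a> {
    pub fn new(buf: &'a mut [u8]) -> Self {
        Self { cursor: 0, databuf: buf }
    }

    /// Unchanged idea: still consumes `self`, fine as the last use of `bb2`.
    pub fn from_slice(self, length: usize) -> Self {
        let lhs = self.cursor;
        let rhs = lhs + length;
        // Move the `&'a mut [u8]` out of `self`, then narrow it; the result keeps 'a.
        let buf: &'a mut [u8] = self.databuf;
        Self { cursor: 0, databuf: &mut buf[lhs..rhs] }
    }

    /// Fix 1: `&mut self` advances the cursor without consuming the buffer.
    pub fn rd_u32_le(&mut self) -> u32 {
        let c = self.cursor;
        let b0 = self.databuf[c] as u32;
        let b1 = self.databuf[c + 1] as u32;
        let b2 = self.databuf[c + 2] as u32;
        let b3 = self.databuf[c + 3] as u32;
        self.cursor += 4;
        b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
    }
}

impl<'a> MyChunk<'a> {
    /// Fix 2: `mut bb2` so the `&mut self` readers can be called on it.
    pub fn parse_from_bb2(mut bb2: Buffer<'a>) -> Self {
        let tag = bb2.rd_u32_le();
        let length = bb2.rd_u32_le();
        Self {
            tag,
            length,
            payload: bb2.from_slice(length as usize),
        }
    }
}

fn main() {
    // Made-up image: tag bytes "ABCD", length 3 (little-endian), then 3 payload bytes.
    let mut flash: [u8; 11] = [0x41, 0x42, 0x43, 0x44, 3, 0, 0, 0, 1, 2, 3];
    let chunk = MyChunk::parse_from_bb2(Buffer::new(&mut flash));
    println!("tag={:#x} length={} payload={:?}", chunk.tag, chunk.length, chunk.payload.databuf);
}
```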

For rd_u32_le (and no doubt several similar methods you have), I strongly recommend u32::from_le_bytes and friends (perhaps getting the input [u8; N] arrays via split_first_chunk). There's a lot of basic arithmetic that the u{N} and i{N} types already provide for you.
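A sketch of what that could look like, assuming the `Buffer` from the question. The `rd_array` helper and the switch to returning `Option` instead of panicking on a short buffer are my own choices, not part of the original code; `split_first_chunk` needs Rust 1.77 or newer.

```rust
/// Same shape as the `Buffer` in the question.
struct Buffer<'a> {
    cursor: usize,
    databuf: &'a mut [u8],
}

impl<'a> Buffer<'a> {
    /// Copy `N` bytes at the cursor into an array and advance, or return `None`
    /// (instead of panicking) when fewer than `N` bytes remain.
    fn rd_array<const N: usize>(&mut self) -> Option<[u8; N]> {
        let rest = self.databuf.get(self.cursor..)?;
        let (head, _tail) = rest.split_first_chunk::<N>()?;
        let bytes = *head;
        self.cursor += N;
        Some(bytes)
    }

    fn rd_u32_le(&mut self) -> Option<u32> {
        self.rd_array::<4>().map(u32::from_le_bytes)
    }

    fn rd_u32_be(&mut self) -> Option<u32> {
        self.rd_array::<4>().map(u32::from_be_bytes)
    }

    fn rd_u16_le(&mut self) -> Option<u16> {
        self.rd_array::<2>().map(u16::from_le_bytes)
    }
}

fn main() {
    let mut flash: [u8; 10] = [0x78, 0x56, 0x34, 0x12, 0x12, 0x34, 0x56, 0x78, 0xAB, 0xCD];
    let mut buf = Buffer { cursor: 0, databuf: &mut flash };
    let le = buf.rd_u32_le();
    let be = buf.rd_u32_be();
    let short = buf.rd_u16_le();
    let past_end = buf.rd_u16_le(); // nothing left, so `None` rather than a panic
    println!("{:x?} {:x?} {:x?} {:?}", le, be, short, past_end);
}
```

Each new width or endianness is then a one-liner on top of `rd_array`.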

It might also turn out to be worth having a separate read-only buffer type, since &'a [u8] implements Copy. It seems less than ideal for from_slice to consume the input buffer. If you really need it to be read-write, and need to call from_slice while still using the source buffer, then you'd need to make something like the split_at_mut method on slices.
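A rough sketch of both ideas; `ReadBuf`, `sub`, and `take_chunk` are hypothetical names. The detail that matters is that the returned views carry the original `'a` rather than the lifetime of `&self` / `&mut self`; naming `'a` on the `impl` is what lets a method's signature say "borrowed from the flash data, not from this struct". In `take_chunk`, dropping the already-read prefix and resetting the cursor is just one possible design.

```rust
use std::mem;

/// Hypothetical read-only view. `&'a [u8]` is `Copy`, so the struct can be too.
#[derive(Clone, Copy, Debug)]
struct ReadBuf<'a> {
    cursor: usize,
    data: &'a [u8],
}

impl<'a> ReadBuf<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { cursor: 0, data }
    }

    /// `length` bytes at the cursor as a new view. Takes `&self`, but the result
    /// borrows the flash data for 'a (not `self`), so the parent stays usable.
    fn sub(&self, length: usize) -> Option<ReadBuf<'a>> {
        let end = self.cursor.checked_add(length)?;
        let data: &'a [u8] = self.data.get(self.cursor..end)?;
        Some(ReadBuf::new(data))
    }
}

/// The read-write `Buffer` from the thread.
struct Buffer<'a> {
    cursor: usize,
    databuf: &'a mut [u8],
}

impl<'a> Buffer<'a> {
    /// Hypothetical `split_at_mut`-style method: detach `length` bytes at the
    /// cursor as their own `Buffer<'a>`. Assumed design: the already-read prefix
    /// is dropped from `self`, which keeps only the tail with its cursor reset to 0.
    fn take_chunk(&mut self, length: usize) -> Option<Buffer<'a>> {
        let end = self.cursor.checked_add(length)?;
        if end > self.databuf.len() {
            return None;
        }
        // Move the slice out, leaving an empty `&mut [u8]` (its `Default`) behind.
        let whole = mem::take(&mut self.databuf);
        let (_already_read, rest) = whole.split_at_mut(self.cursor);
        let (chunk, tail) = rest.split_at_mut(length);
        self.databuf = tail;
        self.cursor = 0;
        Some(Buffer { cursor: 0, databuf: chunk })
    }
}

fn main() {
    let flash: [u8; 6] = [1, 2, 3, 4, 5, 6];
    let parent = ReadBuf::new(&flash);
    let child = parent.sub(4).unwrap();
    println!("{:?} {:?}", parent, child); // both usable: ReadBuf is Copy

    let mut flash_rw: [u8; 6] = [1, 2, 3, 4, 5, 6];
    let mut buf = Buffer { cursor: 2, databuf: &mut flash_rw };
    let chunk = buf.take_chunk(3).unwrap();
    chunk.databuf[0] = 99; // chunk and buf now hold disjoint parts of flash_rw
    println!("{:?} {:?}", chunk.databuf, buf.databuf);
}
```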

You might also wish to implement a "reborrow" method:

struct Buffer<'a> {
    cursor : usize,
    databuf : &'a mut [u8]
}

impl Buffer<'_> {
    pub fn reborrow(&mut self) -> Buffer<'_> {
        Buffer {
            cursor:  self.cursor,
            databuf: self.databuf,
        }
    }
}

// Or, explicitly annotated:
impl<'a> Buffer<'a> {
    pub fn reborrow_annotated<'b>(&'b mut self) -> Buffer<'b> {
        // Here, `self` is of type `&'b mut Buffer<'a>`.
        Buffer {
            cursor:  self.cursor,
            databuf: self.databuf,
        }
    }
}

&mut T allows for something called "reborrowing", where you can convert a &'short mut &'long mut T into a &'short mut T. This gives you &'short mut T without permanently consuming the source &'long mut T, though while the new reference is live, the original reference cannot be used. You can use reborrow-like methods to produce new Buffers without consuming the source buffer (even though the source buffer will be temporarily unable to be used). Above, the reborrowed buffer would have an independent cursor... might not be what you want.
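A small usage sketch of the annotated reborrow: `consume_one` and `rd_u8` are hypothetical helpers, where `consume_one` takes a `Buffer` by value the way the original `from_slice` does. Passing `buf.reborrow()` lends `buf` out only for that call; afterwards `buf` is usable again, and its own cursor was not advanced.

```rust
struct Buffer<'a> {
    cursor: usize,
    databuf: &'a mut [u8],
}

impl<'a> Buffer<'a> {
    /// Same as `reborrow_annotated`: `self.databuf` is implicitly reborrowed for 'b.
    fn reborrow<'b>(&'b mut self) -> Buffer<'b> {
        Buffer { cursor: self.cursor, databuf: self.databuf }
    }

    /// Hypothetical single-byte reader.
    fn rd_u8(&mut self) -> u8 {
        let b = self.databuf[self.cursor];
        self.cursor += 1;
        b
    }
}

/// Hypothetical consumer that takes the buffer by value.
fn consume_one(mut b: Buffer<'_>) -> u8 {
    b.rd_u8()
}

fn main() {
    let mut flash: [u8; 3] = [10, 20, 30];
    let mut buf = Buffer { cursor: 0, databuf: &mut flash };
    let first = consume_one(buf.reborrow()); // `buf` is only borrowed, not moved
    let again = buf.rd_u8(); // reads byte 0 again: the reborrow had its own cursor
    println!("{first} {again}");
}
```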

There are also some parts of the code that would benefit from cargo clippy or cargo fmt; for instance, the above initialization would normally be written:

Self {
    tag,
    length,
    payload: bb2.from_slice(length as usize),
}

I strongly disagree with some of the clippy nonsense in a very intense way.

Sorry, but this is a RANT...

A) Software engineering is not a game of "who has the best propeller beanie hat". Instead, it is best to create code that 75% of other developers unfamiliar with your work can easily understand. This fits the 80/20 rule: use 20% of your brain to do 80% of the work, and 80% of your brain to do the harder 20%.

It sure seems that the Rust language people think otherwise: there is no way but the Rust way, and we will force everyone to do this the harder way because we can.

And please don't suggest I should disable all of these warnings in my code. Instead, these warnings should be TURNED OFF in the compiler by default, and only enabled by the people who want nothing but propeller-head code to review.

Here is my reasoning, and the lessons I have learned writing software since the late 1970s:

B) If you look at knowledge, one can divide the bell curve into 4 areas (quartiles). You may be a senior developer in this subject area (the upper 25%), but most everyone else is in the lower 75%.

They may be in the upper 2% of the language syntax knowledge but they are not in the upper 2% of the problem space of what your code is doing.

C) It should be trivial for you or anyone to examine/review any code your team produces with less than 25% of your brain power and cover 75% of their work. This leaves the upper 25% for the hard and complex stuff. This fits the well-known 80/20 rule.

One rule to help with that is to require code have ZERO warnings as a first step.

Disabling warnings is very often considered a red flag at review time.

D) To that end, using tagged structure initializers makes it abundantly clear how each value in a structure initialization is used and keeps your code at the 25% complexity level. Removing the tags makes everyone go back and forth between the structure definition and the initialization, spending (wasting) time double-checking things. WHY? Because some "beanie cap" moron who controls the Rust clippy rules in the compiler said so. This does not help the junior developer on the team increase his knowledge.

In this area, the C language for nearly 20 years did not support tagged initialization; now it does. One had to count commas in the initializer: what a pain, and such an easy way to make a mistake in the order of values. Yeah, Rust tells you about a missing field, but the order of values is just as important. But Rust for some reason does not want this. Why? It baffles me to no end why this practice is considered bad Rust practice.

Then in C99 they added this feature. In C++, you can name each field in a mini function-like statement in the constructor. Why? Because it made the initializer very clear. Instead, RUST has decided to take a giant step backwards.

IMHO, demanding/suggesting that the structure tags be removed is a beanie cap war that (A) does not help anyone learn the technical domain of the problem space, and (B) makes a senior person transitioning to RUST fight the language because some "beanie cap" made a decision that does not affect the implementation one bit. The compiler tosses the extra information.

BUT yea, that is what we are dealing with in the RUST language.

E) In software engineering there has been an axiom for years: WHEN IN DOUBT, PARENTHESIZE. But no, not with Rust; you must not do that. Rust (clippy, or the compiler's clippy features) calls this removing so-called redundant or unnecessary parens, so many times a junior engineer has to go get his or her little order-of-precedence cheat sheet card and try again. And a senior person has to pull the little card out of their wallet, or wherever it is kept, and double-check each complex statement. WHY should this person spend 80% of their time double-checking something that could simply be made very clear with ()s?

BUT no. One could have a very simple rule, always parenthesize: there is no argument, no discussion, no having to stop and ponder the expression. Without the parens, ALL of the reviewers have to stop and think when they come to that expression.

The extra ()'s act like extra documentation that makes the intent abundantly and extra-ordinarily clear.

YEA, I get around this by breaking expressions down into multiple "let tmpX/tmpY/tmpZ = expression" statements so that I can make things abundantly clear and unambiguous, because I can't use ()s without running up against the RUST language clippy stupidity.

BUT yea, again, that is what we are dealing with in the RUST language; we sigh and move on.

F) And the same applies to a return statement at the end of a function.

That return statement effectively serves as documentation/commentary.

Another case of:

BUT yea, again, that is what we are dealing with in the RUST language.

In summary, and to be clear: I am not at all suggesting that these features should be removed from clippy or the compiler. I am just saying that the default setting for these things is very wrong-headed; instead, these tools should err on the side of "better documentation within the code". Documentation comes in many forms, and some forms are not so obvious.

</RANT end>

Just checking - you are aware that you can only simplify StructName { [...] field_name: field_name [...] } to StructName { [...] field_name [...] }, right? This doesn't amount to counting commas or being order-dependent. It's based solely on the name of the field matching the name of the variable you're setting the field to.
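A minimal illustration of the shorthand, using a cut-down, hypothetical `MyChunk` with only the two integer fields. Because fields are matched by name, the order inside the braces does not matter, and when the variable name differs from the field name the explicit `field: expr` form is required.

```rust
/// Cut-down, hypothetical version of `MyChunk` (no payload) for illustration.
#[derive(Debug, PartialEq)]
struct MyChunk {
    tag: u32,
    length: u32,
}

fn main() {
    let tag = 0x4443_4241;
    let length = 8;

    // Shorthand: `tag` here means `tag: tag`. Only allowed because the local
    // variable has exactly the same name as the field.
    let a = MyChunk { tag, length };

    // Fully spelled-out form; same value as `a` (cargo clippy suggests the shorthand here).
    let b = MyChunk { tag: tag, length: length };

    // Fields are matched by name, not position, so order is irrelevant.
    let c = MyChunk { length, tag };

    // Different variable name: the explicit `field: expr` form is required.
    let header_len = 8;
    let d = MyChunk { tag, length: header_len };

    assert_eq!(a, b);
    assert_eq!(a, c);
    assert_eq!(a, d);
    println!("{:?}", a);
}
```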

This is actually an argument that I see people make in favor of rustfmt, since using cargo fmt / rustfmt removes many irrelevant choices that don't affect the behavior of the code. (Personally, I dislike cargo fmt; I prefer to manually format my code.)

Really, the variable needs to match the field name? Where is that specified in the language?

If that is true, what is frustrating are strange and weird things like this that suddenly appear in random comments and are neither shown nor discussed in any of the documentation I have read.

Why on earth would the field name and the variable name need to match? OR is it that you can elide the field name if and only if the variable name exactly matches the field name?

You may have misinterpreted the referred text; they pointed out this: Defining and Instantiating Structs - The Rust Programming Language

edit: rephrasing, "if the field name and the variable holding the value to assign to it are the same, one can omit the variable name and keep only the field name (the C++ tag?)..."

Indeed, this is what I meant.

Sure, that's ok. But just one question: Are you forced to use Rust?

clippy is indeed recommended very frequently and wholeheartedly, so it may feel like a "must use" tool. Still, it is not part of the compiler and not mandatory; it is an optional linter...

I was lucky enough that it matched my feelings: where it "corrected" me, I have found its recommendations intuitive 99% of the time.

That's exactly what it is, though to be complete, some people might consider not the field name but the variable to be elided in this case. (The whole point of only allowing it in case of matching names is of course that neither information is being lost.)

As to where you could have read about it, one place would be the book "The Rust Programming Language" here: https://doc.rust-lang.org/book/ch05-01-defining-structs.html#using-the-field-init-shorthand

The official documentation is the Rust Reference: Struct expressions: Struct field init shorthand.

Putting parentheses around everything is not a very common preference (other than among Lisp enthusiasts). Most people prefer x * x + 2 * x + 1 == 0 to (((x * x) + (2 * x)) + 1) == 0.

Once you get used to the precedence, it works the other way: having redundant parentheses makes you stop and ponder "why the extra parentheses, is something unusual going on here?".
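For what it's worth, the two spellings really are the same expression to the compiler, since `*` binds tighter than `+`, which binds tighter than `==`. A quick check over a few integers (the function names are made up for the example):

```rust
/// Relies on the standard precedence: `*` before `+` before `==`.
fn unparenthesized(x: i64) -> bool {
    x * x + 2 * x + 1 == 0
}

/// Every grouping written out explicitly.
fn fully_parenthesized(x: i64) -> bool {
    (((x * x) + (2 * x)) + 1) == 0
}

fn main() {
    for x in -3..=3 {
        assert_eq!(unparenthesized(x), fully_parenthesized(x));
    }
    // x^2 + 2x + 1 = (x + 1)^2, so the only root is x = -1.
    assert!(unparenthesized(-1));
    println!("both forms agree");
}
```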

Using one standard notation reduces cognitive overhead for readers.

In Rust, the return keyword indicates early, abnormal exit. Putting it in the last statement would confuse rather than document. It's like adding a redundant continue at the end of a loop.
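A small illustration of that convention, using a made-up helper `first_nonzero`: `return` appears only on the early-exit path inside the loop, and the ordinary result is the function's final expression.

```rust
/// Index of the first non-zero byte, if any.
fn first_nonzero(bytes: &[u8]) -> Option<usize> {
    for (i, &b) in bytes.iter().enumerate() {
        if b != 0 {
            return Some(i); // early exit: stop as soon as one is found
        }
    }
    None // normal result, written as the tail expression (no `return`, no `;`)
}

fn main() {
    let zeros = [0u8; 4];
    println!("{:?} {:?}", first_nonzero(&[0, 0, 7]), first_nonzero(&zeros));
}
```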

There are real requirements for this.

And:

https://medium.com/codeelevation/the-end-of-c-c-u-s-government-issues-strictest-mandate-key-software-must-begin-phasing-out-c-62f8816da06b

And:

https://www.nsa.gov/Press-Room/Press-Releases-Statements/Press-Release-View/article/3608324/us-and-international-partners-issue-recommendations-to-secure-software-products/

And that fails the 80/20 rule.

I consider extra ()s as additional documentation, like a comment. RUST does not. That's a problem in my book. By using ()s, I make the code very clear and unambiguous to the next reader.

Nobody has to ask: ah... should that have () here or not?

Why should anyone need to spend 80% of their time double-checking order of precedence, when it can be documented in a very clear way?

If you want to limit this, go ahead and put a reasonable limit on it; I think C has a language requirement that a compiler support up to 127 levels. But requiring somebody to reduce this to zero is just wrong-headed.

Not saying that, but I do believe this is reasonable:

   (x * x) + (2 * x) + 1 == 0 

But yeah, if somebody else is more comfortable with another layer or two, let them be successful.

Why beat them over the head about removing the extra ()s? What does that gain them? They have to spend brain time reducing ()s to meet this artificial Rust language requirement, when they could spend brain time on other, harder and more interesting things.

Requiring this the RUST way makes RUST look like one of those stupid PEMDAS videos on YouTube about how people cannot get the correct result. Is that where the Rust people really want to go with this language?

That emits no warning with default rustc and clippy lint levels.

The default rustc lint isn't about operator precedence. Whatever clippy lints there may be, I'm less familiar with.

I’m unsure what we’re discussing here… who is “beating” anyone “over the head”?

I’m trying your exact code right now, (x * x) + (2 * x) + 1 == 0, and it’s left untouched by rustfmt, and not warned on by rustc, or even by clippy.

But even if clippy did warn here, a linter that you opt in to using wouldn’t be aptly described with an analogy of “beating” people IMHO.

You might be seeing this lint if you’re used to C-style if statements: Warn-by-default Lints - The rustc book

That lint has nothing to do with precedence or parens within an expression.
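To make the distinction concrete (`classify` is just an example function): the `unused_parens` lint is about parentheses wrapping a whole `if` condition, not about grouping inside an expression, which is consistent with the test above where `(x * x) + (2 * x) + 1` produced no warning.

```rust
fn classify(x: i32) -> &'static str {
    // Writing `if (x > 0) {` C-style also compiles, but rustc's warn-by-default
    // `unused_parens` lint reports the parentheses around the whole condition.
    if x > 0 {
        "positive"
    } else if x < 0 {
        "negative"
    } else {
        "zero"
    }
}

fn main() {
    // Grouping parentheses inside an expression are not what that lint checks.
    let x = 3;
    let y = (x * x) + (2 * x) + 1;
    println!("{} {}", classify(x), y);
}
```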

Searching for parentheses-related clippy lints, I even see some lints suggesting that parentheses be added to clarify precedence, such as clippy::precedence (warn by default) and clippy::precedence_bits (allow by default). There is also a lint for unnecessary double parens (such as ((x)) instead of (x))… hopefully you’re not hitting that.
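An example of the kind of expression `clippy::precedence` targets, where parentheses do change what a reader is likely to assume: `+` binds tighter than `<<`. `shift_examples` is a hypothetical helper; going by the lint's description, `cargo clippy` flags the first line and suggests parenthesizing it.

```rust
/// `1u32 << 2 + 3` parses as `1u32 << (2 + 3)`, not `(1u32 << 2) + 3`.
fn shift_examples() -> (u32, u32, u32) {
    let unclear = 1u32 << 2 + 3; // the line clippy::precedence would flag
    let shift_of_sum = 1u32 << (2 + 3);
    let sum_after_shift = (1u32 << 2) + 3;
    (unclear, shift_of_sum, sum_after_shift)
}

fn main() {
    let (a, b, c) = shift_examples();
    println!("{a} {b} {c}");
}
```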