Lifetime - pushing a ref into a C library function


#1

A new day, a new problem :slight_smile:

So the thing is, I have the following situation:

let mut format = LeekFormat::new();

for line in BufReader::new(fh).lines() {
    let uline = line.unwrap();
    match re_begin.captures(&uline) { // re_begin is a regex
        Some(_caps) => {
            let mut rec = Recorder::new(&(_caps[1].as_bytes()));
            for mt in re_iter.captures_iter(&uline) { // re_iter is a regex
                rec.push_tag(&(mt[1].as_bytes()));
            }
            format.push_record(&rec);
        }
        None => {
            break;
        }
    }
}

So what happens is that a Recorder requires a &[u8] as input. It is a C library function with a Rust API, and I do not want to change or modify anything there. What I need to do is push the matched pattern as &[u8] into rec, but the compiler won't let me:

error[E0597]: `mt` does not live long enough
   --> src/util/leek.rs:365:23
    |
365 |                                 rec.push_tag(&(mt[1].as_bytes()));
    |                                                ^^ borrowed value does not live long enough
366 |                         }
    |                         - `mt` dropped here while still borrowed
...
369 |                 }
    |                 - borrowed value needs to live until here

So my question is: how do I get around this?

Best

M


#2

Probably rec.push_tag(mt[1].as_bytes().clone()); (I haven’t verified it and can’t right now)


#3

I tried cloning it, but since I am pushing a reference I still get the “does not live long enough” error. I was thinking of pushing the matches into a Vec and then re-pushing them into the recorder, but that looks like a very silly way to go…
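Why the clone suggestion cannot help: calling `.clone()` on a `&[u8]` copies the reference (pointer and length), not the bytes, so the copy still borrows the same line. `.to_vec()` copies the bytes into an owned `Vec<u8>` instead. A minimal sketch; `owned_bytes` is a hypothetical helper, and an owned copy only solves the problem if you keep it somewhere that outlives whatever borrows from it:

```rust
/// Hypothetical helper: copies the bytes of `line` into a new, owned buffer.
fn owned_bytes(line: &str) -> Vec<u8> {
    line.as_bytes().to_vec()
}

fn main() {
    let kept: Vec<u8>;
    {
        let line = String::from("A:my_new_a");
        let bytes: &[u8] = line.as_bytes();
        // `.clone()` on a `&[u8]` copies only the reference (pointer + length),
        // so `same_borrow` still points into `line`.
        let same_borrow: &[u8] = bytes.clone();
        assert!(std::ptr::eq(bytes, same_borrow));
        // `.to_vec()` copies the bytes themselves into an owned `Vec<u8>`.
        kept = owned_bytes(&line);
    } // `line` is dropped here; `bytes` and `same_borrow` cannot be used past this point
    // `kept` owns its data, so it is still valid after the block.
    assert_eq!(kept, b"A:my_new_a".to_vec());
}
```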


#4

The problem doesn’t seem to be related to C at all (except that if you did the same in C, you’d get a use-after-free crash).

format has to live longer than the outer for loop, since the compiler won’t stop you from using it after the loop ends. BUT you’re putting a temporary borrow of rec in it. The owned rec is destroyed on every loop iteration, so all borrows of it are invalidated.

In Rust a reference (a temporary borrow) cannot exist without having an owned counterpart live somewhere permanently for the entire duration of the borrow. When you put &[u8] somewhere, you’re not storing the data. You’re only reserving a temporary read-only access to data that has to be stored elsewhere in the first place.

Also note that in Rust references are not pointers. Conceptually, they don’t map to C’s concept of passing things by reference (e.g. Box is also equivalent to C’s passing by reference, but it’s not a Rust reference).

But you’re not storing the data you’re borrowing from. You’re throwing it all away: line is destroyed on every iteration, which invalidates everything borrowed from that line.

So you need to find permanent storage for all of your data that is in any way, even indirectly, needed by format.

The easy solution would be to first collect all lines, or clones of the captures, in a Vec that is created before format. Then borrow the contents of that Vec to get whatever shape of borrows you need.
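A sketch of the "collect first, borrow later" approach, under some assumptions: `Format` is a hypothetical stand-in for a type that, like `LeekFormat`, keeps `&[u8]` borrows; `str::split_once(':')` replaces the regex captures; and a `Cursor` replaces the file handle. Phase 1 copies every captured piece into owned storage while each line is still alive; phase 2 creates the borrower after that storage, so every borrow outlives it:

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical stand-in for a type that, like `LeekFormat`, keeps borrows.
struct Format<'a> {
    tags: Vec<&'a [u8]>,
}

impl<'a> Format<'a> {
    fn new() -> Self {
        Format { tags: Vec::new() }
    }

    fn push_tag(&mut self, tag: &'a [u8]) {
        self.tags.push(tag);
    }
}

/// Phase 1: copy every captured piece into owned storage.
/// `split_once(':')` stands in for the regex captures from the question.
fn collect_tags<R: BufRead>(reader: R) -> Vec<Vec<u8>> {
    let mut storage = Vec::new();
    for line in reader.lines() {
        let line = line.expect("read error"); // dropped at the end of each iteration
        for field in line.split_whitespace() {
            if let Some((tag, _value)) = field.split_once(':') {
                storage.push(tag.as_bytes().to_vec()); // owned copy outlives `line`
            }
        }
    }
    storage
}

fn main() {
    let storage = collect_tags(Cursor::new("A:x B:y\nC:z\n"));

    // Phase 2: `format` is created after `storage`, so it can borrow from it.
    let mut format = Format::new();
    for tag in &storage {
        format.push_tag(tag);
    }
    assert_eq!(format.tags, vec![&b"A"[..], &b"B"[..], &b"C"[..]]);
}
```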


#5

I see… maybe I should redefine the problem a bit. What I have is an object called format that needs to be filled. I cannot mess with it. The object is initialized using the following calls:

let mut format = LeekFormat::new();

let rec = Recorder::new(&("xx".as_bytes()))
    .push_tag(b"X1", &("my entry 1".to_string()))
    .push_tag(b"L2", &("my entry 2".to_string()));

format.push_record(&rec);

If I call it like that, it works. But now I have a dynamic number of push_tag calls (sometimes 3, sometimes 12, sometimes 15, etc.), so I decided to go with the for loop from my example above. I loop through a file, match the info I need on each line, and try to push it. But as you can see, the first variable goes out of scope and I have no clue how to get around this.

PS.
The reason I mentioned C is that this is a C function call in the background.


#6

If push_record copies the data, it should be fine. But if it keeps the reference (the same way Vec.push(&rec) would), then you need to make sure that everything referenced stays alive, and that all of it outlives format.
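The signature usually tells you which case you are in. A sketch with two hypothetical types (neither is the real `LeekFormat` API): a copying `push_record(&mut self, rec: &[u8])` whose argument lifetime is unrelated to `self`, and a borrowing `push_record(&mut self, rec: &'a [u8])` whose argument must live as long as the format. Only the copying one accepts a borrow of a per-iteration line:

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical API that copies: `rec`'s lifetime is not tied to `self`,
/// so a short, per-iteration borrow is enough.
struct CopyingFormat {
    records: Vec<Vec<u8>>,
}

impl CopyingFormat {
    fn push_record(&mut self, rec: &[u8]) {
        self.records.push(rec.to_vec()); // the bytes are duplicated into `self`
    }
}

/// Hypothetical API that keeps the reference: `rec` must stay alive for `'a`,
/// i.e. for as long as the format itself is used.
struct BorrowingFormat<'a> {
    records: Vec<&'a [u8]>,
}

impl<'a> BorrowingFormat<'a> {
    fn push_record(&mut self, rec: &'a [u8]) {
        self.records.push(rec); // only the reference is stored
    }
}

fn fill_copying<R: BufRead>(reader: R) -> CopyingFormat {
    let mut format = CopyingFormat { records: Vec::new() };
    for line in reader.lines() {
        let line = line.expect("read error");
        // OK: `line` only has to live until `push_record` returns.
        // With `BorrowingFormat` this call would fail with E0597,
        // because `line` is dropped at the end of this iteration.
        format.push_record(line.as_bytes());
    }
    format
}

fn fill_borrowing(lines: &[String]) -> BorrowingFormat<'_> {
    let mut format = BorrowingFormat { records: Vec::new() };
    // OK: the caller owns `lines`, so every borrow outlives `format`.
    for line in lines {
        format.push_record(line.as_bytes());
    }
    format
}

fn main() {
    let copied = fill_copying(Cursor::new("a\nb\n"));
    assert_eq!(copied.records, vec![b"a".to_vec(), b"b".to_vec()]);

    let lines = vec!["a".to_string(), "b".to_string()];
    let borrowed = fill_borrowing(&lines);
    assert_eq!(borrowed.records, vec![&b"a"[..], &b"b"[..]]);
}
```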

In Rust, values that live in variables are valid only within the variable’s scope. If you create let rec inside a loop iteration, it’s valid only for that one iteration, and never outside the loop.

You’ll need to collect them in Vec<Recorder> in order to have them all live at the same time, and all live longer than one loop iteration.

I expect you’ll have a similar problem with line. In for line in … there exists only one line at a time. All previous lines are destroyed before the next line is read, so all references you have to things found in that line are gone on every loop iteration. You’ll need to collect the lines in a Vec<String> to have references to them that are usable outside of the loop.

In terms of lifetimes there’s a huge difference between:

for line in BufReader::new(fh).lines() {

and

let lines: Vec<_> = BufReader::new(fh).lines().filter_map(|l| l.ok()).collect();
for line in &lines {}

In the first case the line is temporary and destroyed on every loop iteration. In the second case all lines exist at the same time, and all are valid during and after the loop.
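Putting the pieces of this reply together: keep all lines in a `Vec<String>`, all recorders in a `Vec<Recorder>`, and only then push references into the format. This is a sketch under stated assumptions: `Recorder` and `Format` are hypothetical stand-ins modeled on the question (the real `Recorder`/`LeekFormat` signatures may differ), a `Cursor` replaces the file handle `fh`, and whitespace splitting replaces the two regexes:

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical stand-in for the question's `Recorder`: it only borrows its bytes.
struct Recorder<'a> {
    name: &'a [u8],
    tags: Vec<&'a [u8]>,
}

impl<'a> Recorder<'a> {
    fn new(name: &'a [u8]) -> Self {
        Recorder { name, tags: Vec::new() }
    }

    fn push_tag(&mut self, tag: &'a [u8]) {
        self.tags.push(tag);
    }
}

/// Hypothetical stand-in for `LeekFormat`: it keeps references to recorders.
struct Format<'a> {
    records: Vec<&'a Recorder<'a>>,
}

impl<'a> Format<'a> {
    fn new() -> Self {
        Format { records: Vec::new() }
    }

    fn push_record(&mut self, rec: &'a Recorder<'a>) {
        self.records.push(rec);
    }
}

/// Builds one recorder per line. Whitespace splitting stands in for the regexes:
/// the first field is the record name, the remaining fields are tags.
fn build_recorders(lines: &[String]) -> Vec<Recorder<'_>> {
    let mut recs = Vec::new();
    for line in lines {
        let mut fields = line.split_whitespace();
        if let Some(name) = fields.next() {
            let mut rec = Recorder::new(name.as_bytes());
            for tag in fields {
                rec.push_tag(tag.as_bytes());
            }
            recs.push(rec);
        }
    }
    recs
}

fn main() {
    // 1. All lines exist at the same time and outlive everything below.
    let lines: Vec<String> = Cursor::new("r1 A B\nr2 C\n")
        .lines()
        .filter_map(|l| l.ok())
        .collect();
    // 2. All recorders exist at the same time; they borrow from `lines`.
    let recs = build_recorders(&lines);
    // 3. `format` borrows from `recs`, which outlives it.
    let mut format = Format::new();
    for rec in &recs {
        format.push_record(rec);
    }
    assert_eq!(format.records.len(), 2);
    assert_eq!(format.records[1].tags, vec![&b"C"[..]]);
}
```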


#7

Yes, I understand what you are saying. Unfortunately, that means I’ve hit a limitation that cannot be overcome. I need to think about this… I see now that security has its trade-offs. Thnx…


#8

Note that equivalent code in C would crash or give a garbage result. You have:

while(line = next_line()) {
   char *mt = strchr(line, ' '); // let's pretend it's captures - returns pointer to line
   struct Record rec = {}; // on stack
   push_tag(&rec, mt); // rec saves pointer that belongs to line
   push_record(&format, &rec); // format saves stack pointer
   free(line); // blows up all pushed tags
}
// format is full of invalidated stack pointers

It’s not a security trade-off. You’ve inadvertently written code that requires a logical impossibility and can’t work regardless of language.

And it’s easy to overcome - you have to save data used by format instead of throwing it away.


#9

You are right! I am not thinking clearly any more. Thank you so much!!

And sorry for bugging :slight_smile:


#10

No worries. It’s easy to forget that Rust doesn’t have a garbage collector, and to expect it to behave like a GC language rather than like C.


#11

You’re gonna freak out with me… I am just too dumb to figure this one out on my own. So here is a small, complete example:

use regex::Regex;


#[derive(Debug, Clone)]
pub struct Record<'a>{
    data: Vec<(&'a [u8], Vec<u8>)>,
}


impl<'a> Record<'a> {

    pub fn new() -> Self {
        Record {
            data: Vec::new(),
        }
    }

    pub fn push_data<V: ToString>(&mut self, info: &'a [u8], value: &V) -> &mut Self {
        self.data.push((info, value.to_string().into_bytes()));
        self
    }

}


/// I cannot modify anything above this point


fn main() {
    let mut obj = Record::new();

    let uline = "A:my_new_a B:my_new_a C:my_new_c ";

    let regex_one = Regex::new(r"^A").unwrap();
    let regex_two = Regex::new(r"(\w{1}):(\w+)\s+").unwrap();

    match regex_one.captures(&uline) {
        Some(_caps) => {
            for m in regex_two.captures_iter(&uline) {
                obj.push_data(m[1].as_bytes(), &m[2].to_string());
                println!("{:#?}", m);
            }
        }
        None => {
            println!("Hey there!");
        }
    }

    println!("{:#?}", obj);
}

So m from the matched regex is borrowed and does not live long enough. When I convert m[2] to a String, it is copied (“hard coded”) and therefore lives on. How do I do the same for m[1] and pass it to obj, as in the example above, so that it lives?

I know I am starting to become annoying, but I really need to know and understand this situation, otherwise I’ll go insane! Plus, this is my Frankenstein and I need it to be alive so I can do the evil genius laugh I prepared :slight_smile:


#12

Replace

obj.push_data(m[1].as_bytes(), &m[2].to_string());

with

obj.push_data(m.get(1).unwrap().as_str().as_bytes(), &m[2].to_string());

The Index<usize> impl gives you a short borrow tied to the Captures value m, not to the lifetime of the text itself; get() gives you one tied to the text.
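To reproduce this without the regex crate, here is a sketch with a hypothetical `Caps<'t>` that mimics `Captures<'t>`: `get` returns `&'t str` (tied to the text), while `Index` has `type Output = str`, so `caps[i]` is tied to the `Caps` value itself. The `Record` type is the one from the question; `str::split_once(':')` stands in for `captures_iter`, producing one temporary `Caps` per match:

```rust
use std::ops::Index;

/// Hypothetical miniature of regex's `Captures<'t>`: it borrows the text for `'t`.
struct Caps<'t> {
    groups: Vec<&'t str>,
}

impl<'t> Caps<'t> {
    /// Mirrors `Captures::get(i).map(|m| m.as_str())`: the result carries `'t`.
    fn get(&self, i: usize) -> Option<&'t str> {
        self.groups.get(i).copied()
    }
}

impl<'t> Index<usize> for Caps<'t> {
    type Output = str;

    /// Same shape as the regex impl: the result borrows `self`, not the text.
    fn index(&self, i: usize) -> &str {
        self.get(i)
            .unwrap_or_else(|| panic!("no group at index '{}'", i))
    }
}

/// The `Record` type from the question.
#[derive(Debug, Clone)]
pub struct Record<'a> {
    data: Vec<(&'a [u8], Vec<u8>)>,
}

impl<'a> Record<'a> {
    pub fn new() -> Self {
        Record { data: Vec::new() }
    }

    pub fn push_data<V: ToString>(&mut self, info: &'a [u8], value: &V) -> &mut Self {
        self.data.push((info, value.to_string().into_bytes()));
        self
    }
}

fn fill<'t>(obj: &mut Record<'t>, line: &'t str) {
    for field in line.split_whitespace() {
        // One temporary `Caps` per match, dropped at the end of each iteration.
        let caps = match field.split_once(':') {
            Some((k, v)) => Caps { groups: vec![field, k, v] },
            None => continue,
        };
        // E0597 (`caps` does not live long enough):
        // obj.push_data(caps[1].as_bytes(), &caps[2].to_string());
        obj.push_data(caps.get(1).unwrap().as_bytes(), &caps[2].to_string());
    }
}

fn main() {
    let mut obj = Record::new();
    fill(&mut obj, "A:my_new_a B:my_new_a C:my_new_c ");
    println!("{:#?}", obj);
}
```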


#13

That’s a nice catch, but the problem is WTF. Is that a borrow checker limitation, or some kind of lack of higher-ranked types that makes &*m.index(i) lose the original lifetime?


#14

From here:

impl<'t> Index<usize> for Captures<'t> {
    type Output = str;

    fn index(&self, i: usize) -> &str {
        self.get(i).map(|m| m.as_str())
            .unwrap_or_else(|| panic!("no group at index '{}'", i))
    }
}

As far as the signature is concerned, it says it returns a borrow tied to self. If it were type Output = &'t str, then it would work as expected I think. Here’s a simple playground showing the difference - you can uncomment the other associated type to get the code to compile.

@BurntSushi (presumably) sort of acknowledges this in the comments, but I’m not sure why it’s not type Output = &'t str (yes, index() would then return a &&'t str, but that doesn’t seem terrible; maybe I’m missing something).
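A sketch of the `type Output = &'t str` alternative, again with a hypothetical `Caps<'t>` rather than the real regex type. Because `&'t str` is `Copy`, `caps[1]` (which desugars to `*caps.index(1)`) copies the `&'t str` out and keeps the original lifetime. Note that in this sketch it only works because the slices themselves are stored in `Caps`: `index` has to return a reference into `self`, so there must be a `&'t str` stored there to point at:

```rust
use std::ops::Index;

/// Hypothetical miniature of `Captures<'t>` that stores the group slices.
struct Caps<'t> {
    groups: Vec<&'t str>,
}

impl<'t> Index<usize> for Caps<'t> {
    // The indexed element is the `&'t str` itself, so `index` returns `&&'t str`.
    type Output = &'t str;

    fn index(&self, i: usize) -> &&'t str {
        // Returns a reference into `self`, which is why the slices are stored.
        &self.groups[i]
    }
}

/// Collects the tag of every `tag:value` field, using `Caps` as a per-field temporary.
fn tags<'t>(line: &'t str) -> Vec<&'t [u8]> {
    let mut out = Vec::new();
    for field in line.split_whitespace() {
        if let Some((k, v)) = field.split_once(':') {
            let caps = Caps { groups: vec![field, k, v] };
            // `caps[1]` is `*caps.index(1)`: a `&'t str` copied out of `caps`,
            // so it stays valid after `caps` is dropped at the end of this block.
            let tag: &'t str = caps[1];
            out.push(tag.as_bytes());
        }
    }
    out
}

fn main() {
    let line = String::from("A:my_new_a B:my_new_a C:my_new_c ");
    let found = tags(&line);
    assert_eq!(found, vec![&b"A"[..], &b"B"[..], &b"C"[..]]);
}
```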