How do I prevent memory leak?

I'm writing something like lending iterator:

use std::marker::PhantomData;
use std::mem::forget;
use std::ptr::NonNull;
use itertools::{Itertools, TupleWindows};

struct RefToNonNullStruct<I> {
    iter: I,
}

impl<'a, I: Iterator<Item = &'a T>, T: ?Sized + 'a> Iterator for RefToNonNullStruct<I> {
    type Item = NonNull<T>;

    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(NonNull::from)
    }
}

pub struct NewLineSeparatedRevisionIterator<'a> {
    underlying_iterator: TupleWindows<RefToNonNullStruct<std::str::Split<'a, char>>, (NonNull<str>, NonNull<str>)>,
    starting_pointer_and_must_not_be_released_until_this_iterator_is_dropped: NonNull<str>,
    __phantom: PhantomData<&'a ()>
}

impl<'a> NewLineSeparatedRevisionIterator<'a> {
    pub fn new(a: String) -> Self {
        let b = a.as_str() as *const str;
        Self {
            starting_pointer_and_must_not_be_released_until_this_iterator_is_dropped: {
                let v = NonNull::from(a.as_str());
                forget(a);
                v
            },
            underlying_iterator: RefToNonNullStruct { iter: unsafe { &*b }.split('\n') }.tuple_windows(),
            __phantom: PhantomData
        }
    }
}

impl<'a> Iterator for NewLineSeparatedRevisionIterator<'a> {
    type Item = (&'a str, &'a str);

    fn next(&mut self) -> Option<Self::Item> {
        self.underlying_iterator.next().map(|(a, b)| {
            unsafe {
                (a.as_ref(), b.as_ref())
            }
        })
    }
}

mod tests {
    use crate::iter::NewLineSeparatedRevisionIterator;

    #[test]
    fn test1() {
        let v = NewLineSeparatedRevisionIterator::new("12345\n67890".to_string()).collect::<Vec<_>>();
        assert_eq!(v, [("12345", "67890")]);
    }
}

I must use String because I want to capture allocated temporary on iterator:

pub fn revisions_older_to_newer_2<'a>(on: &'a Path, from: &'a str, to: &'a str) -> crate::result::Result<NewLineSeparatedRevisionIterator<'a>> {
    let out = inspect_command(invoke_git(on).args([
        "log",
        &format!("{from}...{to}"),
        r"--pretty=format:%H",
        "--reverse"
    ]))
        .output()?
        .stdout;

    Ok(NewLineSeparatedRevisionIterator::new(String::from_utf8(out).unwrap()))
}

However, testing above code with miri reports that "12345\n67890".to_string() was leaked. Can I mitigate this?

Of course it was – you forget the String that owns it, and then you never free it. Your iterator type doesn't have a Drop impl that would recreate an owned String and drop it.


You are over-complicating this way too much. You don't need any raw pointers or unsafe for this. All you need is: Playground.

    fn next(&mut self) -> Option<(&str, &str)> {
        let Some(next) = self.string[self.start..].find('\n') else {
            return None;
        };
        let first = &self.string[self.start..][..next];
        self.start += next + 1;
        
        let end = self.string[self.start..].find('\n').unwrap_or(self.string[self.start..].len());
        let second = &self.string[self.start..][..end];

        Some((first, second))
    }
4 Likes

Thank you for simplifying! I would like to use this implementation as impl Iterator, however lifetime issue comes up:

impl Iterator for LineTuples {
    type Item = (&str, &str);

    fn next(&mut self) -> Option<(&str, &str)> {
        let Some(next) = self.string[self.start..].find('\n') else {
            return None;
        };
        let first = &self.string[self.start..][..next];
        self.start += next + 1;

        let end = self.string[self.start..].find('\n').unwrap_or(self.string[self.start..].len());
        let second = &self.string[self.start..][..end];

        Some((first, second))
    }
}

Do I need to have phantom lifetime on it, or is there some better solution?

To help you, we need to see the full error you're getting from the compiler, preferably from running cargo check in the terminal.

I have changed above code to following. Miri does not report UB nor leaks... But can we avoid that unsafe?

pub struct LineTuples<'item> {
    string: String,
    start: usize,
    __phantom: PhantomData<&'item str>,
}

impl LineTuples<'_> {
    pub fn new(string: String) -> Self {
        LineTuples { string, start: 0, __phantom: PhantomData }
    }
}

impl<'ref_lt: 'item, 'item> Iterator for &'ref_lt mut LineTuples<'item> {
    type Item = (&'item str, &'item str);

    fn next(&mut self) -> Option<(&'item str, &'item str)> {
        let next = self.string[self.start..].find('\n')?;
        let first = &self.string[self.start..][..next];
        self.start += next + 1;

        let end = self.string[self.start..].find('\n')
            .unwrap_or(self.string[self.start..].len());
        let second = &self.string[self.start..][..end];

        Some((
            unsafe { &*(first as *const str) },
            unsafe { &*(second as *const str) }
        ))
    }
}

Also the change blocks inline construction, I had to change test code:

    #[test]
    fn test1() {
        let mut v = LineTuples::new("12345\n67890".to_string());
        let v = v.collect::<Vec<_>>();
        assert_eq!(v, [("12345", "67890")]);
    }

I'd like to bring it back.

The following is the error:

error: `impl` item signature doesn't match `trait` item signature
  --> src/iter.rs:17:5
   |
17 |     fn next(&mut self) -> Option<(&str, &str)> {
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ found `fn(&'1 mut LineTuples) -> std::option::Option<(&'1 str, &'1 str)>`
   |
  ::: /home/kisaragi/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:80:5
   |
80 |     fn next(&mut self) -> Option<Self::Item>;
   |     ----------------------------------------- expected `fn(&'1 mut LineTuples) -> std::option::Option<(&'2 str, &'2 str)>`
   |
   = note: expected signature `fn(&'1 mut LineTuples) -> std::option::Option<(&'2 str, &'2 str)>`
              found signature `fn(&'1 mut LineTuples) -> std::option::Option<(&'1 str, &'1 str)>`
   = help: the lifetime requirements from the `impl` do not correspond to the requirements in the `trait`
   = help: verify the lifetime relationships in the `trait` and `impl` between the `self` argument, the other inputs and its output

The pattern of handing out borrows of yourself while iterating is called a "lending iterator", and there is no trait for it in std.

You can make this work by having the iterator not own the data, so it's not handing out borrows of itself.

If you still want the owned version too, an iterator is still possible. Note that collections are generally not their own iterators.

2 Likes

There's no "lifetime issue" in my code. The lifetimes are exactly correct. The code compiles and runs (did you run the Playground?), and the lifetimes are as lenient as possible.

Allowing references with arbitrarily long lifetimes to be created from a local that will eventually be dropped is wrong (ie., unsound). An unbound lifetime, controlled by the caller, can be chosen to be 'static, which is very clearly nonsensical.

Here's a UAF using your code.[1]


When you get a borrow checker error, you should not reach for unsafe. Use it as an opportunity to learn about how the compiler is saving you from unsound code instead.

As a wise sage once said...

It has been

0

days since someone tried and failed to use unsafe to circumvent the lifetime system.

Even experts get lifetime-related unsafe wrong. The chances of getting right while still learning Rust are ~ zero. Leave it alone until you have much more experience.


  1. Miri is under Tools, top-right ↩ī¸Ž

5 Likes

Looks like you implemented Iterator using the partial code in the reply, instead of using the more complete code in the playground that @paramagnetic provided.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.