Extending lifetime to the whole function body


#1

I have a function which builds String’s and collects string slices in some data structure (and they have to be string slices – I cannot change that data structure).

The code looks roughly as following:

fn func() {
  let data = ...;
  for x in xyz {
    let val = format!("hello {}", x); // creating new string
    data.push(val.as_str()); // getting string slice!
  }
  // ...use data somehow...
  // ...it's ok to drop all the strings here -- data will no longer be used
}

Now, this example obviously won’t work as string only lives till the end of the ‘for’ block. I want to extend it’s lifecycle to the whole function.

Now, theoretically, having a vector of strings created at the beginning of the function (to collect those strings) would work as string slices are (as far as I understand) should not move during vector resizes.

However, Rust will not allow me to do that as adding string to vector would require mutable borrow of the vector and string slice will use immutable borrow of the same vector (and, of course, it will be right – certain modifications on this vector, like removal of the element, would invalidate string slice).

Any advices?

So, essentially, I need some “black hole” data structure which I can throw my String into and get reference to the String valid till the end of the data structure itself.

P.S. I guess, I’m wrong about the vector – string slices will be invalidated on vector resizes. Ok, then it could be linked list instead, with append-only operation.

P.P.S Something like that?

fn func() {
  let warehouse = Warehouse::new();
  let data = ...;
  for x in xyz {
    // returns &String with lifetime of Warehouse!
    let val: &String = warehouse.store(format!("hello {}", x));
    data.push(val.as_str());
  }
  // ...use data somehow...
  // ...warehouse is destroyed with all its contents
}

#2

So, essentially, I need some “black hole” data structure which I can throw my String into and get reference to the String valid till the end of the data structure itself.

I think you are on the right track! Something like this may work:

use std::cell::RefCell;

struct Warehouse {
    strings: RefCell<Vec<String>>,
}

impl Warehouse {
    fn new() -> Self {
        Warehouse { strings: RefCell::new(Vec::new()) }
    }

    fn store(&self, s: String) -> &str {
        let mut strings = self.strings.borrow_mut();
        strings.push(s);
        let borrowed = strings.last().unwrap().as_str();

        // Within this method `borrowed` is borrowed from `strings`, but
        // unsafely extend its lifetime to borrow from `self` instead. This is
        // fine because the bytes of the string are not affected by future calls
        // to `store` and it only gets dropped when the black hole is eventually
        // dropped.
        unsafe { &*(borrowed as *const str) }
    }
}

fn main() {
    let warehouse = Warehouse::new();
    let mut data = Vec::new();
    data.push(warehouse.store("a".to_owned()));
    data.push(warehouse.store("b".to_owned()));
    println!("{:?}", data);
}

#4

To make sure I understand correctly, this works only with String / &str, as &str is a pointer to the bytes in the heap + length, right (and it doesn’t point to any of the data stored directly inside String struct)?

However, this is not going to work with fn store(&self, val: T) -> &T {...} as T will be moved around on vector resize? In which case, I guess, I can do a linked list that never moves its nodes around? So this could be a valid use-case for linked list?


#5

Yeah, you’ll need a T that has a stable address of its contents (ie it’s on the heap). You can also force a T to be on the heap by putting it into a Box.

But why can’t you collect all the values upfront? Is it cause you don’t want to iterate twice?


#6

You would need some way to move things to the heap if you need it to work for any T.

struct Warehouse<T> {
    ts: RefCell<Vec<Box<T>>>,
}

impl<T> Warehouse<T> {
    fn store(&self, t: T) -> &T {
        let mut ts = self.ts.borrow_mut();
        let boxed = Box::new(t);
        let ptr = &*boxed as *const T;
        ts.push(boxed);
        unsafe { &*ptr }
    }
}

#7

Yes, I don’t want to iterate twice.

I actually facing this for the second time. In other piece of code I have a need to collect trait objects in the vector (for this guy), so it’s even more complicated, as real data types are different. I haven’t solved this problem for that other case. I guess I need another generalization of this pattern, now for arbitrary T per store: fn store<T>(&self, t: T) -> &T { ... }
:slight_smile:


#8

You’ll need to do the Box thing for arbitrary Ts. You can also do something like a Vec<Vec<T>> where you ensure the inner Vec is never reallocated - then you can put arbitrary Ts straight in there. The outer Vec creates a new inner Vec once some capacity of the inner Vec is reached (ie inner Vecs are of a constant capacity, and new ones are added as needed).


#9

I guess I need another generalization of this pattern, now for arbitrary T per store.

I believe this would be pretty similar to the single T version. You would need a vector of trait objects so that each element knows the appropriate method to drop its corresponding T.

trait Any {}
impl<T> Any for T {}

struct Warehouse {
    ts: RefCell<Vec<Box<Any>>>,
}

impl Warehouse {
    fn store<T>(&self, t: T) -> &T {
        let mut ts = self.ts.borrow_mut();
        let boxed = Box::new(t);
        let ptr = &*boxed as *const T;
        let any = unsafe {
            // A combination of RFC 141 lifetime elision and RFC 1214 implied
            // lifetime bounds ensures that any borrowed data in T outlives self.
            // Written explicitly, the store signature is:
            // fn store<'a, T: 'a>(&'a self, t: T) -> &'a T.
            // That means we can treat the borrowed data in T as being alive as
            // long as we need.
            std::mem::transmute::<Box<Any>, Box<Any + 'static>>(boxed)
        };
        ts.push(any);
        unsafe { &*ptr }
    }
}

#10

Yeah, this is pretty much what I ended up with (although I use linked list to save on allocations for the vector):

use std::cell::Cell;

trait Node {}

type Link = Option<Box<Node>>;

pub struct Warehouse {
  head: Cell<Link>,
}

struct ConcreteNode<T> {
  next: Link,
  elem: T,
}

impl<T> Node for ConcreteNode<T> {}

impl Warehouse {
  pub fn new() -> Self {
    Self {
      head: Cell::new(None),
    }
  }

  pub fn store<T: 'static>(&self, elem: T) -> &T {
    let node = Box::new(ConcreteNode {
      next: self.head.take(),
      elem,
    });

    let result: *const T = &node.as_ref().elem;
    self.head.set(Some(node));
    unsafe { &*result }
  }
}

#[cfg(test)]
mod tests {
  use super::*;
  use std::rc::Rc;
  use std::fmt;
  use std::mem;

  struct Gadget {
    counter: Rc<Cell<u32>>,
    idx: usize,
  }

  impl fmt::Display for Gadget {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
      write!(f, "gadget {}", self.idx)
    }
  }

  impl Drop for Gadget {
    fn drop(&mut self) {
      self.counter.as_ref().set(self.counter.get() + 1);
    }
  }

  #[test]
  pub fn test_warehouse() {
    let counter = Rc::new(Cell::new(0));
    {
      let warehouse = Warehouse::new();
      let mut vec: Vec<&fmt::Display> = Vec::new();
      for idx in 0..3 {
        vec.push(warehouse.store(format!("hello {}", idx)));
        vec.push(warehouse.store(idx));
        vec.push(warehouse.store(Gadget { counter: Rc::clone(&counter), idx }))
      }

      let strings = vec.into_iter().map(ToString::to_string).collect::<Vec<_>>();
      let strings = strings.iter().map(String::as_str).collect::<Vec<_>>();
      let expected = vec!["hello 0", "0", "gadget 0", "hello 1", "1", "gadget 1", "hello 2", "2", "gadget 2"];
      assert_eq!(expected, strings);

      assert_eq!(0, counter.get());
    }
    // All three gadgets should be dropped at that point
    assert_eq!(3, counter.get());
  }
}

#11

If you know a priori max # of items you’ll be putting into the warehouse, you can always allocate a Vec with sufficient capacity and avoid boxes. This will be more efficient than a linked list.


#12

Well, I don’t know the # of items & I need items of different types, too.

In my case, it’s accumulating SQL query parameters while transforming certain query into SQL query. I can, theoretically, run process twice and count everything, but that’s cumbersome.

Actually, I think https://doc.rust-lang.org/1.1.0/arena/struct.Arena.html might be the right tool for the job. It never moves objects, can pre-allocate chunks of memory, optimizes for Drop vs no-Drop data, etc.

P.S. Yeah, it’s pretty much the way AnyArena works:

use any_arena::AnyArena;

pub struct Warehouse {
  arena: AnyArena<'static>
}

impl Warehouse {
  pub fn new() -> Self {
    Self {
      arena: AnyArena::new()
    }
  }

  pub fn store<T: 'static>(&self, elem: T) -> &mut T {
    self.arena.alloc(move || elem)
  }
}

:slight_smile:

Though, AnyArena crate is outdated and does not compile on current Rust nightly (needs certain feature attributes).