Beginner dangling on borrow and dangle (lifetime issue)

Hi all,

I'd like to store some &str in a BTreeSet. The number of elements is unknown in advance. The values to store are not used anywhere else but inside the set. I still can't get my bloody head around lifetimes and borrowing, and fail with (playground):

use std::collections::BTreeSet;
fn main() {
    let mut days = BTreeSet::<&str>::new();
    let mut und = String::with_capacity(3);
    for _ in 0..3 {
        und.push('1');
        und.push_str("SA");
        // days.insert(und.as_str());
        days.insert(und.clone().as_str());
    }
    if days.len() == 3 {
        println!("got \'em");
    }
}

The compiler complains that the temporary value from .clone() inside the insert(...) gets dropped and does not live long enough to be taken into account later on with .len().

Not using the .clone() leads to complaints that there are too many borrows. But introducing another individual value via let inside the loop's body creates another dangling reference. Plus, in the real use case I do not know in advance how many elements may occur, so I cannot write a separate let for every value that might turn up.

What's the Rust-way to handle such things?

use std::collections::BTreeSet;
fn main() {
    let mut days = BTreeSet::<&str>::new();
    let mut und = String::with_capacity(3);
    for _ in 0..3 {
        und.push('1');
        und.push_str("SA");
        // here you create a new String by calling clone
        // and then create a &str that refers to it by calling as_str
        days.insert(und.clone().as_str());
        // the String is dropped here making the &str inside the BTree invalid
    }
    if days.len() == 3 {
        println!("got \'em");
    }
}

I would suggest you use BTreeSet<String> instead. It works like this:

use std::collections::BTreeSet;
fn main() {
    let mut days = BTreeSet::<String>::new();
    let mut und = String::with_capacity(3);
    for _ in 0..3 {
        und.push('1');
        und.push_str("SA");
        days.insert(und.clone());
    }
    if days.len() == 3 {
        println!("got \'em");
    }
}

In Rust, every value has an owner. What you're essentially saying is that the BTreeSet should own the values, in this case, strings.

However, whenever you use a reference (the ampersand & or &mut), you're not dealing with an owned value. A reference always points to a value owned by someone else.

Strings in Rust can be a bit tricky to grasp. Whenever you write a string literal like "some string", two things happen:

  1. The compiler embeds the string somewhere in the final binary. You don’t directly control where or how it's stored (owned).
  2. To access the string, Rust gives you a reference to it, specifically, a &'static str.

This only applies to string literals written directly in the source code. When you construct strings dynamically at runtime, you need to allocate storage yourself, which is where the String type comes in. A String owns its contents, allowing you to modify it by appending characters with push() or other strings with push_str().
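
To make the difference concrete, here is a minimal sketch. The helper name make_label is made up for illustration and is not part of the original code. It shows that literals can go straight into a set of references, while a string built at runtime has to be handed over as an owned String:

```rust
use std::collections::BTreeSet;

// Hypothetical helper: the label is built at runtime, so it is returned as
// an owned String rather than as a reference to a local buffer.
fn make_label(day: &str) -> String {
    let mut label = String::with_capacity(1 + day.len());
    label.push('1');
    label.push_str(day);
    label
}

fn main() {
    // Literals are `&'static str`; the set can hold the references directly.
    let mut literals: BTreeSet<&'static str> = BTreeSet::new();
    literals.insert("SA");
    literals.insert("SU");

    // Runtime strings need an owner; here the set becomes the owner.
    let mut owned: BTreeSet<String> = BTreeSet::new();
    owned.insert(make_label("SA"));

    assert!(literals.contains("SA"));
    assert!(owned.contains("1SA"));
    println!("{literals:?} {owned:?}");
}
```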

Applying this to your case:

  1. The BTreeSet must own the strings:

    let mut days = BTreeSet::<String>::new();
    
  2. Since each loop iteration creates a new string and transfers ownership to the BTreeSet, you need to construct a fresh String each time:

    for _ in 0..3 {
        let mut und = String::with_capacity(3);
        und.push('1');
        und.push_str("SA");
        days.insert(und);
    }
    

Once the loop completes, three String instances will have been created. You briefly owned each one while constructing it, but after inserting them into the BTreeSet, ownership was transferred, making the BTreeSet their new owner.

Note that this adds three equal strings, "1SA", which makes the if check at the end fail (a BTreeSet doesn't add a value if an equal one is already present). The original code creates three different strings, "1SA", "1SA1SA" and "1SA1SA1SA", and adds each of them to the BTreeSet.

Argh!

Yes, I overlooked this... :roll_eyes:

But I assume the short initial code was just a small example by the original author to grasp the concepts.

So the alternative would be:

If you don’t create a new String in each iteration, you’ll need to use clone() to make an owned copy of the current working buffer before inserting it into the BTreeSet.

Once the loop completes, the BTreeSet will own the three cloned strings, while the original working buffer will be dropped.
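
As a rough sketch of that alternative (the function name collect_days and its rounds parameter are only there so the snippet is easy to try out, they are not from the original post):

```rust
use std::collections::BTreeSet;

// Hypothetical helper that mirrors the loop from the original question.
fn collect_days(rounds: usize) -> BTreeSet<String> {
    let mut days = BTreeSet::new();
    let mut und = String::new(); // the working buffer, reused every round
    for _ in 0..rounds {
        und.push('1');
        und.push_str("SA");
        // Clone the buffer in its current state; the set owns the copy.
        days.insert(und.clone());
    }
    days // `und` is dropped here, the clones live on inside the set
}

fn main() {
    let days = collect_days(3);
    // The buffer grows each round, so the three clones are all different.
    assert_eq!(days.len(), 3);
    println!("{days:?}");
}
```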

Hey you gentle Rustaceans,

thank you for your replies. As I understand it, working with owned data (String in this case) is not preferable runtime-wise. And in fact most of the data collected into days will be &strs that are already present, owned elsewhere, and fine to store as they are. Only a few elements are not already there and need to be created from some pre-existing &str before they get stored in days.

My next failing attempt reads:

use std::collections::BTreeSet;
fn main() {
    let mut days = BTreeSet::<&str>::new();
    let mut und = String::with_capacity(28);
    for _ in 0..3 {
        und.push('1');
        und.push_str("SA");
        let ending = {
            let pos = &und.len() - 3;
            &und[pos..]
        };
        days.insert(&ending);
        println!("{}", &ending);
    }
    if days.len() == 1 {
        println!("got it");
    }
}

The explanation suggests changing the order and starting with insert(...), which does not work here. Is there a way to store (immutable) parts of und in days, or is it necessary to switch to owned data (String) for the whole set?

In this case, you can reach for Cow<str>, which holds either a reference to storage that outlives the Cow itself (e.g. 'static) or an owned value.

The issue with this is that you can't modify the String while a &str pointing into it is still in use. If you can change your program to build the full String first, and only then take all the references and put them into the BTreeSet, that would work.

That sounds a bit like you would benefit from Cow, but the code you showed so far doesn't really.
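
For the situation described in words (mostly pre-existing &strs, a few values built on the fly), a Cow-based set could look roughly like this. This is only a sketch: the function collect_days, its existing parameter and the rule for which extra value gets built are assumptions made up for the example.

```rust
use std::borrow::Cow;
use std::collections::BTreeSet;

// Hypothetical function: `existing` stands in for the `&str`s that are
// already owned elsewhere; one extra entry is built at runtime.
fn collect_days<'a>(existing: &[&'a str]) -> BTreeSet<Cow<'a, str>> {
    let mut days = BTreeSet::new();
    for &day in existing {
        // No allocation: the set only borrows data that already exists.
        days.insert(Cow::Borrowed(day));
    }
    if let Some(&last) = existing.last() {
        // An allocation happens only for the value that has to be created.
        days.insert(Cow::Owned(format!("1{last}")));
    }
    days
}

fn main() {
    let days = collect_days(&["MO", "TU", "SA"]);
    assert_eq!(days.len(), 4);
    assert!(days.contains("1SA"));
    println!("{days:?}");
}
```

Borrowed and owned entries compare by their string contents, so a borrowed "SA" and an owned "SA" would count as the same element.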

You preallocate the String with String::with_capacity and could theoretically be sure that the String doesn't reallocate while you push into it, as long as you stay within that capacity. That would allow you to use unsafe to create the &str references while you still push into it. I would advise you not to do that unless String cloning or the memory footprint really is the bottleneck of your application. It's really easy to get dangling references this way.

Perhaps this is what you're looking for?

use std::collections::BTreeSet;
fn main() {
    let mut days = BTreeSet::<&str>::new();
    let mut und = String::with_capacity(28);
    
    let mut lens = [0; 3];
    
    for len in &mut lens {
        und.push('1');
        und.push_str("SA");
        *len = und.len();
    }
    
    for len in lens {
        let ending = {
            let pos = len - 3;
            &und[pos..len]
        };
        days.insert(&ending);
        println!("{}", &ending);
    }
    
    if days.len() == 1 {
        println!("got it");
    }
}

Like others have said, the compiler isn't going to let you store references to a String while pushing into it.
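
In case the `*len = und.len();` line in the first loop looks odd: iterating over `&mut lens` hands out a `&mut` reference to each slot, and the `*` writes through that reference into the array. A stripped-down sketch of just that part (the helper name fill is made up for illustration):

```rust
// Hypothetical helper: appends "1SA" once per slot and records the
// length of `und` after each append.
fn fill(und: &mut String, lens: &mut [usize]) {
    // Iterating over a mutable slice yields `&mut usize`, one per slot.
    for len in lens.iter_mut() {
        und.push('1');
        und.push_str("SA");
        // `len` is a reference to the slot; `*len` is the slot itself.
        *len = und.len();
    }
}

fn main() {
    let mut und = String::new();
    let mut lens = [0usize; 3];
    fill(&mut und, &mut lens);
    assert_eq!(lens, [3, 6, 9]);

    // No more pushes from here on, so shared borrows of `und` are fine.
    let endings: Vec<&str> = lens.iter().map(|&len| &und[len - 3..len]).collect();
    assert_eq!(endings, ["1SA", "1SA", "1SA"]);
}
```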

Pretty cool code. The two for-loops and the deref *len are still a bit of a mystery to me, but there is another striking remark of yours. My code fills a String (und) and was meant to use parts of und that do not change. From the compiler's point of view this referencing is still illegal because und is mutable. Correct?

When I wrap your sophisticated code into an extended use case, it gets me back to the ownership issue:

use std::collections::BTreeSet;
fn main() {
    let mut days = BTreeSet::<&str>::new();
    let mut und = String::with_capacity(28);
    for single_day in ["SU", "MO", "TU", "WE", "TH", "FR", "SA", "SA"] {
        let mut lens = [0; 3];
        for len in &mut lens {
            und.push('1');
            und.push_str(single_day);
            *len = und.len();
        }   
        for len in lens {
            let ending = {
                let pos = len - 3;
                &und[pos..len]
            };
            days.insert(&ending);
            println!("{:?} ends with {}", &days, &ending);
        }

    }
    if days.len() == 7 {
        println!("got it");
    }
}

use std::collections::BTreeSet;
fn main() {
    let mut days = BTreeSet::<&str>::new();
    let mut und = String::with_capacity(28);
    let mut lens = Vec::new();
    // push into the String
    for single_day in ["SU", "MO", "TU", "WE", "TH", "FR", "SA", "SA"] {

        und.push('1');
        und.push_str(single_day);
        lens.push(und.len());
    }
    // from here on the String doesn't get modified anymore, so we
    // can create references to it
    for len in lens {
        let ending = {
            let pos = len - 3;
            &und[pos..len]
        };
        days.insert(&ending);
        println!("{:?} ends with {}", &days, &ending);
    }

    if days.len() == 7 {
        println!("got it");
    }
    
    println!("{days:?}");
}

Does this do what you want?
It appends to the String for each day and then puts that part of the String into the BTreeSet. The important thing @emy did with the lens array is to record which part of the String should go into the BTreeSet. I did that with a Vec, because then the length doesn't have to be known at compile time.
This works because you don't modify the String anymore after you've created the first &str pointing into it. All modifications happen before that.

Unfortunately the String und should be built and read in the same pass. Here single_day is just a placeholder for some &strs that are passed as parameters to a surrounding function.

Perhaps I am exaggerating the runtime aspect, since your String set does the trick. The values in that set are all exactly 3 chars, so creating individual Strings (without ever extending them) shouldn't do much harm, should it?

Yes, the runtime impact of cloning the String is probably not significant.

The fixed length would even allow you to use a library like arrayvec, which doesn't allocate on the heap and instead stores the string inline in an array. There are also libraries that spill to the heap if you push too much into them, like compact_str.
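
To give an idea of what "stored in an array" means without pulling in a crate, here is a std-only stand-in. It does not use the arrayvec API; the key helper and the choice of [u8; 3] as element type are assumptions for this sketch, based on every value being exactly three bytes long.

```rust
use std::collections::BTreeSet;

// Hypothetical helper: packs the '1' prefix and a two-byte day code into a
// fixed-size key. Returns None if `day` is not exactly two bytes long.
fn key(day: &str) -> Option<[u8; 3]> {
    match day.as_bytes() {
        &[a, b] => Some([b'1', a, b]),
        _ => None,
    }
}

fn main() {
    let mut days: BTreeSet<[u8; 3]> = BTreeSet::new();
    for day in ["SU", "MO", "SA", "SA"] {
        if let Some(k) = key(day) {
            // `[u8; 3]` is Copy and stored inline in the set's nodes,
            // so there is no separate heap allocation per string.
            days.insert(k);
        }
    }
    assert_eq!(days.len(), 3);
    for k in &days {
        // The key was assembled from valid UTF-8 pieces, so this succeeds.
        println!("{}", std::str::from_utf8(k).unwrap());
    }
}
```

The set then owns its elements, so und can be built and read in the same pass without any borrow of the buffer staying alive.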

It sounds like you want a string allocator, not an owned string. Allocators typically allow you to create mutable references to new data given a shared reference to the allocator, which gets you past the "double-borrow" issue.

An example using Bumpalo, which I've seen recommended:

use std::collections::BTreeSet;
use bumpalo::Bump;
fn main() {
    let mut days = BTreeSet::<&str>::new();
    let und = Bump::with_capacity(28);
    for _ in 0..3 {
        let ending = und.alloc_str("1SA");
        days.insert(ending);
        println!("{}", ending);
    }
    if days.len() == 1 {
        println!("got it");
    }
}

Implementing your own allocator is also not too difficult.
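
A proper arena usually needs a bit of unsafe, so I won't sketch one here. The crudest std-only stand-in I can think of is to leak each small String, which turns it into a &'static str. To be clear, this is a different technique from a bump allocator: the memory is never freed, so it is only reasonable if the number of leaked strings is small and bounded. The helper name leak_label is made up for the example.

```rust
use std::collections::BTreeSet;

// Hypothetical helper: builds a label and leaks it, turning the String
// into a `&'static str`. The leaked memory is never given back.
fn leak_label(day: &str) -> &'static str {
    let mut label = String::with_capacity(1 + day.len());
    label.push('1');
    label.push_str(day);
    Box::leak(label.into_boxed_str())
}

fn main() {
    let mut days = BTreeSet::<&str>::new();
    for day in ["SU", "MO", "SA", "SA"] {
        // Building and inserting happen in the same pass.
        days.insert(leak_label(day));
    }
    assert_eq!(days.len(), 3);
    println!("{days:?}");
}
```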

Edit:
It looks like bumpalo has a growable String abstraction:

use bumpalo::Bump;
use std::collections::BTreeSet;
// Needs bumpalo's "collections" Cargo feature; this String shadows std's String.
use bumpalo::collections::String;

fn main() {
    let mut days = BTreeSet::<&str>::new();
    let und = Bump::with_capacity(28);
    for _ in 0..3 {
        let mut ending = String::with_capacity_in(3, &und);
        ending.push('1');
        ending.push_str("SA");
        let ending = ending.into_bump_str();
        days.insert(ending);
        println!("{}", ending);
    }
    if days.len() == 1 {
        println!("got it");
    }
}