Lifetime issue when reading file as String, split into words

I read a file with fs::read_to_string then split it up into words. I can hold those words as a Vec<&str> where those &strs are borrows from the String that I got from read_to_string. I want to create a struct with those words, but I can't figure out how to express the lifetimes.

Right now I have something like the code below, where a constructor method loads a file. I've tried putting the original, owned, String into struct Words, but it doesn't compile. I didn't really expect it to compile, but I can't tell you why!

Is there a way to fix this code, or is there a better way to structure this logic? Thanks!

use std::fs;

struct Words<'a> {
    // A raw string with words delimited by whitespace.
    string: String,
    // Vec of those words, each borrowed from `string`. How do I tell the
    // borrow checker that the lifetime 'a is bound to `string`?
    words: Vec<&'a str>,
}

impl<'a> Words<'a> {
    fn load() -> Self {
        let string = fs::read_to_string("/some/file.txt").unwrap();
        let words = string.split_whitespace().collect();
        Words { string, words }
    }
}

fn main() {
    let _words = Words::load();
}


Errors:

   Compiling playground v0.0.1 (/playground)
error[E0515]: cannot return value referencing local variable `string`
  --> src/main.rs:15:9
   |
14 |         let words = string.split_whitespace().collect();
   |                     ------ `string` is borrowed here
15 |         Words { string, words }
   |         ^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

error[E0505]: cannot move out of `string` because it is borrowed
  --> src/main.rs:15:17
   |
11 | impl<'a> Words<'a> {
   |      -- lifetime `'a` defined here
...
14 |         let words = string.split_whitespace().collect();
   |                     ------ borrow of `string` occurs here
15 |         Words { string, words }
   |         --------^^^^^^---------
   |         |       |
   |         |       move out of `string` occurs here
   |         returning this value requires that `string` is borrowed for `'a`

error: aborting due to 2 previous errors

Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.
error: could not compile `playground`.

To learn more, run the command again with --verbose.

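You are trying to create a self-referential struct, i.e. a struct where one field borrows from another field of the same struct. This is not possible with ordinary references in safe Rust.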

You can consider converting each word to a String and creating a Vec<String> instead.


Another option is to separate the String from the list of word-slices.

You would need to adjust your design a bit to load the String first separately and pass it to a function like Words::new(s: &str) -> Words, where the Words struct only contains the vector (and maybe a slice of the whole string).
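A minimal sketch of that suggestion, assuming the struct keeps both the optional slice of the whole text (the `text` field) and the word slices. The field names and the `main` driver are illustrative, not from the thread, and a string literal stands in for the file contents so the example runs without `/some/file.txt`:

```rust
/// Borrows everything from a string owned by the caller; owns only the Vec.
struct Words<'a> {
    // The whole source text, borrowed (the optional "slice of the whole string").
    text: &'a str,
    // One slice per whitespace-separated word, all pointing into `text`.
    words: Vec<&'a str>,
}

impl<'a> Words<'a> {
    fn new(s: &'a str) -> Words<'a> {
        Words {
            text: s,
            words: s.split_whitespace().collect(),
        }
    }
}

fn main() {
    // In the real program this String would come from fs::read_to_string
    // and must stay alive for as long as `w` is used.
    let s = String::from("the quick  brown\nfox");
    let w = Words::new(&s);
    println!("{} words in {} bytes", w.words.len(), w.text.len());
}
```

Because `Words` only borrows, there is no self-reference: the caller owns the String, and the lifetime `'a` simply ties `w` to it.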

Aha, thanks, that makes sense.

I don't really need to access the original string. If I drop that, I can write a version that works:

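use std::fs;

struct Words {
    words: Vec<String>,
}

impl Words {
    fn load() -> Self {
        let string = fs::read_to_string("/some/file.txt").unwrap();
        // Copy each word into its own String so `string` can be dropped.
        let words = string.split_whitespace().map(str::to_string).collect();
        Words { words }
    }
}

fn main() {
    let words = Words::load();
    println!("{} words", words.words.len());
}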


However, this makes a separate heap allocation and copy for every word (via str::to_string), so I expect it to be inefficient, especially with larger files. Is there a way to restructure this program so that I don't need to copy out of the String that I first read with fs::read_to_string?

You can keep ownership of the original string outside Words. The Words struct will then borrow from that original string.


To elaborate on @alice's comment:

struct Words<'src> {
    words: Vec<&'src str>,
}

impl<'src> Words<'src> {
    // Borrows `src`; the caller keeps ownership of the String.
    pub fn load(src: &'src str) -> Words<'src> {
        Words { words: src.split_whitespace().collect() }
    }
}

fn main() {
    let src = std::fs::read_to_string("/some/file.txt").unwrap();
    let words = Words::load(&src);
    println!("{} words", words.words.len());
}

So you're splitting the process into two steps. In the first step you load the text into memory; in the second step you split it by whitespace and store a &str for each word (a pointer plus a length into the original text) in a Vec.
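To check that the second step really copies nothing, here is a small sketch with a hypothetical helper, `borrows_from`, that tests whether a word's bytes lie inside the source string's buffer. A string literal again stands in for the file contents:

```rust
struct Words<'src> {
    words: Vec<&'src str>,
}

impl<'src> Words<'src> {
    fn load(src: &'src str) -> Words<'src> {
        Words { words: src.split_whitespace().collect() }
    }
}

/// Returns true if `word` is a sub-slice of `src`, i.e. it points into
/// src's memory rather than into a separate copy.
fn borrows_from(src: &str, word: &str) -> bool {
    let start = src.as_ptr() as usize;
    let end = start + src.len();
    let p = word.as_ptr() as usize;
    p >= start && p + word.len() <= end
}

fn main() {
    // Step 1: own the text.
    let src = String::from("alpha beta\tgamma");
    // Step 2: split it; each &str points back into `src`.
    let words = Words::load(&src);
    assert!(words.words.iter().all(|w| borrows_from(&src, w)));
    println!("{:?}", words.words); // prints ["alpha", "beta", "gamma"]
}
```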

The reason your original code failed is that `words` borrows from the local variable `string`, and the borrow checker tracks that borrow against the variable itself. Moving `string` into the struct while it is still borrowed is rejected (E0505), and so is returning a value that references data owned by the function (E0515). It doesn't matter that moving a String leaves its heap contents where they are: there is no way to write a lifetime that means "borrowed from another field of this struct", so the borrow checker complains.
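A small illustration of the point about a String's contents, using a hypothetical `Holder` struct and `move_through_struct` helper: moving a String copies only its (pointer, length, capacity) header, so the heap buffer's address is the same before and after. The borrow checker still rejects the original code because it reasons about borrows of variables, not about where the bytes live.

```rust
struct Holder {
    s: String,
}

fn move_through_struct(s: String) -> (usize, String) {
    // Address of the heap buffer before any moves.
    let before = s.as_ptr() as usize;
    // Moving the String into a struct copies its header...
    let h = Holder { s };
    // ...and moving it back out copies the header again; the bytes stay put.
    let s = h.s;
    (before, s)
}

fn main() {
    let original = String::from("hello world");
    let (before, moved) = move_through_struct(original);
    let after = moved.as_ptr() as usize;
    // Prints `buffer moved: false`.
    println!("buffer moved: {}", before != after);
}
```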

Generally, when you encounter these sorts of borrowing problems, you can solve them by:

  1. Creating copies (the to_string() calls)
  2. Hoisting the item out of your Words type and making the caller pass in a reference to it instead (the snippet shown above)

I would normally go for the first approach unless I know ahead of time that the code is performance critical. Allocating lots of little strings isn't really that expensive, especially when it's a once-off thing or your application is already doing IO.
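If you do want `Words` to own the text without one `to_string()` per word, a further option not suggested above is to store byte ranges into the owned String instead of `&str`s, and rebuild the slices on demand. This is a swapped-in technique, not something from the thread; `from_string`, `get`, `len` and `iter` are hypothetical names in this minimal sketch:

```rust
use std::ops::Range;

/// Owns the text and records each word as a byte range into it, so there
/// are no self-references and no per-word String allocations.
struct Words {
    text: String,
    spans: Vec<Range<usize>>,
}

impl Words {
    fn from_string(text: String) -> Self {
        // split_whitespace yields sub-slices of `text`; turn each into a
        // byte offset relative to the start of the buffer.
        let base = text.as_ptr() as usize;
        let spans = text
            .split_whitespace()
            .map(|w| {
                let start = w.as_ptr() as usize - base;
                start..start + w.len()
            })
            .collect();
        // The borrow from split_whitespace has ended, so `text` can move.
        Words { text, spans }
    }

    /// Rebuilds the i-th word as a &str borrowed from `self`.
    fn get(&self, i: usize) -> Option<&str> {
        self.spans.get(i).map(|r| &self.text[r.clone()])
    }

    fn len(&self) -> usize {
        self.spans.len()
    }

    fn iter(&self) -> impl Iterator<Item = &str> + '_ {
        self.spans.iter().map(move |r| &self.text[r.clone()])
    }
}

fn main() {
    // A literal stands in for fs::read_to_string here.
    let w = Words::from_string(String::from("to be  or\nnot"));
    println!("{} words, first: {:?}", w.len(), w.get(0));
    for word in w.iter() {
        println!("{}", word);
    }
}
```

The trade-off is an extra slicing step on each access in exchange for keeping a single allocation inside a self-contained struct.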
