Storing structs with their borrowers?

I'm trying to write a wrapper around gdal::vector code, so that the caller doesn't have to deal with all these intermediate objects and can just iterate over, say, features.

Here's the code that I want to wrap (a simplified version of this gdal code from the docs):

let dataset = Dataset::open("fixtures/roads.geojson")?;
let mut layer = dataset.layer(0)?;
for feature in layer.features() { // returns FeatureIterator
    ...

Now I wrote this, and it won't compile, because Dataset is borrowed by Layer, and Layer is borrowed by FeatureIterator, and they move when the function returns.

struct GpkgDriver<'a> {
	dataset: Dataset,
	layer: Layer<'a>,
	fi: FeatureIterator<'a>,
}

impl<'a> GpkgDriver<'a> {
	fn from_path(path: &str) -> Result<Self, Box<dyn Error>> {
		let dataset = Dataset::open(path)?;
		let mut layer = dataset.layer(0)?;
		let fi = layer.features();
		Ok(Self { dataset, layer, fi })
	}
}

In the original example in the docs, dataset and layer live in main() and essentially are 'static, but I'd like not to abuse this.

I've checked the docs, and in the case of the GDAL library, there's no alternative like into_owned....

Wrapping dataset into Arc or Box doesn't resolve the issue. And I need an iterator, so no matter if I make a big function or closure, eventually dataset will be returned and moved.

What is a reasonable workaround for this, so that the outer user doesn't have to keep dataset and layer?

If they live in main then they are not 'static: they are destroyed when main returns.

IOW: you want to write some-other-language-in-Rust, not Rust. That's not impossible, but Rust would fight you tooth and nail.

The usual approach in Rust is to accept a closure (which deals with the features) and to drive the access to layers and features in your own code.
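A minimal sketch of that closure-driven shape might look like the following. To keep it self-contained it does not use the gdal crate: Dataset, Layer and Feature here are hypothetical stand-ins that only reproduce the borrowing chain (the layer borrows the dataset, the features borrow the layer), and for_each_feature is an illustrative name, not part of any library.

```rust
use std::error::Error;

// Hypothetical stand-ins for the gdal types. They only mimic the borrowing
// chain: Layer borrows from Dataset, Feature borrows from Layer.
struct Dataset {
    names: Vec<String>,
}

struct Layer<'d> {
    names: &'d [String],
}

struct Feature<'l> {
    name: &'l str,
}

impl Dataset {
    // Stand-in for opening a file; fails on an empty path for illustration.
    fn open(path: &str) -> Result<Self, Box<dyn Error>> {
        if path.is_empty() {
            return Err("empty path".into());
        }
        Ok(Dataset {
            names: vec!["A1".to_string(), "B2".to_string()],
        })
    }

    fn layer(&self, _index: usize) -> Result<Layer<'_>, Box<dyn Error>> {
        Ok(Layer { names: &self.names })
    }
}

impl<'d> Layer<'d> {
    fn features(&mut self) -> impl Iterator<Item = Feature<'_>> + '_ {
        self.names.iter().map(|n| Feature { name: n.as_str() })
    }
}

// The wrapper owns the whole chain for the duration of the call, so the
// caller never sees the dataset or the layer; it only supplies the closure.
fn for_each_feature(
    path: &str,
    mut f: impl FnMut(Feature<'_>),
) -> Result<(), Box<dyn Error>> {
    let dataset = Dataset::open(path)?;
    let mut layer = dataset.layer(0)?;
    for feature in layer.features() {
        f(feature);
    }
    Ok(())
}
```

A caller would then write something like `for_each_feature("fixtures/roads.geojson", |feature| println!("{}", feature.name))?;`. The trade-off is that the caller gets a callback rather than an iterator it can hold on to.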

If you want to do something else then you have to think about what you are actually trying to do.

How and why would code that deals with features, but has no idea about the dataset and layer, exist? What would be the purpose? What's the end goal? A “business goal” understandable by a layman, not “this is how I would write the code in C++ (Java, Python or Pascal)”.

Storing the borrows with the borrowee is a "self-referential struct" [1], which is not a pattern that plays nice with Rust's model. Various crates try, but it's hard to do so soundly.

Looking at the docs, you probably want Dataset → OwnedLayer → OwnedFeatureIterator.

// (untested)
struct GpkgDriver(OwnedFeatureIterator);
// ...
impl GpkgDriver {
    fn new(path: &str) -> Result<Self, Box<dyn Error>> {
        // assuming owned_features() returns the iterator directly, not a Result
        let features = Dataset::open(path)?
            .into_layer(0)?
            .owned_features();
        Ok(GpkgDriver(features))
    }
    // `impl<'a> IntoIterator for &'a mut GpkgDriver` maybe
    fn iter(&mut self) -> impl Iterator<Item = Feature<'_>> + '_ {
        &mut self.0
    }
}
// ...
fn foo() -> Result<(), Box<dyn Error>> {
    for _ in GpkgDriver::new("/etc/shadow")?.iter() {
    }
    Ok(())
}

  1. you can search for that phrase to find a plethora of other threads ↩︎


Oh, awesome! I had seen these two structs before but forgot this time.

I’m mostly thinking out loud because the issue of self-referencing data types is a popular Rust “gotcha”. Why so?

We can do this in other languages, so why not in Rust?

Safe refs imply a clear-cut sequence for creating and destroying values, so that a ref never points to invalid memory. This is an absolute. E.g., in order to instantiate Wrap { value: T }, T must be instantiated first, and destroyed last [1]. The same holds for a borrow, if you squint and see the & in &T as playing the role of Wrap in the example.

The first bit of code in your post follows a sequence driven by “this borrows that, which borrows the other” [2]. As soon as you wrap them together in GpkgDriver, that sequencing is no longer possible without introducing the possibility of one of GpkgDriver's fields pointing to invalid memory.
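The sequencing described above can be observed directly. This is only an illustrative sketch: Noisy is a made-up type that records its name when dropped, and the three locals merely reuse the names from the original snippet. Local variables are dropped in reverse order of declaration, so the last borrower goes first and the thing everyone borrows from goes last.

```rust
use std::cell::RefCell;

// Hypothetical helper type: appends its name to a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which the three locals were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        // Declared in "borrowee first" order, like dataset -> layer -> features.
        let _dataset = Noisy { name: "dataset", log: &log };
        let _layer = Noisy { name: "layer", log: &log };
        let _features = Noisy { name: "features", log: &log };
        // End of scope: locals are dropped in reverse declaration order.
    }
    log.into_inner()
}
```

Calling drop_order() yields features, layer, dataset. Separate local variables give the compiler this unambiguous order; fields of one struct that borrow from each other do not, because the struct can be moved as a whole while the borrows are still alive.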

In C, when using self-referencing types, I relied on “I must avoid reading the memory if/when…”. That need to trust myself goes away when I rely on the Rust compiler, which simply avoids the issue by refusing to compile the app [3].

Finally, it might seem a bit counter-intuitive, if not ironic, that by introducing more structure (per GpkgDriver) I lose the ability to have any field depend on another in that same struct. It is true, however, because the struct introduces a need to instantiate all fields simultaneously, which can’t happen where an explicit sequencing is required (i.e. when “this borrows that”, “this field depends on that field”). This pattern of course explains the need in Rust to have two separate structures to enable iteration (MyCollection cannot iterate over itself [4]), or the need to copy a value if it’s needed by multiple fields (thus breaking the inter-dependence).
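The "two separate structures" point can be sketched as follows. MyCollection is the name used above; MyCollectionIter and its fields are hypothetical names chosen for illustration. The collection owns the data, while a second, short-lived struct borrows it and carries the iteration state.

```rust
// Owns the data; knows nothing about iteration state.
struct MyCollection {
    items: Vec<u32>,
}

// Borrows the collection and tracks the current position.
struct MyCollectionIter<'a> {
    collection: &'a MyCollection,
    pos: usize,
}

impl MyCollection {
    fn iter(&self) -> MyCollectionIter<'_> {
        MyCollectionIter { collection: self, pos: 0 }
    }
}

impl<'a> Iterator for MyCollectionIter<'a> {
    type Item = &'a u32;

    fn next(&mut self) -> Option<&'a u32> {
        // The returned reference borrows from the collection, not from the
        // iterator, so it stays valid after the iterator advances.
        let item = self.collection.items.get(self.pos)?;
        self.pos += 1;
        Some(item)
    }
}
```

Because the iterator is a separate value created after the collection, the borrow has a clear start and end, which is the sequencing a single self-referential struct cannot express.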


  1. I wonder if this applies to why Rust will cast &Vec<T> <-> Vec<&T>, as needed to “make your app work” without manually having to do so… one type implies the other in order to make sense ↩︎

  2. And because Rust disallows declaring without instantiating ↩︎

  3. there may be a safe way to run a self-ref struct… in time the Rust compiler might figure out how to differentiate accordingly, but not as of yet ↩︎

  4. this point is obfuscated with tutorials that use a generator and call it an Iterator ↩︎


I guess in my mind referencing still works like in Python, where all variables are references to data in reference-counted heap storage.

So my intuition told me that borrowing some data is ok as long as it's stored somewhere, whereas in reality a reference points at one ephemeral cell on the stack. It is sort of "more brittle" than I perceived.

I've actually never seen this stated explicitly in the docs: the Rust Book mostly elaborates on ownership, while in the official reference I can't even find a sub-section on references or borrowing.

It's not counter-intuitive and not ironic. If you read the rules in Google's C++ style guide you will see almost the same rules as in any Rust program. And even if you go back and look at what it was saying years ago you will see the following:

Generally speaking, we prefer that we design code with clear object ownership. The clearest object ownership is obtained by using an object directly as a field or local variable, without using pointers at all. On the other extreme, by their very definition, reference counted pointers are owned by nobody. The problem with this design is that it is easy to create circular references or other strange conditions that cause an object to never be deleted. It is also slow to perform atomic operations every time a value is copied or assigned.
Although they are not recommended, reference counted pointers are sometimes the simplest and most elegant way to solve a problem.

That's from 2008, when Rust was just a hobby project, not endorsed by Mozilla yet!

Yes. And while you sometimes do that for efficiency, it's always a hassle. That's why people in C end up with the exact same rules as Rust (plus a few small exceptions where we may cheat a tiny bit).

I guess it's because most Rust developers (at least initially) were C++ users. Without GC you cannot make memory management S.E.P. You have to deal with ownership immediately and the consequences hit you immediately.

You can create complicated constructs which lead to hours of debugging, but if you work with huge codebases, sooner or later you stop doing that: the tiny increase in efficiency just doesn't compensate for the hours of debugging.

Thus you usually try to use unique_ptr as much as possible and have a “chain-of-trust” which ultimately goes back to main.


When you’re right, you’re right. Great references. Thank you.