Save two fields with reference cycles in a struct

Hello,

I have a practical question that is also conceptual, and I am having trouble finding a solution.

It will be easier to explain with a simple piece of code which will not compile:

use testcontainers::clients::Cli;
use testcontainers::images::postgres::Postgres;
use testcontainers::RunnableImage;

pub type DockerClient = Cli;

pub struct Database<'a> {
    container: testcontainers::Container<'a, Postgres>,
    docker_client: DockerClient,
}

impl<'a> Database<'a> {
    pub fn run() -> Self {
        let postgres_image = RunnableImage::from(Postgres::default())
            .with_tag("13")
            .with_env_var(("POSTGRES_DB", "test_database"))
            .with_env_var(("POSTGRES_USER", "postgres"))
            .with_env_var(("POSTGRES_PASSWORD", "postgres"))
            .with_mapped_port((5433, 5432));
        let docker_client = DockerClient::default();
        let container = docker_client.run(postgres_image);

        Self {
            container,
            docker_client,
        }
    }
}

fn main() {
    println!("hello");
}

The only dependency to run this code is

testcontainers = "0.13.0"

Why doesn't this compile

Based on my understanding, this does not compile because the method run() on the DockerClient creates a container which takes a reference to the docker_client instance on which the run method is called.

So we have a cycle here: the DockerClient that creates a container must live at least as long as the created container, because if it dies first the container will be left with a dangling reference.

Ok, so far so good.

The problem

But in my code, I am saving both the DockerClient and the created container on the Database struct, effectively coupling the life of these two pieces of data together. When Database goes out of scope, both (the client and the created container) will be dropped. So, in my opinion, the created container will not have a dangling reference, because the Database struct instance owns both the client and the container and will drop these together.

But how do I express this to the compiler so that it will understand me? As things stand, I get a compilation error stating that I am trying to return data owned by the function. I am, but at the same time I am offering (or at least that is my goal) a guarantee that this owned data is being moved somewhere the borrowing data will still have access to it.

Thanks,
Marlon

There are crates which allow you to create self-referencing structs safely. The most popular one is ouroboros; using it, your example would look like this:

use ouroboros::self_referencing;
use testcontainers::{clients::Cli, images::postgres::Postgres, Container, RunnableImage};

pub type DockerClient = Cli;

#[self_referencing]
pub struct Database {
    docker_client: DockerClient,
    #[borrows(docker_client)]
    #[covariant]
    container: Container<'this, Postgres>,
}

impl Database {
    pub fn run() -> Self {
        let postgres_image = RunnableImage::from(Postgres::default())
            .with_tag("13")
            .with_env_var(("POSTGRES_DB", "test_database"))
            .with_env_var(("POSTGRES_USER", "postgres"))
            .with_env_var(("POSTGRES_PASSWORD", "postgres"))
            .with_mapped_port((5433, 5432));

        DatabaseBuilder {
            docker_client: DockerClient::default(),
            container_builder: |docker_client| docker_client.run(postgres_image),
        }
        .build()
    }

    // Example method.
    pub fn container(&self) -> &Container<Postgres> {
        self.borrow_container()
    }
}

Going out of scope is not the only operation that can invalidate references. All Rust objects are expected to be relocatable without breaking them. You might want to move your Database instance into an Option<Database>, for instance; this will change its address and invalidate any internal self-references.

The only way to have this sort of cyclical reference in safe Rust is with shared ownership, i.e. Rc or Arc.
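As a rough sketch of what shared ownership would look like if you controlled the container type: the container holds an Rc to the client instead of borrowing it, so one struct can own both and still be moved freely. Note that `Client`, `Container`, and `describe` below are hypothetical stand-ins and not the testcontainers API; the real `Container<'d, I>` carries a lifetime, which is why this does not apply directly to the original code.

```rust
use std::rc::Rc;

// Hypothetical stand-ins; these are NOT the testcontainers types.
struct Client {
    name: String,
}

struct Container {
    client: Rc<Client>, // shared ownership instead of a borrow
    image: String,
}

impl Client {
    // The container keeps the client alive by cloning the Rc.
    fn run(self: &Rc<Self>, image: &str) -> Container {
        Container {
            client: Rc::clone(self),
            image: image.to_string(),
        }
    }
}

struct Database {
    client: Rc<Client>,
    container: Container,
}

impl Database {
    fn run() -> Self {
        let client = Rc::new(Client { name: "docker".to_string() });
        let container = client.run("postgres:13");
        Database { client, container }
    }

    fn describe(&self) -> String {
        format!(
            "{} via {} ({} owners)",
            self.container.image,
            self.container.client.name,
            Rc::strong_count(&self.client)
        )
    }
}

fn main() {
    // Moving the Database (here into an Option) does not invalidate anything,
    // because no field borrows from another field.
    let moved = Some(Database::run());
    println!("{}", moved.unwrap().describe());
}
```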


Hello,

Can you please take a look at the implementation on the testcontainers?

I ask because the problem here doesn't seem to be exactly a cyclic reference in terms of memory, but rather in terms of lifetimes.

Maybe I didn't explain myself clearly, but what it seems to require is that I declare a lifetime on testcontainers::Container that is expected to match the lifetime of the docker_client, because of a PhantomData.

As far as I can understand, there are no references, but there is a requirement that both objects be compatible in terms of lifetimes, and I don't know how to express that.

While the question is in progress, I am investigating on my side and trying to understand better what is going on. My understanding, though, is that lifetimes are being enforced but no real references are being held.

Hello,

As I do not control the testcontainers implementation, do you see a way of solving this problem using Rc or Arc?

I tried to do it before asking here, but did not manage to solve it, assuming that I cannot change the way testcontainers is implemented.

I would use the heap if I could, to make sure that moving objects around wouldn't change addresses, but I still couldn't manage it.

The ouroboros solution above uses the heap to store docker_client under the hood. Is there any reason it doesn't work for you?

It did work for me; I was about to post a note here.

Thank you for helping me!

However, I am still trying to figure out why. Obviously I tried lots of things before coming here and failed to achieve the goal, hence I came here.

It is important for me to understand what is going on because I am trying to become a better Rust programmer and go beyond the hello world stuff.

I still cannot understand why all this is needed if cross references aren't being kept between the two objects, only a PhantomData forcing them to correlate in terms of lifetimes.

On the other hand, cargo expand came up with lots of stuff that I am trying to drill down into and understand.

I think the part you are missing is that each time you move an object (by passing it into/from a function, returning from a code block or assigning to a new variable), you literally move it in memory. Semantically, each move is a bitwise copy of the object's memory to a new location, with the old location being marked as no longer usable.

When you create a reference to a local variable x, you cannot move x anywhere without invalidating the reference first. Passing x into a struct literal would involve moving it to a new location, invalidating the old reference. So would returning the resulting struct.

Overall, it means that, without using unsafe, it is impossible to return self-referential structs from functions, and even creating them is an issue (you need to turn the self-referential fields into Options, initialize the struct with those fields set to None, and set them to reference the fields afterwards).
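A minimal sketch of the Option-based initialization described above, using only the standard library. `SelfRef` and `view_len` are illustrative names, not part of any crate. Once the self-borrow is created, the struct can still be read, but it can no longer be moved or returned.

```rust
// Illustrative type: `view` is meant to point into `value` of the same struct.
struct SelfRef<'a> {
    value: String,
    view: Option<&'a str>,
}

fn view_len() -> usize {
    // Step 1: create the struct with the self-referential field set to None.
    let mut s = SelfRef {
        value: String::from("hello"),
        view: None,
    };

    // Step 2: point the field at the sibling field. From here on, `s.value`
    // stays borrowed for as long as `s` is used.
    s.view = Some(s.value.as_str());

    // Uncommenting the next line is rejected by the borrow checker, because
    // moving `s` would invalidate the borrow stored inside it:
    // let moved = s;

    // Reading through the struct is still allowed.
    s.view.map_or(0, |v| v.len())
}

fn main() {
    println!("view length: {}", view_len());
}
```

This is why the pattern cannot be used to return such a struct from a constructor like `run()`: returning is itself a move.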


Have you considered simply changing the function parameters to dependency-inject the Client?
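A sketch of what dependency injection could look like, assuming the caller creates the client and keeps it alive. `Client` and `Container` here are hypothetical stand-ins that mimic the shape of the testcontainers signature (a container tied to the client by a lifetime); they are not the real types.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins; these are NOT the testcontainers types.
struct Client;

struct Container<'d> {
    port: u16,
    _client: PhantomData<&'d Client>,
}

impl Client {
    fn run(&self, port: u16) -> Container<'_> {
        Container { port, _client: PhantomData }
    }
}

// The struct stores only the container; the client lives in the caller.
struct Database<'a> {
    container: Container<'a>,
}

impl<'a> Database<'a> {
    // The injected client must outlive the returned Database.
    fn run(client: &'a Client) -> Self {
        Database { container: client.run(5433) }
    }

    fn port(&self) -> u16 {
        self.container.port
    }
}

fn main() {
    let client = Client; // owned by the caller
    let database = Database::run(&client);
    println!("mapped port: {}", database.port());
}
```

The trade-off is that the caller has to name and hold the client type, which leaks that dependency into the calling code.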


Are you saying that ...

struct A {
    b: i32,
}

impl A {
    fn new() -> Self {
        let b = 1; // b is at a given address in memory
        Self {
            b: b, // now b has been moved to another address, because it is being returned within the struct
        }
    }
}

Is this right?

If so, then can I assume that a variable might perfectly be moved without the original one going out of scope?

This is interesting. In C, if I return a struct from a function it is copied, so the original one goes out of scope while the copy is returned:

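typedef struct {
    int b;
} A;

A make_a(void) {
    A ret;
    ret.b = 3;
    return ret; /* returned by value: the caller receives a copy */
}

int main(void) {
    A c = make_a();
    (void)c; /* silence the unused-variable warning */
    return 0;
}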

The ret variable would be copied, which means that the default destructor on it would run and a new copy of it would be placed in c.

As far as I understand now, Rust would move instead of copy, which means that although the memory is kind of copied, the destructor will not be called, because the returned variable is not considered to go out of scope.

I had read that in Rust things are moved, but I thought the move was only semantic: because destructors are not called, I was led to think that variables were kind of pinned to a given memory position and would never move, and that only ownership would be passed from function to function.

Thinking that data was by default pinned in memory, it made perfect sense to be confused the way I was in this regard.


What if you pass in the docker client into the run() function as a mutable reference? That might allow you to save the mutable reference in the struct.

I did.

It however would force users of the library to link to testcontainers itself, because a client would need to be injected. The goal of encapsulating this functionality was exactly to decouple the users from this dependency.

It is also an exercise to help me to understand things about how the language works.

Hello,

This would work, but I don't want users of the library to create a client. I want them to call the run function and be sure that a container with a Postgres instance is running, without having to add testcontainers to their dependencies.

Yes, that is the root of confusion. Rust takes the opposite stance: data is never pinned in memory! Every type must be ready to be blindly memcopied to an arbitrary new location at any point in time.

The borrow checker guarantees that the data will not be moved while there are live references, but otherwise all bets are off. If you want data to be immovable, you must put it behind some pointer (e.g. heap-allocate it with Box::new, or work with a stack-allocated object exclusively through a reference).

If you meditate on that statement, you will realise that you are saying the same thing, really. Once you move the data semantically, you may not be allowed to use the old data in any way. In particular, all existing pointers must be invalidated at this point. Thus nothing can observe the change of memory location, so why wouldn't it change? Some operations would obviously require it, like passing data into/from a stack frame.

In practice, LLVM will indeed elide most copies in the release build, but they are present in the debug build, and always exist at the level of operational semantics.
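To illustrate the point about putting data behind a pointer, here is a small standard-library sketch. Moving a Box moves only the pointer; the heap allocation it points to stays where it is. The function name is illustrative.

```rust
/// Returns true if the heap allocation keeps its address after the Box
/// that owns it has been moved to a new variable.
fn heap_address_survives_move() -> bool {
    let boxed = Box::new(42u32);
    // Address of the value on the heap, taken before the move.
    let before: *const u32 = &*boxed;

    // This moves the Box itself (a pointer-sized value) to a new location.
    let moved = boxed;

    // Address of the value on the heap, taken after the move.
    let after: *const u32 = &*moved;

    before == after
}

fn main() {
    println!("heap address stable: {}", heap_address_survives_move());
}
```

The same comparison on a plain stack value carries no such guarantee, since the value itself may be copied to a new address on every move.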

You could use a "builder" pattern, where the client could use an API to state "this is the default docker client I want", then when the build() method is called all required fields are ready to be injected from the builder struct into the database struct.

The API of the method in question is

pub fn run<I: Image>(&self, /* ... */) -> Container<'_, I>

That API expresses that the borrow of the self parameter extends through the lifetime of the returned Container. It doesn't matter that no actual reference is held; the semantic borrow still exists. Moreover, the implementation could change to use an actual reference without changing the API.

All that is to say, the semantics of borrowing are more general than that of references. A generalized statement of "moving a struct invalidates references to it" would be "using a struct (which includes moves) invalidates borrows of it."
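A minimal sketch of a borrow that exists without any stored reference. `Client` and `Container` are hypothetical stand-ins that only mimic the shape of the signature above; the container stores a PhantomData and nothing else that points at the client.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins; these are NOT the testcontainers types.
struct Client;

struct Container<'d> {
    id: u32,
    // No pointer is stored, but the type still carries the client's lifetime.
    _client: PhantomData<&'d Client>,
}

impl Client {
    fn run(&self, id: u32) -> Container<'_> {
        Container { id, _client: PhantomData }
    }
}

fn container_id() -> u32 {
    let client = Client;
    let container = client.run(7);

    // Uncommenting the next line fails with E0505 (cannot move out of
    // `client` because it is borrowed), even though `container` holds
    // no actual reference to it:
    // let moved = client;

    container.id
}

fn main() {
    println!("container id: {}", container_id());
}
```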


When faced with a situation where you wish you could return a borrow of a local variable, another pattern you can use is to take a callback instead:

pub struct Database<'a> {
    container: testcontainers::Container<'a, Postgres>,
    // docker_client: DockerClient,
}

pub fn run_with_database<R, F: FnOnce(Database<'_>) -> R>(f: F) -> R {
    let postgres_image = /* ... */;
    let docker_client = DockerClient::default();
    let container = docker_client.run(postgres_image);
    let database = Database { container };

    f(database)
}

// Example usage
fn elsewhere() -> Result<i32, ()> {
    run_with_database(|database| {
        // Call the rest of your program or use `database` directly...
        Ok(0)
    })
}

...

That API expresses that the borrow of the self parameter extends through the lifetime of the returned Container.

Can you explain why? I don't question that this is true, but I wouldn't have concluded it myself without help.

In other words, what syntactical cue did you use to conclude that the &self is being bound to the lifetime of the returned container?

I ask so that I can reach this conclusion myself in future cases. I understood instinctively that this was the case, but if you asked me why, I would not be able to explain it right now.

The function signature utilizes lifetime elision. Here, since Container<'_, I> uses an elided ('_) lifetime, and since the method takes an &self parameter, the elision rules say that it expands to the lifetime of &self. Therefore, it is equivalent to the following signature:

pub fn run<'a, I: Image>(&'a self, /* ... */) -> Container<'a, I>
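The same rule can be checked on a small standalone example. `Registry` and `Entry` are illustrative names; the two methods differ only in whether the lifetime is elided or written out. Note that the extra `&str` parameter does not take part: when a method has a `&self` receiver, elided output lifetimes come from `self`.

```rust
// Illustrative types, unrelated to testcontainers.
struct Registry {
    names: Vec<String>,
}

struct Entry<'r> {
    name: &'r str,
}

impl Registry {
    // Elided form: '_ in the return type resolves to the lifetime of &self,
    // not to the lifetime of `_hint`.
    fn first_elided(&self, _hint: &str) -> Entry<'_> {
        Entry { name: self.names[0].as_str() }
    }

    // Explicit form with the same meaning.
    fn first_explicit<'a>(&'a self, _hint: &str) -> Entry<'a> {
        Entry { name: self.names[0].as_str() }
    }
}

fn main() {
    let registry = Registry { names: vec![String::from("postgres")] };
    let a = registry.first_elided("unused");
    let b = registry.first_explicit("unused");
    println!("{} {}", a.name, b.name);
}
```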