Rust and imported memory

I am using Rust in a WASM context (in a browser). The memory is created in JS and provided to the environment. I am using link-args=--import-memory as a rustflag. This means that Rust does not manage memory but all memory allocations are performed by an Allocator that ends up calling a function on the JS side that allocates memory and returns an index into the memory segment.

At some point, in Rust, I return a statically allocated instance, something like that:

#[no_mangle]
pub extern "C" fn create_instance() -> *const Instance {
  Instance { /* somefields*/ };
}

Then, on JS side, I treat the number returned as some sort of handle. When I want to act on that instance, I call a Rust function that do this:

#[no_mangle]
pub extern "C" fn do_something(instance_ptr: *const Instance) -> something {
  let instance: &Instance = unsafe { instance_ptr.as_ref().unwrap() };
  [...]

Now that works the first time I call into that function. Things seems to go smoothly. But the second time I call into it, things go pear shaped. In particular, the size of a buffer that supposed to be constant is changing and crashes ensues.

Now is there anything wrong in my approach? At some point I though that maybe rust was trying to deallocate the instance struct so I tried to use a combination of core::mem::forget calls but that does not seems to change anything.

Any help is appreciated.

You're hiding what you're actually returning, which makes a huge difference. e.g. this compiles, but returns a dangling pointer:

pub struct Instance {}

#[no_mangle]
pub extern "C" fn create_instance() -> *const Instance {
    &Instance {} as *const Instance
}

No. There are no unsafe in that code which means it must be safe. And, indeed, it would return reference to statically allocated global.

But yes, precise description of what goes on in code is important.

In particular it's entirely not clear why anyone would allocate static object which could never be changed and then invent some strange dance to pass it from one place to another… I'm pretty sure that at some point there are an attempt to convert shared reference into unique one… and that never ends well…

Because from developer POV and compiler POV this change look radically differently.

Developer: I'm so cool, I made that code compile… just a little unsafe and borrows checker is appeased.:grin::sunglasses::grin:

Compiler: ooh, someone transmuted shared reference into unqiue one… how lovelely😇. Not sure why special function wasn't used, but that's nice opportunity to cut out lots of useless code… sky is the limit, let's do it:hammer_and_wrench::carpentry_saw::smiling_imp:

That's why Rustonomicon says that not once but thrice:

  • Transmuting an & to &mut is always Undefined Behavior.
  • No you can't do it.
  • No you're not special.

Sorry, you are right, that's not what I am returning:

#[no_mangle]
pub extern "C" fn create_instance() -> *const Instance {
  let instance = Instance { /* somefields */ };
  core::ptr::addr_of!(instance)
}
1 Like

Correct. The function I wrote is safe even though it returns a dangling pointer.

No. There are no globals or statics in the function I wrote.

Rust doesn't care if pointers dangle or if they were never valid to begin with, until the point where unsafe code cares:

let instance: &Instance = unsafe { instance_ptr.as_ref().unwrap() };

unsafe is the Rust way of saying "trust me compiler: this code is sound". It isn't sound when the pointer isn't valid, but the compiler is trusting that you already took care of that. There are additional rules you're promising the compiler you're enforcing; there's a book covering many of them.

instance lives on the stack. It's not a static. You're returning a dangling pointer. To get a static, you must use the static keyword.

I wish compilers were all-knowing... Instead what happens is this (in addition to many other cases):

  • Code: load x.y, where x is owned, or a &, or a &mut.
  • Code: call some function f, but don't hand it a &mut to x.
  • Code: load x.y.
  • Compiler: remove the redundant load since x can't possibly change.
2 Likes

Of course there. Test for yourself:

pub struct Instance {}

#[no_mangle]
pub extern "C" fn create_instance() -> *const Instance {
    &Instance {} as &'static Instance as *const Instance
}

How could compiler produce &static Instance if there are not statics in your function? You forgot about static promotion

Yes. But pointers to statics never dangle.

No. There are no need. If your type doesn't have interior mutability it would be static.

Compiler assumes that your program obeys certain rules. If it finds that they are violated in some cases it's powerful signal that such code wouldn't ever be reached. Rust even includes special functions which is supposed to be never called which works similarly to detection of any other “impossible condition”.

That's insta-UB too. instance is a local variable that doesn't survive the function call.

2 Likes

This would depend on what fields said variable includes, isn't it? If all values are statically known constants then it would only be local if you wouldn't take it's address.

That's why @tbfleming code is correct but have no idea about @LukeSkyw code.

Whoops. I hate that rule; it ruins examples meant to teach other concepts.

Yep :frowning:

No, that's false. I'm talking about the address of the struct itself, not the fields. A let declaration always results in a local. It doesn't ever get static-promoted. Therefore, returning its address, &local_created_with_let, always yields a dangling pointer. And that's 100% what OP's code is doing. It creates a local, by-value binding, and returns its address. That's wrong, no caveats apply.

Miri demo.

3 Likes

A quick simplification for OP: even though there does exist a rule to get implicit statics, it's loaded with caveats that you have to hit exactly, and your code didn't. Depending on situation, you might be able to adjust your code to hit those, but it would silently fail again in the future if someone tweaked it back out. If you really need statics, it will be less fragile to explicitly use the static keyword to define them.

Ooh. I see. Code presented in first message is Ok, “clarified code” later is not Ok.

I wonder how much that code is removed from what's actually in his (or her?) program.

So my understanding of the problem is that, as I am returning a pointer, the instance is not "moved" it is then destroyed as soon as the function scope ends.

Static is not a solution for me, I am going to call that function multiple times so I guess, I should be using something like Box::new for dynamic allocation.

Yes I transformed a bit my code to try to narrow it down a little. Now that I have a better understanding of the problem I guess my issue is how to deal with dynamic allocation in Rust.

I have a "pure" Rust function that allocates and returns an instance:

impl Instance {
  pub fn new(/* somearguments */) -> Result<Self, SomeError> {
    Instance { /* somefields */ }
  }
}

My understanding here is that it is correct because the variable is "moved" so stay alive after the function call.

Then I have an interface with WASM which tried to return the address of that instance:

#[no_mangle]
pub extern "C" fn create_instance() -> *const Instance {
  let instance = Instance::new(/* someparams */);
  core::ptr::addr_of!(instance)
}

But as the address is returned, the variable is not moved and then destroyed. I need to find a way to tell rust that it does not own that variable anymore and should let it live after the call.

Note that all this lead so a strange UB where the variable is still valid for exactly one function call after that, which got me losing my mind a little :slight_smile:

I'd like not to change the constructor itself, which is used in a non-WASM context. That's why I tried using core::mem::forget in create_instance to no avail for now.

Yes. You can think of it as moving from the constructor's stack frame to the caller's stack frame (I'm leaving out ABI and optimization transforms).

The problem is that it lives on the stack. Related:

This is a side effect of the 2 stacks in use. Since the system stack doesn't live in linear memory, items on it don't have accessible addresses. Rustc creates a second stack in linear memory for things that either can't fit in the constraints of the system stack, or things that need pointers or references to them when the compiler is unable to optimize away their use. The JS engine (V8, spider monkey, etc) isn't aware of the 2nd stack.

  1. js calls create_instance(). The object is on the 2nd stack. The stack shrinks when the function returns, but doesn't happen to get overwritten.
  2. Calls to other Rust functions may or may not overwrite that area, depending on how much secondary stack they use.
  3. A second call to create_instance reuses that space.

You need to move the object off the stack. Most rust code uses Box and Rc to place objects on the heap when needed.

1 Like

Ok thanks to your guidance and this reference (which I should have read earlier :slight_smile: ) I came up with this, which seems to work for now:

#[no_mangle]
pub extern "C" fn create_instance() -> *const Instance {
  let instance = Instance::new(/* someparams */);
  let mut pInstance = alloc::boxed::Box::new(instance);
  &mut *pInstance
}

Thanks again for your help.

You don't have to dynamically allocate just because you will call the function many times. It would be perfectly fine to return the address of a static from the function.

Now if you want subsequent calls to return distinct objects, then yes, you'll need to use dynamic allocation. Try Box::into_raw(), which gets the buffer pointer of a box, and consumes the box without deallocating it. Ie., it effectively transfers ownership to the raw pointer.

Naturally, this means that there has to be a Box::from_raw() paired with it somewhere, so as not to leak the allocated memory (since raw pointers don't deallocate themselves when going out of scope).

2 Likes

That's wrong, too. The box goes out of scope at the end of the function and deallocates its buffer. It has exactly the same problem as your original code had, except that the dangling pointer now points to the heap instead of the stack. But it's still dangling and still UB. Miri demo.

4 Likes

Alright, I am getting there:

#[no_mangle]
pub extern "C" fn create_instance() -> *const Instance {
  let instance = Instance::new(/* someparams */);
  let p_instance = alloc::boxed::Box::new(instance);
  alloc::boxed::Box::into_raw(p_instance)
}

Yes, and now you'll need a corresponding destroy_instance() method with Box::into_raw().

Also, you'll need to think very carefully about how you use this whole machinery from JavaScript. You can't just call these functions in arbitrary ways, because that could lead to double-free errors, for example. So you probably want to abstract this whole thing away within a JS function or class.