Am I properly handling struct that may contain any other data?

I'm trying to build an app with modular UI using iced with support for plugins. Because of how iced handles passing messages between UI elements I need a enum that represents all possible messages. Because it's very hard to do when your program should have support for plugins, I just decided to make this CustomData struct that converts any type into a raw pointer and back, but with safe API and some required traits.

It uses unsafe rust, so I would like to know is everything right with the implementation. Specifically, if it leaks memory, as you theoretically can still access memory that was deallocated, though it would not be valid.

Here is this struct
Playground

#[derive(Debug, Clone)]
pub struct CustomData {
    pointer: *const u8,
    layout: std::alloc::Layout,
}

impl CustomData {

    pub fn new<T>(data: T) -> Self 
    where T: Send + Clone,
    {
        let leaked = Box::leak(Box::new(data));
        let ret = Self {
            pointer: leaked as *const T as *const u8,
            layout: std::alloc::Layout::new::<T>(),
        };
        ret
    }

    pub fn get_content<T>(&self) -> Result<T, ()>
    where T: Send + Clone,
    {
        if self.layout != std::alloc::Layout::new::<T>() {
            Err(())
        }else{
            Ok(unsafe {(*self.pointer.cast::<T>()).clone() })
        }
    }

}

impl Drop for CustomData {
    fn drop(&mut self) {
        unsafe {std::alloc::dealloc(self.pointer as *mut u8, self.layout)};
    }
}

unsafe impl Send for CustomData {}

And also the code I use to prove that deallocated memory is still accessable but filled with random numbers (on my PC the second time it prints some random value, but on rust playground it always prints 0)
Playground

use std::alloc::dealloc;
use std::alloc::Layout;

fn main() {
    let data = MyStruct { data: 42 };
    let ptr = Box::leak(Box::new(data)) as *const MyStruct as *mut u8;
    unsafe{println!("{}", *ptr)};
    unsafe {
        dealloc(ptr, Layout::new::<i32>());
    }
    unsafe{println!("{}", *ptr)};
}

#[derive(Debug)]
struct MyStruct {
    data: i32
}

That's unsound? It can easily cause UB: Rust Playground, also in other ways.

2 Likes

Your first stop should be writing as many tests as you can think of and running them with Miri. You can also run Miri in the playground under Tools, top-right.

For example Miri detects UB in both of the playgrounds of your OP.

Reading deallocated memory is UB. The program may sometimes continue to run in a way that looks like it's doing something sensible, even when it's not doing anything sensible. The values observed don't necessarily correspond to the values in memory, for example.

3 Likes

So, I've added type_id to check at runtime that the types of content and output are matching.
Test from the previous comment now just return Err.

Thanks for telling me about Miri. I cannot really come up with many tests for this case but I would know about it's existence : )

#[derive(Debug, Clone)]
pub struct CustomData {
    pointer: *const u8,
    layout: std::alloc::Layout,
    type_id: std::any::TypeId,
}

impl CustomData {

    pub fn new<T>(data: T) -> Self 
    where T: Send + Clone + Any,
    {
        let leaked = Box::leak(Box::new(data));
        let ret = Self {
            pointer: leaked as *const T as *const u8,
            layout: std::alloc::Layout::new::<T>(),
            type_id: std::any::TypeId::of::<T>(),
        };
        ret
    }

    pub fn get_content<T>(&self) -> Result<T, ()>
    where T: Send + Clone + Any,
    {
        if self.layout != std::alloc::Layout::new::<T>() || 
                self.type_id != std::any::TypeId::of::<T>() {
            Err(())
        }else{
            Ok(unsafe {(*self.pointer.cast::<T>()).clone() })
        }
    }

}

impl Drop for CustomData {
    fn drop(&mut self) {
        unsafe {std::alloc::dealloc(self.pointer as *mut u8, self.layout)};
    }
}

unsafe impl Send for CustomData {}

From a quick glance I can see:

  • in get_content it's not enough to check the layout. usize and &T have the same layout for example, but it's obviously not safe to convert a usize to a &T; didn't see this already got addressed;

  • your Drop is deallocating the memory of the Box but is not dropping the T, leaking whatever resource it was managing;

  • deriving Clone is wrong, as it will just copy the pointer inside your struct; when the clones will be dropped they will try to free the memory twice, leading to double free and use after free issues;

  • requiring T: Send is not enough to implement Send for CustomData, as you may clone it, send it to another thread and then call get_content: this will call .clone() on a &T from another thread, but you didn't require T: Sync.

What's wrong with a Arc<dyn Any + Send + Sync>?

7 Likes

Here's another safe solution using Any (Box<dyn Data> delivers most of what you want):

use std::any::Any;

pub trait Data: Any + Send {
    fn clone_any(&self) -> Box<dyn Data>;
    fn as_any(&self) -> &dyn Any;
}

impl Clone for Box<dyn Data> {
    fn clone(&self) -> Self {
        (**self).clone_any()
    }
}

impl<T: Any + Send + Clone> Data for T {
    fn clone_any(&self) -> Box<dyn Data> {
        Box::new(self.clone())
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl dyn Data {
    pub fn get_content<T: Send + Clone + 'static>(&self) -> Option<T> {
        self.as_any().downcast_ref().cloned()
    }
}

fn is_send<T: Send>(_: &T) {}

#[derive(Clone, Debug)]
struct MyStruct {
    data: i32,
}

fn main() {
    let data: Box<dyn Data> = Box::new(MyStruct { data: 42 });
    is_send(&data);
    let data2 = data.clone();
    println!("{:?}", data.get_content::<MyStruct>());
    println!("{:?}", data2.get_content::<MyStruct>());
}

(playground)

The Arc solution is better if you want lightweight clones though.

2 Likes

Yeah, using Arc was a good idea. It greatly reduced the code and should work just fine.

Now the code looks like this


#[derive(Debug, Clone)]
pub struct CustomData {
    arc: Arc<dyn Any + Send + Sync>,
}

use std::sync::Arc;

impl CustomData {

    pub fn new<T>(data: T) -> Self
    where T: Send + Sync + Clone + Any,
    {
        let ret = Self {
            arc: Arc::new(data),
        };
        ret
    }

    pub fn get_content<T>(self) -> Option<T>
    where T: Send + Sync + Clone + Any,
    {
            self.arc.downcast::<T>().ok().map(|x| x.as_ref().clone())
    }

}

But I didn't really get what do you mean about Drop implementation. There's no T nor Box in on the 'self', so I cannot deallocate them.

He meant this:

This deallocates the memory pointed by the pointer, but it doesn't call the destructor of the pointed value.


In general, the recommended usage of unsafe is "Don't".

The whole point of the language is that you don't need unsafe. (Unless you are interacting with external C code, which is necessarily unsafe).

Unsafe is a very advanced, very sharp tool that the average programmer simply shouldn't use. In particular, you shouldn't use it for:

  • circumventing the type system (eg. transmuting or lifetime gymnastics) because you think you know better than the compiler (you don't);
  • "performance" or "optimization" (eg. assuming user-controlled bytes are UTF-8 or cheating with synchronization in threading — again, you don't know better than the compiler);
  • "simplifying" code (this usually falls under the first bullet in disguise).

The standard library and other established crates in the ecosystem contain solutions to the problems you will encounter:

  • Want dynamic typing? Use std::any.
  • Want bit-level manipulation and conversion between primitive types? Use bytemuck.
  • Want safe destructors without leaking and double-free? Congrats, you already have it built into the language, just don't touch unsafe.

You may use unsafe after learning just about everything in this book. That will take a couple months or even years if you are not yet sure about the difference between deallocating memory and calling the destructor of a value.

And if anything, you'll feel the need to use unsafe less, not more, having learnt Rust to more depth. I've been using the language for several (>5) years now and couldn't yet find a single compelling use case for writing my own unsafe outside of FFI. The one exception was wrapping some partially missing FFI bindings for a cryprography library I wanted to use.

2 Likes

Thank you all for answering!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.