Recursive data structure

I am trying to implement a universal data type that allows recursive data (an array of arrays of anything, etc.), and I have a test that crashes at seemingly unpredictable moments.

Here are the three types I've created:

SimData is a single data element, like a variable, that also stores its current data type.
SimDataPayload is the value itself, without the data type.
SimDataType is the type tag.




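use std::cell::RefCell;
use std::collections::HashMap;
use std::fmt;
use std::mem::ManuallyDrop;
use std::process;
use std::rc::Rc;

#[derive(Clone, Copy, Debug)]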
enum SimDataType {
    Float,
    Vector,
    Dict,
    Bool,
    Error,
    Null
}

// #[derive(Clone)]
union SimDataPayload{
    fValue: f32,
    bValue: bool,
    vValue: ManuallyDrop<Box<Vec<SimData>>>
}


impl Clone for SimDataPayload {
    fn clone(&self) -> Self {
        unsafe {
            match self {
                Self { fValue } => Self { fValue: *fValue },
                Self { bValue } => Self { bValue: *bValue },
                Self { vValue } => Self {
                    vValue: ManuallyDrop::new((**vValue).clone()),
                },
            }
        }
    }
}


impl fmt::Debug for SimDataPayload {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        unsafe {
            match self {
                SimDataPayload { fValue } => write!(f, "fValue: {}", fValue),
                SimDataPayload { bValue } => write!(f, "bValue: {}", bValue),
            }
        }
    }
}


#[derive(Clone, Debug)]
struct SimData{
    dataType: SimDataType,
    data: SimDataPayload,
}

impl Drop for SimData {
    fn drop(&mut self) {
        if let SimDataType::Vector = self.dataType {
            unsafe {
                ManuallyDrop::drop(&mut self.data.vValue);
            }
        }
    }
}

impl fmt::Display for SimData {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.dataType {
            SimDataType::Float => write!(f, "{}", unsafe { self.data.fValue }),
            SimDataType::Bool => write!(f, "{}", unsafe { self.data.bValue }),
            SimDataType::Null => write!(f, "Null",),
            _ => write!(f, "unsupported type"),
        }
    }
}

impl SimData{

    // initialization

    fn createFloat(v:f32) -> SimData{
        SimData{
            dataType: SimDataType::Float,
            data:SimDataPayload{
                fValue:v
            }
        }
    }

    fn createBool(v:bool) -> SimData{
        SimData{
            dataType: SimDataType::Bool,
            data:SimDataPayload{
                bValue:v
            }
        }
    }

    fn createNull() -> SimData {
        SimData { 
            dataType: SimDataType::Null, 
            data:SimDataPayload{
                fValue:0.0
            }
        }
    }

    fn createVector(v: Vec<SimData>) -> SimData {
        SimData {
            dataType: SimDataType::Vector,
            data: SimDataPayload {
                vValue: ManuallyDrop::new(Box::new(v)),
            },
        }
    }

    fn dataTypeName(self) -> String {
        match self.dataType {
            SimDataType::Float => "Float".to_string(),
            SimDataType::Vector => "Vector".to_string(),
            SimDataType::Dict => "Dict".to_string(),
            SimDataType::Bool => "Bool".to_string(),
            SimDataType::Error => "Error".to_string(),
            SimDataType::Null => "Null".to_string()
        }
    }

    // math

    fn sum(v1:SimData, v2:SimData) -> SimData{
        if(v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float"){
            unsafe{
                let b1 =  v1.data.fValue;
                let b2 =  v2.data.fValue;
                return SimData::createFloat(v1.data.fValue+v2.data.fValue)
            }
        }
        println!("cannot add different data types");
        process::exit(1);
        SimData{
            dataType:SimDataType::Error,
            data:SimDataPayload{
                fValue:0.0
            }
        }
    }

    fn sub(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe{
                return SimData::createFloat(v1.data.fValue - v2.data.fValue);
            }
        }
        println!("Cannot subtract different data types");
        process::exit(1);
    }

    fn mul(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe{
                return SimData::createFloat(v1.data.fValue * v2.data.fValue);
            }
        }
        println!("Cannot multiply different data types");
        process::exit(1);
    }

    fn div(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe{
                return SimData::createFloat(v1.data.fValue / v2.data.fValue);
            }
        }
        println!("Cannot divide different data types");
        process::exit(1);
    }

    // logical operators

    fn and(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Bool" && v2.clone().dataTypeName() == "Bool" {
            unsafe{
                return SimData::createBool(v1.data.bValue && v2.data.bValue);
            }
        }
        println!("Cannot perform 'and' on different data types");
        process::exit(1);
    }

    fn or(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Bool" && v2.clone().dataTypeName() == "Bool" {
            unsafe{
                return SimData::createBool(v1.data.bValue || v2.data.bValue);
            }
        }
        println!("Cannot perform 'or' on different data types");
        process::exit(1);
    }

    fn nor(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Bool" && v2.clone().dataTypeName() == "Bool" {
            unsafe {
                return SimData::createBool(!(v1.data.bValue || v2.data.bValue));
            }
        }
        println!("Cannot perform 'nor' on different data types");
        process::exit(1);
    }
    
    fn xor(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Bool" && v2.clone().dataTypeName() == "Bool" {
            unsafe {
                return SimData::createBool(v1.data.bValue ^ v2.data.bValue);
            }
        }
        println!("Cannot perform 'xor' on different data types");
        process::exit(1);
    }
    
    fn not(v: SimData) -> SimData {
        if v.clone().dataTypeName() == "Bool" {
            unsafe {
                return SimData::createBool(!v.data.bValue);
            }
        }
        println!("Cannot perform 'not' on non-boolean data type");
        process::exit(1);
    }

    // comparisons

    fn gt(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe {
                return SimData::createBool(v1.data.fValue > v2.data.fValue);
            }
        }
        println!("Cannot perform 'gt' on different or non-float data types");
        process::exit(1);
    }
    
    fn gte(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe {
                return SimData::createBool(v1.data.fValue >= v2.data.fValue);
            }
        }
        println!("Cannot perform 'gte' on different or non-float data types");
        process::exit(1);
    }
    
    fn lt(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe {
                return SimData::createBool(v1.data.fValue < v2.data.fValue);
            }
        }
        println!("Cannot perform 'lt' on different or non-float data types");
        process::exit(1);
    }
    
    fn lte(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe {
                return SimData::createBool(v1.data.fValue <= v2.data.fValue);
            }
        }
        println!("Cannot perform 'lte' on different or non-float data types");
        process::exit(1);
    }
    
    fn eq(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe {
                return SimData::createBool(v1.data.fValue == v2.data.fValue);
            }
        }
        println!("Cannot perform 'eq' on different or non-float data types");
        process::exit(1);
    }
    
    fn neq(v1: SimData, v2: SimData) -> SimData {
        if v1.clone().dataTypeName() == "Float" && v2.clone().dataTypeName() == "Float" {
            unsafe {
                return SimData::createBool(v1.data.fValue != v2.data.fValue);
            }
        }
        println!("Cannot perform 'neq' on different or non-float data types");
        process::exit(1);
    }

    // convertors

    fn readFloat(self) -> f32 {
        unsafe{ return self.data.fValue }
    }

    fn readBool(self) -> bool {
        unsafe{ return self.data.bValue }
    }

    fn readVector(&self) -> &Vec<SimData> {
        match self.dataType {
            SimDataType::Vector => unsafe { &*self.data.vValue },
            _ => {
                println!("Cannot read non-vector as a vector");
                process::exit(1);
            }
        }
    }
}


#[derive(Clone)]
struct ContextScope {
    context: Rc<RefCell<HashMap<String, SimData>>>,
    parent_scope: Option<Rc<ContextScope>>,
}

impl ContextScope {
    fn new() -> ContextScope {
        ContextScope {
            context: Rc::new(RefCell::new(HashMap::new())),
            parent_scope: None,
        }
    }

    fn extend(&self) -> ContextScope {
        ContextScope {
            context: Rc::new(RefCell::new(HashMap::new())),
            parent_scope: Some(Rc::new(self.clone())),
        }
    }

    fn set(&self, name: String, value: SimData) {
        let mut current_scope = self;
        self.context.borrow_mut().insert(name, value);
    }

    fn pset(&self, name: String, value: SimData){

        if let Some(parent_scope) = &self.parent_scope {
            if(parent_scope.has(name.clone())){
                parent_scope.pset(name.clone(), value)
            }
            else {
                self.set(name, value)
            } 
        } else {
            self.set(name, value)
        }
    }


    fn has(&self, name: String) -> bool {
        
        let context_ref = self.context.borrow();
        if let Some(value) = context_ref.get(&name) {
            true
        }else{false}
    }

    fn get(&self, name: String) -> SimData {
        if let Some(value) = self.context.borrow().get(&name) {
            return value.clone();
        }

        if let Some(parent_scope) = &self.parent_scope {
            return parent_scope.get(name);
        }

        SimData::createNull()
    }
}

Here is my test:

    let mut parent = ContextScope::new();
    let mut child = parent.extend();

    println!("\nVectors:");
    
    child.pset(
        String::from("v"),
        SimData::createVector(
            vec![
                SimData::createFloat(6.0),
                SimData::createFloat(61.0),
                SimData::createVector(
                    vec![
                        SimData::createFloat(3.0),
                        SimData::createFloat(8.0),
                    ]
                ),
                SimData::createFloat(62.0),
            ]
        )
    );

    let vec_value = child.context.borrow().get("v").cloned();

    if let Some(v) = vec_value{
      
        let mut vInner = &v.readVector().to_vec()[2];
        let mut vInner0 = &vInner.readVector().to_vec()[0];
        println!("v [2][0]: {} (???)", vInner0);   
        let mut vInner1 = &vInner.readVector().to_vec()[1];
        println!("v [2][1]: {} (???)", vInner1);   
        println!("v [3]: {} (62)", &v.readVector().to_vec()[3].clone().readFloat());    
        println!("v [1]: {} (61)", &v.readVector().to_vec()[1].clone().readFloat());    
        println!("v [0]: {} (6)", &v.readVector().to_vec()[0].clone().readFloat());   

    }
    
    

    println!("\nAll tests passed!");

Here is the output:

Vectors:
v [2][0]: 3 (???)
v [2][1]: 8 (???)
v [3]: 62 (62)
v [1]: 61 (61)
interpret(41589,0x1fd252500) malloc: *** error for object 0x8: pointer being freed was not allocated
interpret(41589,0x1fd252500) malloc: *** set a breakpoint in malloc_error_break to debug
[1] 41589 abort cargo run

I noticed that it works for non-nested structures. What can I try to fix this issue? Should I try switching to raw pointers?

This doesn't seem like it should need any unsafe. Why don't you just use an enum? Enums know when/how to drop their own associated data correctly.


It does not compile without unsafe for some reason. I am a newbie here, so if you can suggest a better way to do this, I would appreciate it. Can you please help me understand what causes the crash? From my understanding, I create a vector, then another one pointing to the first one.

Like this (pseudo-code):
vec[6,61,vec[3,8],62]

If I read 3 and 8 and then try to read 62, it crashes. Why?
Though if I read just 62, it works.

No, that's not what I meant. I didn't mean that this particular code snippet shouldn't require unsafe. The concrete code you have written obviously does require unsafe, because you are using unions.
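As for the crash itself, the most likely culprit in the posted code is the Clone impl for SimDataPayload. A union pattern such as Self { fValue } does not test which field is "active" (a union has no tag), so it always matches and the later arms are never reached. Every clone therefore copies only the 4 bytes of fValue, even when the payload is really a Box pointer. The derived Clone on SimData then produces a value that is still tagged Vector but holds a garbage pointer, and Drop for SimData frees it. Since to_vec() clones every element, this happens on each read in the test. That is undefined behavior, so it can appear to work for a while and then abort at an arbitrary point. The sketch below shows the always-first-arm behavior in isolation; Payload and matched_arm are hypothetical names used only for illustration, and both fields are plain 4-byte values so nothing invalid is ever read.

```rust
// Hypothetical stand-in for the union in the question. Both fields are
// 4-byte plain-data types, so either view of the bits is valid.
#[derive(Clone, Copy)]
union Payload {
    f_value: f32,
    n_value: u32,
}

// Reports which match arm was taken. The union carries no tag, so the
// pattern in the first arm always matches, whatever field was written.
#[allow(unreachable_patterns)]
fn matched_arm(p: &Payload) -> &'static str {
    unsafe {
        match p {
            Payload { f_value: _f } => "f_value",
            // Never reached: the arm above is irrefutable.
            Payload { n_value: _n } => "n_value",
        }
    }
}
```

Both matched_arm(&Payload { f_value: 1.5 }) and matched_arm(&Payload { n_value: 7 }) take the first arm. A type-aware clone would have to live on SimData and branch on dataType, which is exactly the bookkeeping an enum does for you.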

What I really meant is that you shouldn't be using a union in the first place. You are probably coming from C, but in Rust, you should represent a dynamic choice among a small, statically-known set of types using enums. The data type you want is probably something like the following:

#[derive(Clone, Debug)]
enum SimData {
    Null,
    Bool(bool),
    Float(f64),
    Vector(Vec<SimData>),
    Dict(HashMap<String, SimData>),
    Error(String),
}

Using this data type doesn't require any unsafe. You don't need to write any manual memory management. An instance of an enum always knows which particular variant it contains, and dropping it runs the correct destructors automatically when needed. You should always prefer this over a raw, unsafe union, especially while you are a beginner.
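To make that concrete, here is a minimal sketch of how the nested-vector test from the question might look with the enum. The helper names item, sum, and sample are assumptions made for illustration (they are not part of the original code), and returning Result instead of calling process::exit is just one possible design.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum SimData {
    Null,
    Bool(bool),
    Float(f64),
    Vector(Vec<SimData>),
    Dict(HashMap<String, SimData>),
    Error(String),
}

impl SimData {
    // Hypothetical helper: borrow element `i` if this value is a Vector.
    fn item(&self, i: usize) -> Option<&SimData> {
        match self {
            SimData::Vector(items) => items.get(i),
            _ => None,
        }
    }

    // Hypothetical helper: add two floats, reporting a type mismatch as Err.
    fn sum(&self, other: &SimData) -> Result<SimData, String> {
        match (self, other) {
            (SimData::Float(a), SimData::Float(b)) => Ok(SimData::Float(a + b)),
            _ => Err(format!("cannot add {:?} and {:?}", self, other)),
        }
    }
}

// Builds the value from the question: vec[6, 61, vec[3, 8], 62].
fn sample() -> SimData {
    SimData::Vector(vec![
        SimData::Float(6.0),
        SimData::Float(61.0),
        SimData::Vector(vec![SimData::Float(3.0), SimData::Float(8.0)]),
        SimData::Float(62.0),
    ])
}
```

With this, sample().item(2) borrows the inner vector without cloning anything, and sample().clone() is a correct deep copy because the derived Clone knows which variant is present.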

The use cases for raw unions are usually advanced (related to low-level memory management, FFI, etc.), and unions are not needed even in more complex, high-level programs. Enums work just fine, and they are in fact how most dynamic types are represented. For example, look at the Value type of the most used JSON implementation, serde_json: it's an enum.


As an aside, you shouldn't spell functions in camelCase. Please follow Rust's naming conventions. Run the official linter, Clippy, and re-write your code so as to remove any warnings/errors.
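As a small illustration of the naming point, here is how the question's dataTypeName might be written in snake_case. This is only a sketch: the name type_name and the trimmed-down enum are assumptions for the example. Taking &self and returning a &'static str also removes the need for the v1.clone().dataTypeName() calls.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug)]
enum SimData {
    Null,
    Bool(bool),
    Float(f64),
    Vector(Vec<SimData>),
}

impl SimData {
    // snake_case method name; borrows self, so callers keep ownership.
    fn type_name(&self) -> &'static str {
        match self {
            SimData::Null => "Null",
            SimData::Bool(_) => "Bool",
            SimData::Float(_) => "Float",
            SimData::Vector(_) => "Vector",
        }
    }
}
```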

