Creating an FFI json_decoder for dynamic JSON

I am currently at the start of Chapter 13 of the beginner book, and between chapters I experiment with the language and already build small DLLs to use in a hobby project written in a high-level language, to offload processing-heavy code. I am proud that I managed to get this working for some parts of my code already, and it dramatically increased the performance of heavily used functions in my project. In this case I am trying to create a JSON decoder, because the ones available in the high-level language are very slow compared to serde_json in Rust.

The JSON I am currently trying to decode in Rust, and then hand back to the caller in decoded form, is dynamic, and its contents aren't predictable. In the high-level language I need to iterate over its contents. Let me show you.

My JSON can look like this. It contains string, bool and number values:

{"maxrpm":4700,"absMode":"realistic","minGearIndex":-1,"checkengine":false,"engineRunning":1,"freezeState":false,"gearIndex":0,"parking":0,"gearboxMode":"realistic","lowpressure":0,"oil":0,"lowhighbeam":0,"nop":0,"highbeam":0,"lowhighbeam_signal_R":0,"lowhighbeam_signal_L":0,"horn":0,"reverse":0,"lowfuel":false,"highbeam_wigwag_R":0,"highbeam_wigwag_L":0,"reverse_wigwag_R":0,"lights_state":0,"boost":0,"boostMax":0,"ignition":true,"brakelight_signal_L":0,"smoothShiftLogicAV":0,"hazard_enabled":0,"gear":0,"fog":0,"maxGearIndex":4,"reverse_wigwag_L":0,"lightbar":0,"odometer":0,"lowbeam":0,"running":true,"signal_left_input":0,"signal_right_input":0,"fuelVolume":1,"brakelight_signal_R":0,"idlerpm":700}

Sometimes it can also be just this

{"odometer":0.022705062608801,"smoothShiftLogicAV":0.017504340433356}

Or as little as this

{"parking":1}

The fields can't be defined in advance; they can be anything, and so can their values. Because of that, with my current knowledge I can't predefine a struct for serde_json to put the data into.

With

let json: Value = serde_json::from_str(&json)?;

I can get what seems like a hashmap. The question for me is whether I can bring this hashmap across the FFI border back to the caller, either directly in this form or in another one. Is this possible with serde_json, or in Rust at all?

I had success bringing predefined structs across the FFI border, filled from JSON documents that always have the same contents, but predefining structs for dynamic JSON doesn't seem possible. Or maybe it is?

I'm sitting here trying to think of a dynamic data structure that is associative and can hold variable data. Structs are predefined: not applicable. Vectors and hashmaps can only hold data of one type: not applicable. Maybe a dictionary object would help here. But then there is still the problem of moving whatever data structure it ends up being across the FFI border back to the caller.

I don't see a way to do this.

Obligatory: please help.
I hope you understand that my ability to research this is very limited because of how new I am to low-level programming and FFI overall :)
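
On the data-structure question: in Rust, a map whose values have different types is usually modelled with an enum as the value type, and that is roughly what serde_json's `Value` is. The sketch below uses a hypothetical, simplified stand-in called `JsonValue` (not the real serde_json type) so that it compiles with the standard library alone; `describe` and `sample` are also made-up names.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-in for serde_json's `Value` enum.
// One enum type can carry any of the JSON kinds, so a single map can
// hold strings, numbers and bools side by side.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<JsonValue>),
    Object(BTreeMap<String, JsonValue>),
}

// Inspect a value by matching on its variant.
fn describe(value: &JsonValue) -> String {
    match value {
        JsonValue::Null => "null".to_string(),
        JsonValue::Bool(b) => format!("bool {b}"),
        JsonValue::Number(n) => format!("number {n}"),
        JsonValue::Str(s) => format!("string {s}"),
        JsonValue::Array(items) => format!("array of {} items", items.len()),
        JsonValue::Object(map) => format!("object with {} keys", map.len()),
    }
}

// Build a small map with mixed value types, using keys from the JSON above.
fn sample() -> BTreeMap<String, JsonValue> {
    let mut map = BTreeMap::new();
    map.insert("maxrpm".to_string(), JsonValue::Number(4700.0));
    map.insert("absMode".to_string(), JsonValue::Str("realistic".to_string()));
    map.insert("ignition".to_string(), JsonValue::Bool(true));
    map
}

fn main() {
    for (key, value) in &sample() {
        println!("{key}: {}", describe(value));
    }
}
```

The map itself still has a single value type (`JsonValue`); the variability lives inside the enum. Moving such a structure across FFI is a separate problem, since a Rust enum with data has no layout the caller can read directly.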

It's certainly possible to create a C API for a generic JSON data structure, but it's going to be annoying to write and use. The JSON-C API docs should give you a good idea of how you might accomplish that.

Some questions to consider:

  1. What does your high level code actually do with the JSON data? Unless you're building something REALLY generic like a JSON viewer, various parts of your code probably do expect the objects extracted from the JSON to have a specific shape. In this case you could consider just writing an FFI function for each type of data you need to decode so you don't have to write a general purpose JSON API.
  2. Does the performance difference between the high level language's native JSON decoding and serde_json actually matter? If it's slow but not slow enough that it's causing problems, it may just not be worth the effort to wire this API up.

Your high level language may also expose FFI APIs for creating objects from Rust, which would allow you to build up a data structure like what the high level language's JSON decoder would. This can be fairly tricky depending on the details of how the languages interact though.


Here's a rough outline of how you might accomplish writing a C API for serde_json's `Value` enum:
#![allow(clippy::missing_safety_doc)]
use std::ffi::{c_char, CStr};

use serde_json::Value;

#[no_mangle]
pub unsafe extern "C" fn json_decode(bytes: *const u8, len: usize) -> *const Value {
    match serde_json::from_slice(unsafe { std::slice::from_raw_parts(bytes, len) }) {
        Ok(value) => Box::into_raw(Box::new(value)),
        Err(err) => {
            eprintln!("Error decoding JSON: {}", err);
            std::ptr::null()
        }
    }
}

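#[no_mangle]
pub unsafe extern "C" fn json_destroy(value: *const Value) {
    // json_decode returns null on failure, so tolerate a null pointer here.
    if !value.is_null() {
        drop(Box::from_raw(value as *mut Value))
    }
}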

#[no_mangle]
pub unsafe extern "C" fn json_get_longlong(
    value: *const Value,
    output: *mut std::ffi::c_longlong,
) -> bool {
    match unsafe { &*value } {
        Value::Number(number) => match number.as_i64() {
            Some(num) => {
                // should probably handle the case where this i64 and c_longlong aren't the same type
                unsafe {
                    // Don't read the old value
                    output.write(num)
                };
                true
            }
            None => false,
        },
        _ => false,
    }
}

#[no_mangle]
pub unsafe extern "C" fn json_get_key(value: *const Value, key: *const c_char) -> *const Value {
    let key = match unsafe { CStr::from_ptr(key) }.to_str() {
        Ok(string) => string,
        Err(_) => return std::ptr::null(),
    };
    match unsafe { &*value } {
        Value::Object(obj) => obj
            .get(key)
            .map(|v| v as *const _)
            .unwrap_or(std::ptr::null()),
        _ => std::ptr::null(),
    }
}

#[cfg(test)]
mod test {
    use std::mem::MaybeUninit;

    use crate::{json_decode, json_destroy, json_get_key, json_get_longlong};

    #[test]
    fn simple() {
        let data = b"{\"one\": 10000}";
        unsafe {
            let value = json_decode(data.as_ptr(), data.len());
            assert_ne!(value, std::ptr::null());

            let inner = json_get_key(value, "one\0".as_ptr().cast());

            let mut int = MaybeUninit::uninit();
            assert!(json_get_longlong(inner, int.as_mut_ptr()));
            assert_eq!(int.assume_init(), 10000);
            json_destroy(value);
        }
    }
}

The tests pass under Miri, but it's very possible there's still some undefined behavior hiding in there somewhere. Note in particular that this API operates on the assumption that nothing mutates any of the Value's members.


First of all, many thanks for your detailed response, it was a big help! :)

The high-level language that the project is written in is AutoIt3. The language doesn't support multithreading, but my project requires somewhat time-critical operations because there is a high volume of incoming information. Certain larger JSON documents (~2 to 8 KB) that I have to deal with can easily take up to 13 ms of processing time, time that is lost in a blocking operation. That's why I want to use my new Rust knowledge to optimize as much as possible. That way I make room for other time-consuming things.

That is the first reason why I want to do this. The second is that this is how I learn a language: by doing something I can't come up with a solution for yet.

Anyway, your code is something I definitely didn't think of. If I understand it correctly, the JSON is decoded within the DLL and kept in memory, and a pointer to that memory is returned to the caller. This pointer can then be used to fetch data from the Value enum by calling the DLL again. In the end, the allocated memory also has to be freed again by Rust.
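
That reading of the pointer-based approach can be condensed into a create / query / free lifecycle. The sketch below is a minimal, standard-library-only illustration: `Doc`, `doc_new`, `doc_get_number` and `doc_free` are hypothetical names, and the `HashMap` stands in for a decoded `Value`.

```rust
use std::collections::HashMap;
use std::ffi::{c_char, CStr};

// Hypothetical stand-in for a decoded JSON document. The caller only ever
// sees a pointer to it and never looks inside.
// Note: in a real cdylib each exported function below would also need
// `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` on the 2024 edition).
pub struct Doc {
    numbers: HashMap<String, f64>,
}

/// Step 1: allocate on the Rust heap and hand ownership out as a raw pointer.
pub extern "C" fn doc_new() -> *mut Doc {
    let mut numbers = HashMap::new();
    numbers.insert("parking".to_string(), 1.0);
    Box::into_raw(Box::new(Doc { numbers }))
}

/// Step 2: borrow the document through the pointer and copy one value out.
/// Returns false for null pointers, non-UTF-8 keys and missing keys.
pub unsafe extern "C" fn doc_get_number(
    doc: *const Doc,
    key: *const c_char,
    out: *mut f64,
) -> bool {
    if doc.is_null() || key.is_null() || out.is_null() {
        return false;
    }
    let Ok(key) = unsafe { CStr::from_ptr(key) }.to_str() else {
        return false;
    };
    match unsafe { &*doc }.numbers.get(key) {
        Some(number) => {
            unsafe { out.write(*number) };
            true
        }
        None => false,
    }
}

/// Step 3: give ownership back to Rust so the allocation is dropped.
pub unsafe extern "C" fn doc_free(doc: *mut Doc) {
    if !doc.is_null() {
        drop(unsafe { Box::from_raw(doc) });
    }
}

fn main() {
    let doc = doc_new();
    let mut parking = 0.0_f64;
    let found = unsafe { doc_get_number(doc, b"parking\0".as_ptr().cast(), &mut parking) };
    println!("found = {found}, parking = {parking}");
    unsafe { doc_free(doc) };
}
```

The caller must call the free function exactly once per handle and must not use the pointer afterwards; nothing on the Rust side can enforce that across the FFI border.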

I have been experimenting with this method since yesterday, but as you said yourself, trying to recreate a data structure from a high-level language is tricky, and I only had minor success with it.

So I've come up with something else: converting the JSON within Rust to a different format, one that the high-level language can decode much faster.

While this is likely the absolute beginner's way of doing it, it works to a certain degree:

#![allow(clippy::missing_safety_doc)]
use std::ffi::{c_char, CStr, CString};
use std::str;

use serde_json::Value;

#[no_mangle]
// `_len` is unused: the input is read as a NUL-terminated C string.
pub extern "C" fn json_decode(bytes: *const c_char, _len: usize) -> *mut c_char {
	let json: Value = match serde_json::from_str(&c_char_to_string(bytes)) {
		Ok(value) => value,
		Err(err) => {
			eprintln!("Error decoding JSON: {}", err);
			return string_to_c_char(String::new());
		}
	};
	
	//dbg!("{:?}", &json);
	
	let mut data = String::new();
	for (key, value) in json.as_object().unwrap() {
		//println!("{}:{}", key, value);
		
		let var = format!("{:?}", value);
		//println!("{}", var);
		
		// starts_with avoids a slicing panic on short Debug output such as "Null"
		if var.starts_with("String") {
			let cut_value = &format!("{}", value);
			let cut_value = &cut_value[1..cut_value.len() - 1];
			data.push_str(&format!("{}:s:{}\n", key, cut_value));
			continue;
			
		} else if var.starts_with("Number") {
			data.push_str(&format!("{}:n:{}\n", key, value));
			continue;
			
		} else if var.starts_with("Bool") {
			data.push_str(&format!("{}:b:{}\n", key, value));
			continue;
			
		} else if var.starts_with("Array") {
			let cut_value = &format!("{}", value);
			let cut_value = &cut_value[1..cut_value.len() - 1];
			data.push_str(&format!("{}:a:{}\n", key, cut_value));
		}
	}
	
	string_to_c_char(data)
}



// all variables whose ownership has been passed over the FFI border
// need to be given back to rust and deallocated by rust <<<<=============
#[no_mangle]
pub extern "C" fn c_string_free(string: *mut c_char) -> u8 {
	unsafe {
		if string.is_null() {
			return 0;
		}
		CString::from_raw(string)
	};
	
	1
}

fn string_to_c_char(string: String) -> *mut c_char {
	CString::new(string).expect("CString::new failed!").into_raw()
}

// autoit compatible function
// auto decodes autoit binary
fn c_char_to_string(c_string: *const c_char) -> String {
	let c_str = unsafe {
		assert!(!c_string.is_null());
		CStr::from_ptr(c_string)
	};
	
	let r_str = c_str.to_str().expect("Could not successfully convert string from foreign code!");
	
	let mut string = String::from(r_str);

	// if the decoded string is autoit binary
	if string.starts_with("0x") {
		string = hex_to_string(string);
	};
	
	string
}

// autoit compatible function
// decoded special characters might look a bit weird when displayed
fn hex_to_string(hex: String) -> String {
	let hex = hex::decode(hex[2..].to_string()).unwrap();
	//dbg!("{}", &hex);
	let hex = match str::from_utf8(&hex) {
		Ok(v) => v,
		Err(e) => panic!("Invalid: {}", e),
	};
	
	hex.to_string()
}

This returns a string that looks like this, which is quicker to decode within AutoIt3:
key:type:value\n
e.g.

a_string:s:hello world
a_number:n:3223.32232323
a_bool:b:false
a_array:a:100,-500,2

However, this method is not refined and has issues with JSON that contains nested objects, like

{"item":{"anotheritem":1}}

With time I will likely fix this, if I don't come up with a very different solution. But as of right now I am happy that this method works and that it does in fact perform so much better than the native solution.
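
One possible way to extend the line format to nested objects is to flatten them into dotted key paths, so that `{"item":{"anotheritem":1}}` becomes `item.anotheritem:n:1`. This is only a sketch under assumptions: `JsonValue` is a hypothetical, reduced stand-in for serde_json's `Value`, `flatten` and `nested_example` are made-up helpers, and keys are assumed not to contain a `.` themselves.

```rust
use std::collections::BTreeMap;

// Hypothetical, reduced stand-in for serde_json's `Value`
// (arrays and null are left out to keep the sketch short).
#[allow(dead_code)]
#[derive(Debug)]
enum JsonValue {
    Bool(bool),
    Number(f64),
    Str(String),
    Object(BTreeMap<String, JsonValue>),
}

// Recursively emit `path:type:value` lines; nested objects extend the path.
fn flatten(prefix: &str, value: &JsonValue, out: &mut String) {
    match value {
        JsonValue::Bool(b) => out.push_str(&format!("{prefix}:b:{b}\n")),
        JsonValue::Number(n) => out.push_str(&format!("{prefix}:n:{n}\n")),
        JsonValue::Str(s) => out.push_str(&format!("{prefix}:s:{s}\n")),
        JsonValue::Object(map) => {
            for (key, child) in map {
                // Top-level keys have no prefix; deeper keys are joined with '.'.
                let path = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{prefix}.{key}")
                };
                flatten(&path, child, out);
            }
        }
    }
}

// Builds the equivalent of {"item":{"anotheritem":1}}.
fn nested_example() -> JsonValue {
    let mut inner = BTreeMap::new();
    inner.insert("anotheritem".to_string(), JsonValue::Number(1.0));
    let mut root = BTreeMap::new();
    root.insert("item".to_string(), JsonValue::Object(inner));
    JsonValue::Object(root)
}

fn main() {
    let mut out = String::new();
    flatten("", &nested_example(), &mut out);
    print!("{out}");
}
```

The receiving side then only needs to split keys on `.` when it cares about nesting; flat documents produce exactly the same lines as before.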

Decoding the 725-byte JSON from far above 1000 times:

DLL took: 248.9007 ms
Native took: 2795.8536 ms

Edit: whoops used the debug build

Thanks again :)

Debug formatting isn't considered stable (and is also very slow compared to matching on the enum).

Consider doing something like this instead:

for (key, value) in json.as_object().unwrap() {
    match value {
        Value::Null => todo!(),
        Value::Bool(value) => data.push_str(&format!("{}:b:{}\n", key, value)),
        Value::Number(value) => data.push_str(&format!("{}:n:{}\n", key, value)),
        Value::String(value) => data.push_str(&format!("{}:s:{}\n", key, value)),
        Value::Array(_) => data.push_str(&format!("{}:a:{}\n", key, value)),
        Value::Object(_) => todo!(),
    }
}

Also note that you're basically embedding JSON if your objects contain other objects. If you don't have deeply nested objects that may not matter to you at the moment though.

Parsing the arrays unambiguously might be tricky if a value in the array could be a string containing a comma too. Though it's possible that for your use case you can guarantee that isn't ever the case.
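
If that guarantee can't be made, one option is to escape the separator characters before joining. The sketch below is an assumption about how the line format could be extended, not part of the original format: `escape` and `join_array` are hypothetical helpers, and the backslash escaping scheme is a choice made for this example.

```rust
/// Escapes the characters that have structural meaning in the line format:
/// the escape character itself, the array separator, and the line terminator.
fn escape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            ',' => out.push_str("\\,"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Joins already-stringified array elements with an unescaped ',' separator.
fn join_array(items: &[&str]) -> String {
    items
        .iter()
        .map(|item| escape(item))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    // The first element contains a comma, which gets a backslash in front of it.
    println!("a_array:a:{}", join_array(&["left,right", "plain"]));
}
```

The decoder on the other side then has to split only on commas that are not preceded by a backslash, and undo the escapes afterwards.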


The reason I use format!() here is that I did not find a way to extract the data from the Value enum as a String; otherwise I would likely just concatenate strings.

Array elements that are strings containing ',' could actually be a problem; I have to make a note of this.

And this variant with match looks much cleaner, thanks for that too :)
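
On extracting strings without format!(): in a `match`, the `Value::String(s)` arm already binds the inner string, and serde_json's `Value::as_str()` returns an `Option<&str>` for the same purpose. Independently of that, the standard library's `std::fmt::Write` trait lets `write!`/`writeln!` append straight into an existing `String`, which avoids building a temporary `String` for every entry. A minimal sketch, where `push_entry` is a hypothetical helper:

```rust
use std::fmt::{Display, Write};

// Appends one `key:type:value` line directly to `data`.
// `tag` is the one-letter type marker used in the line format (s, n, b, a).
fn push_entry(data: &mut String, key: &str, tag: char, value: impl Display) {
    // Writing into a String does not fail, so the result is ignored here.
    let _ = writeln!(data, "{key}:{tag}:{value}");
}

fn main() {
    let mut data = String::new();
    push_entry(&mut data, "a_string", 's', "hello world");
    push_entry(&mut data, "a_bool", 'b', false);
    print!("{data}");
}
```

Anything that implements `Display` can be passed as the value, so the same helper covers strings, numbers and bools.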
