Saving a complex struct to disk

Something like this: Rust Playground, where I have added roughly the minimum needed to serialise your data to a JSON string and deserialise it into a new instance of the data. Writing the JSON string to a file and reading it back is up to you.
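
For reference, a minimal sketch along those lines (the linked playground isn't reproduced here; this assumes serde with the derive feature and serde_json as dependencies, and a made-up struct):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct Inner {
  values: Vec<f64>,
}

#[derive(Serialize, Deserialize, Debug)]
struct Complex {
  name: String,
  inner: Inner,
}

fn main() -> Result<(), serde_json::Error> {
  let original = Complex {
    name: "example".into(),
    inner: Inner { values: vec![1.0, 2.5] },
  };
  // Serialise to a JSON string...
  let json = serde_json::to_string_pretty(&original)?;
  // ...and deserialise into a new instance of the data.
  let restored: Complex = serde_json::from_str(&json)?;
  println!("{restored:?}");
  Ok(())
}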

See documentation here: GitHub - serde-rs/json: Strongly typed JSON library for Rust

4 Likes

I really recommend using serde and serde_json. Types from other libraries are more likely to implement Serialize/Deserialize, making life easier in the long run. Here's a code snippet, though @ZiCog beat me to the punch and is likely to have written something nearly identical: Rust Playground

I'm sure json_rust is a great library, I just have little familiarity with it.

5 Likes

The json crate can convert only primitives directly. For vectors and objects there are macros, which means one has to do the nesting manually.

This has advantages and disadvantages. The disadvantage is obvious: more lines of code, and each struct needs something like an as_json() method. The advantage is that one can shape the JSON while writing it. For example, this one adds a type parameter which isn't in the Rust struct. The receiving side of the JSON needs it to recognize the type of the encoded structure; Rust recognizes the type by the struct used:

use json; // use this in all code snippets here

// The trait implemented below; its definition is implied by the original post.
pub trait TSParameterValue {
  fn as_json(&self) -> json::JsonValue;
}

pub struct TSParameterBool {
  pub default: bool,
  pub current: bool,
}

impl TSParameterValue for TSParameterBool {
  fn as_json(&self) -> json::JsonValue {
    json::object!{
      type: "bool",  // <- not in Rust, but in JSON
      default: self.default,
      current: self.current,
    }
  }
}
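
For a value with default == false and current == true, stringifying that object gives:

{"type":"bool","default":false,"current":true}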

One level higher, one encodes data of that level and calls as_json() from the lower level:

pub struct TSParameter<'a> {
  pub name: &'static str,
  pub description: &'static str,
  pub value: &'a dyn TSParameterValue, // <- lower level
}

impl<'a> TSParameter<'a> {
  pub fn as_json(&self) -> json::JsonValue {
    json::object!{
      name: self.name,
      description: self.description,
      value: self.value.as_json(), // <- lower level
    }
  }
}

Same when collecting that into an array/vector:

let mut json_parameters = json::JsonValue::new_array();
for parameter in &parameter_set {
  json_parameters.push(parameter.as_json()).unwrap();
}

With that done, make a string from the collected JSON:

let result = json::stringify(json_parameters);
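
If a human-readable file is the goal, the json crate also ships a pretty printer taking an indent width (to the best of my knowledge):

let result = json::stringify_pretty(json_parameters, 2); // 2-space indentation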


As this is already something of an advertisement for "manual" coding, let me give an example where data shaping saves a lot of space. Imagine an application which wants to send stock-trading candles to a web-browser user interface. Its data structure (simplified):

pub struct Candle {
  pub timestamp:  i64,
  pub open:       f32,
}

A generic JSON encoding would give something like this (I hope I don't mistype):

[
  {
    "timestamp": 1690192800,
    "open": 16200.969583458793
  },
  {
    "timestamp": 1690196400,
    "open": 16220.969583458793
  },
  // ... repeat 48 bytes 50,000 times.
]

See the redundancy? The parameter names timestamp and open get written over and over again. Also, the floats get written with 12 digits after the decimal point, where 2 are entirely sufficient for the use case. A better data structure would look like this, at less than half the size:

{
  "timestamps": [
    1690192800,
    1690196400,
    // ... repeat 12 bytes 50,000 times.
  ],
  "opens": [
    16200.96,
    16220.96,
    // ... repeat 9 bytes 50,000 times.
  ]
}

Rust code to get this smaller data structure (the post relies on a round_f32 helper; a plausible definition is sketched at the top):

// Hypothetical helper, implied by the original post: round to `decimals` places.
fn round_f32(value: f32, decimals: i32) -> f32 {
  let factor = 10f32.powi(decimals);
  (value * factor).round() / factor
}

let mut json_timestamps = Vec::with_capacity(history.candles.len());
let mut json_opens = Vec::with_capacity(history.candles.len());
for candle in &history.candles {
  json_timestamps.push(candle.timestamp);
  json_opens.push(round_f32(candle.open, 2));
}
let json = json::object!{
  timestamps: json_timestamps,
  opens: json_opens,
};


What to choose?

  • If you have a complex structure, just want to write it somehow for re-loading later, and data size isn't an issue, generic serialization with serde is certainly a pragmatic choice.
  • If the structure is rather simple, the JSON is received by some other application which prefers a certain formatting, and thousands of instances are needed, manual code with json offers quite a few opportunities.

I'm just a Rust beginner, but to me this doesn't seem any less intimidating than trying to comprehend serde (which gives type safety, which is why we want to use Rust, isn't it?).

3 Likes

Regarding type safety, the two approaches don't differ much. Both format from and parse into defined structures. The distinction is that serde defines the mapping between JSON and the defined data structure declaratively, by a data structure. That's a big advantage if this is the same structure used elsewhere in the code: serde takes care of the mapping in both directions.
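
Worth adding: the data-structure-driven mapping can shape JSON too. For instance, the extra type field from the earlier manual example can be expressed with serde's internally tagged enums; a sketch assuming serde's derive feature:

use serde::Serialize;

#[derive(Serialize)]
#[serde(tag = "type", rename_all = "lowercase")]
enum TSParameterValue {
  Bool { default: bool, current: bool },
}

// serde_json::to_string(&TSParameterValue::Bool { default: false, current: true })
// produces: {"type":"bool","default":false,"current":true}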

The other approach defines the mapping in the form of executable code, which gives opportunities like the ones described above. As formatting and/or parsing usually happens in one place only, this isn't a problem. If both parsing and formatting are implemented, one has to take care that the two implementations match.

That said, I don't use Rust for type safety; one can have that in many other languages as well. I use Rust for its brilliant data-ownership model (and for being a modern compiled language).

15 posts were split to a new topic: [Serde-JSON] Numeric precision in the JSON format

I may have missed it but serde has a few other huge advantages over hand-rolling a function that haven't been mentioned:

  1. It is composable. If you define type Foo in a library and implement the serde traits, I can use Foo as a field in another type Bar and be able to serialize and deserialize it. And I can choose json, toml, bincode, etc.
  2. It doesn't require a custom type. You can just ask serde_json or another format to deserialize, say, a Vec<f64> or a HashMap<String, Vec<i32>> from the format of your choice (see the sketch after this list).
  3. It is standard. You can use it with libraries like figment and numerous Rust libraries support it under a feature flag.
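
A small sketch of points 1 and 2, assuming serde with the derive feature and serde_json (Foo and Bar are made-up types):

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Serialize, Deserialize)]
struct Foo { id: u32 } // imagine this lives in some library

#[derive(Serialize, Deserialize)]
struct Bar { foo: Foo, note: String } // composes Foo for free

fn main() -> Result<(), serde_json::Error> {
  // Point 1: Bar serializes because Foo does.
  let bar = Bar { foo: Foo { id: 7 }, note: "hi".into() };
  let json = serde_json::to_string(&bar)?;

  // Point 2: no custom type needed at all.
  let map: HashMap<String, Vec<i32>> = serde_json::from_str(r#"{"a": [1, 2, 3]}"#)?;
  println!("{json} {map:?}");
  Ok(())
}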
5 Likes

Strings are strings either way so I can follow that, but numbers are much smaller in binary than in text. For every byte you add to a number, you must add two characters to a hex string, eight characters to a binary string, and about 2.4 characters on average for a decimal string (log₁₀ 256 ≈ 2.408, to be precise).

Now consider that in Rust, each character is 4 bytes. That means a byte in binary is 8 bytes as a hex string, 64 bytes as a binary string, and about 9.6 bytes as a decimal string. To break even, a byte for a byte, a single character must encode 4 bytes of information, which is to say, a base 4,294,967,296 string must be used. At which point, you may as well use a binary encoding, open the result as a text file, and observe how UTF-8 attempts to make sense of it.

The point being, binary encoding is objectively smaller than vanilla text encoding for numbers.

However, your comparison is between binary encoding and reducing the length of frequent strings in the text encoding. Have you tried employing both methods at once? Binary encoding would reduce the length of the numbers, in addition to manually reducing string sizes as in your text encoding method.

False. String is backed by a Vec<u8>, not by a Vec<char>. You are confusing UTF-8 with UTF-32.
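
A two-line illustration of the distinction:

assert_eq!(std::mem::size_of::<char>(), 4); // a char is a 4-byte scalar value
assert_eq!("abc".len(), 3);                 // but str/String store UTF-8 bytes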

The biggest advantage of serde is that it doesn't tie you to one specific serialization format.

3 Likes

It does tie you to an abstract data model though. IIRC it has trouble with large arrays of fixed size.

Well, just so you know, your suggestions worked and I am now using serde to encode my struct into JSON and can save and read from disk with ease. Thanks for your help.

9 Likes

Ah, good to know. In that case, each character in a String is one byte. I think my overall point still holds: that binary encoding is always smaller for numbers.

1 Like

A "character" can be more than one byte.

Hmm... That is only true if the text is in a language that can be represented in good old-fashioned ASCII. For other languages each character may be more than one byte. That is the whole point of Unicode, after all.

Not "always".

If my software requires 64-bit integers, then a number is 8 bytes. If it happens that my numbers are mostly small, then the string representation saves me space most of the time.
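
For illustration, a sketch of that tradeoff, assuming the bincode 1.x and serde_json crates (bincode's default config encodes integers fixed-width):

fn main() {
  let n: u64 = 7;
  let binary = bincode::serialize(&n).unwrap();  // 8 bytes: fixed-width u64
  let text = serde_json::to_string(&n).unwrap(); // "7": 1 byte
  println!("binary: {} bytes, text: {} bytes", binary.len(), text.len());
}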

As usual in engineering, one has to make tradeoffs. For example: is it really worth the trouble of switching to a binary format to save 1% execution time on a program that is not used very much? Arguably no; one will burn more energy and money doing so.

2 Likes

Note: @notriddle moved my whole post, but only the first section was about the numeric precision being unspecified. I've moved back the rest since I believe they are relevant here.

To elaborate on this, UTF-8 is a variable-width encoding. Although it may ultimately be an array of bytes, it is not a uniform array at that. Each Unicode Scalar Value can be anywhere between 1 and 4 bytes long.

assert_eq!("1".len(), 1);
assert_eq!("あ".len(), 3);

Furthermore, one USV[1] doesn't always correspond to one character (making char a blatant misnomer). There is the notion of grapheme clusters: sequences of USVs that are visually joined. The Unicode standard doesn't formally define any correspondence between USVs and characters[2]; grapheme clusters are merely an approximation of it.
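
A concrete case, using a combining accent (counts verified against the standard library):

// "é" written as 'e' + U+0301 (combining acute accent):
let s = "e\u{0301}";
assert_eq!(s.chars().count(), 2); // two USVs...
assert_eq!(s.len(), 3);           // ...three UTF-8 bytes...
// ...but rendered as a single grapheme cluster, i.e. one "character".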

For most practical purposes, however, you could pretend that one grapheme cluster corresponds to one character, as they are the closest thing to characters we have. Perhaps the Unicode Consortium should have made up its mind and defined it that way. It would be wrong in some technical senses, but it wouldn't have brought as much mass confusion as not doing so did. After all, it can hardly be called a character encoding if it weren't encoding characters.

Just like how Unicode made the switch from fixed-width to variable-width, there is nothing stopping a binary encoding from having variable-precision numbers. The particular format you use might not have them, but there are others that do. If all else fails, you could always store variable-width byte strings and interpret them as numbers.
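
As one example of variable-precision numbers in binary, a minimal sketch of unsigned LEB128, the variable-width integer encoding used by formats like Protocol Buffers (varints) and WebAssembly:

// Encode 7 bits per byte; the high bit signals that more bytes follow.
fn leb128_encode(mut n: u64, out: &mut Vec<u8>) {
  loop {
    let byte = (n & 0x7f) as u8;
    n >>= 7;
    if n == 0 {
      out.push(byte);
      break;
    }
    out.push(byte | 0x80);
  }
}

fn main() {
  let mut buf = Vec::new();
  leb128_encode(7, &mut buf);   // small value: 1 byte
  leb128_encode(300, &mut buf); // larger value: 2 bytes
  assert_eq!(buf, [0x07, 0xac, 0x02]);
}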


  1. Unicode Scalar Value ↩︎

  2. What, the standard does not formally define something again? That's right; it turns out standardization bodies do much less than some people think. ↩︎

4 Likes

I don't understand. The json crate can only make untyped, loosey-goosey objects, just like JavaScript. It cannot (attempt to) deserialize JSON into a well-typed struct, nor go in the other direction. It's not solving even 1% of the problem that serde solves, and it doesn't offer any functionality that serde_json::Value (from a much more widely used and vetted crate) doesn't have. It is the antithesis of Rust, and I would suggest that making do with the json crate suggests Rust is not the right language for whatever you're doing (because why not just Python or JS itself?).

Furthermore, an example of serde derive is literally on the serde homepage https://serde.rs/ . I don't think the docs are at fault here. Which is to say nothing about other crates' docs — but we want our conclusions to be drawn from good data, and "serde's docs are bad" is demonstrably false.

4 Likes

Core documentation is good

But many crates' docs are woeful

The lack of examples is a particular pain point.

I recently had a very simple encryption problem; I reviewed over a dozen crates via GitHub, and not one had an examples folder.

Not one

PS "simple_crypt"

This thread's beyond saving, but if you're still not sure about what the attributes and other things in Serde do, feel free to make a new post asking about what each part does.

Serde is hard to understand because the actual library does basically nothing at runtime. All the actual work is done in the code generated by the macros, and in serde_json (or whatever format library you're using). Serde is the (very important) glue between them.
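
One way to make that concrete: the third-party cargo-expand tool (cargo install cargo-expand, then cargo expand) prints the Serialize/Deserialize impls the derive macros generate, which is often the quickest way to see what serde is actually doing at compile time.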

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.