De-serialization format with 'defend vs malicious' in mind?

We have a two-part Rust program:

  • rust/x86_64 on server
  • rust/wasm32 on client in browser

What is a good serialization format for client <-> server communication, taking into account that the data sent by the client might be maliciously constructed?

  1. serde/json: I am not sure about this, but I believe an attacker can construct a heavily nested message to cause a stack overflow

  2. protobuf: there is the GitHub issue "Is protobuf safe to handle untrusted data (csharp)?" (Issue #4009, protocolbuffers/protobuf), but I cannot find a definitive "protobuf deserialization is safe vs malicious data" guarantee anywhere

Question: Is there any Rust deserialization library that focuses on being safe in the presence of malicious data?

Bincode allows you to limit the maximum message size for the purpose of preventing denial of service.

serde_json seems to have a recursion limit (of 128), which is probably enough to prevent stack overflows? (Note that this does not mean that I would know whether serde_json is generally "safe against malicious data".)
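To illustrate how a recursion limit of this kind blocks the nested-message attack, here is a minimal sketch using only the standard library. It is not serde_json's actual code: the toy grammar (nested `[` ... `]` lists), the names `parse`, `parse_list` and `ParseError`, and the constant `MAX_DEPTH` are all made up for the example, with 128 chosen to match the limit mentioned above.

```rust
/// Hypothetical limit for this sketch, set to the value mentioned above.
const MAX_DEPTH: usize = 128;

#[derive(Debug, PartialEq)]
pub enum ParseError {
    TooDeep,
    UnexpectedEnd,
    UnexpectedByte(u8),
}

/// Parses one nested list such as `[[][[]]]` and returns how many lists it
/// contained. Recursion is refused once `depth` exceeds `MAX_DEPTH`, so the
/// number of stack frames is bounded no matter what the input looks like.
fn parse_list(input: &[u8], pos: &mut usize, depth: usize) -> Result<usize, ParseError> {
    if depth > MAX_DEPTH {
        return Err(ParseError::TooDeep);
    }
    // Every list must start with an opening bracket.
    match input.get(*pos) {
        Some(b'[') => *pos += 1,
        Some(&b) => return Err(ParseError::UnexpectedByte(b)),
        None => return Err(ParseError::UnexpectedEnd),
    }
    let mut count = 1;
    loop {
        match input.get(*pos) {
            Some(b']') => {
                *pos += 1;
                return Ok(count);
            }
            // A nested list: recurse with an incremented depth counter.
            Some(b'[') => count += parse_list(input, pos, depth + 1)?,
            Some(&b) => return Err(ParseError::UnexpectedByte(b)),
            None => return Err(ParseError::UnexpectedEnd),
        }
    }
}

/// Parses a complete input consisting of exactly one top-level list.
pub fn parse(input: &[u8]) -> Result<usize, ParseError> {
    let mut pos = 0;
    let count = parse_list(input, &mut pos, 1)?;
    if pos != input.len() {
        return Err(ParseError::UnexpectedByte(input[pos]));
    }
    Ok(count)
}
```

With this check in place, an input of 100,000 opening brackets fails with `ParseError::TooDeep` after 128 levels instead of growing the stack until it overflows.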

Here is my (perhaps irrational) fear with binary formats:

suppose we have:

pub struct Message {
  x: Vec<u8>,
  b: Foo,
}

Can an attacker claim that x has 10 bytes but then provide 30 bytes of data, so that the other 20 bytes, when deserialized as a Foo, cause weird things to happen?

Now, I don't have any evidence of such an attack on bincode (or other binary formats); I just don't know whether the de-serializers are paranoid in this regard.

I don't understand the relevance of x in this example. x parses fine from the initial 10 bytes. And then you have 20 bytes that are being de-serialized as Foo. You might as well skip x and the 10 valid bytes in your example.

When you say "maliciously constructed", what exactly are you trying to prevent?

By using a format like JSON or Protobuf, the payload itself shouldn't be a problem because it's just plain old data with no executable component. Compare this to something like Python's pickle format, where the serialized content is a stream of instructions that can import and call arbitrary Python objects during deserialization (yay, remote code execution).

It's still possible for the client to generate malicious messages which can trigger legitimate functionality on the server that may have unintended consequences (see the recent log4j/JNDI issue for more), but that's always going to be a possibility unless you validate your inputs.

Another attack vector is denial of service, where the client sends data in a way that overloads the server (e.g. saturates bandwidth, triggers OOM because you used read_to_end(), or uses a Slowloris attack to tie up all your connections), but that is usually avoided by configuring the server with a maximum payload size and appropriate timeouts.
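As a rough illustration of the payload-size part, here is a sketch that caps how much `read_to_end()` may buffer by wrapping the reader in `Read::take` from the standard library. The function name `read_bounded` and the `max_len` parameter are invented for this example, and timeouts are not covered here; those would be configured on the socket or in the web framework.

```rust
use std::io::{self, Read};

/// Reads an entire payload, but never buffers more than `max_len + 1` bytes.
/// `max_len` is a hypothetical, application-chosen limit.
pub fn read_bounded<R: Read>(reader: R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the limit so an oversized payload can be told
    // apart from one that is exactly `max_len` bytes long.
    let cap = (max_len as u64).saturating_add(1);
    reader.take(cap).read_to_end(&mut buf)?;
    if buf.len() > max_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "payload exceeds size limit",
        ));
    }
    Ok(buf)
}
```

Even an endless stream such as `std::io::repeat(0)` makes this return an error after reading `max_len + 1` bytes, rather than allocating until the process runs out of memory.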

This totally depends on the implementation of your deserializer. However, I would assume that it just reads 10 bytes then leaves the other 20 bytes unread.

It might be that you try to deserialize the next field and read garbage, but that's no different from someone giving you a malformed JSON object. This is Rust, so there is no risk of reading out of bounds; however, you can still receive messages that are perfectly valid syntactically but don't make sense in the wider context.

Again, that's why you validate your inputs.
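For the "claims 10 bytes, sends 30" scenario, a careful decoder might look roughly like the sketch below. It is not taken from bincode or any other crate: the wire layout (a 4-byte little-endian length followed by the bytes), the names `read_len_prefixed` and `DecodeError`, and the `MAX_FIELD_LEN` limit are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Truncated,
    TooLong,
}

/// Hypothetical upper bound on the size of a single field.
const MAX_FIELD_LEN: usize = 1024;

/// Reads a field encoded as a 4-byte little-endian length followed by that
/// many bytes. Returns the field and the unread remainder of the input.
pub fn read_len_prefixed(input: &[u8]) -> Result<(&[u8], &[u8]), DecodeError> {
    if input.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let (len_bytes, rest) = input.split_at(4);
    let claimed =
        u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    // Validate the claimed length before trusting it for anything.
    if claimed > MAX_FIELD_LEN {
        return Err(DecodeError::TooLong);
    }
    if claimed > rest.len() {
        return Err(DecodeError::Truncated);
    }
    // Exactly `claimed` bytes belong to this field; the rest is left for
    // whatever decodes the next field (and must be validated there).
    Ok(rest.split_at(claimed))
}
```

If the attacker claims 10 bytes and sends 30, the field is the first 10 bytes and the remaining 20 are simply the input to the next decoder, which either parses them as a valid value or returns an error. A claimed length larger than the data actually present is rejected instead of being used to index or allocate.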

Including, but not limited to (as we should assume an infinitely smart attacker):

  1. resource exhaustion: e.g. claiming the message is 1 GB, making the server allocate or wait for it

  2. stack overflow: exploit the recursive function calls in deserializer, nest stuff in such a way to generate lots of stack frames

  3. shellcode: have part of the 'data' be shellcode, then exploit a buffer overflow + return (in the recursive deserializer) to execute the shellcode

  4. ... not sure yet ...

I would really like a deserialization format, even if it is very limited in what it can serialize and deserialize, that guarantees:

  • constant stack space, running time at most O(n) in bytes read, space usage of at most O(n) in bytes read

  • focuses more on being paranoid than being fast

This can be handled using timeouts and limits on payload sizes. For example, Rocket has a Limits config for this.

I would solve this by just defining message types so they aren't recursive. It's easier to prevent these things by construction than adding extra stack depth checks to try and error out when it occurs.

Deserializers shouldn't contain unsafe, so this isn't a problem. If your deserializer needs unsafe code then I think you have bigger problems on your hands and you should do a more in-depth audit.

You can create a buffer with a max size (like an array) and only parse it if it doesn't fill that buffer

If this is possible, there is something horribly wrong with your deserializer (and if it's serde, it probably would've been noticed already). Bounds are always checked in Rust.

edit: whoops I was beaten to this haha

I want to clarify one thing in case we have a misunderstanding.

I do not want to write my own deserializer. I want to pick an existing one to use. I don't know which ones were written with paranoia in mind, and which ones were written assuming that they are used only on trusted, cooperating channels.

Almost everything in Rust is written with (healthy) paranoia in mind.

Yep, I understand that. The responses I gave above should all work regardless of the deserializer and are largely things you have control over.

The note about timeouts/limits is something you configure in whatever code receives these messages, the bit about recursion is handled by structuring your message types so they aren't recursive (which in turn means you won't generate recursive deserializing code), and shellcode exploits are a non-issue because Rust.

Extra detail: serde will stop deserializing once it finds something invalid, so if you define your type to not be recursive, then it doesn't matter if the message itself is recursive.

Bincode allows you to configure if it should reject trailing bytes or ignore them. It defaults to ignoring them.
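To make the difference between the two policies concrete, here is a small standard-library sketch. It does not use bincode's API: the `Trailing` enum, the `decode_u16` function and the `TrailingError` type are hypothetical names, and the "message" is just a big-endian u16.

```rust
#[derive(Debug, PartialEq)]
pub enum TrailingError {
    Truncated,
    /// The input had this many bytes left over after the value.
    TrailingBytes(usize),
}

/// Hypothetical policy switch: what to do with bytes after the value.
#[derive(Clone, Copy)]
pub enum Trailing {
    Allow,
    Reject,
}

/// Decodes a big-endian u16 from the start of `input`.
pub fn decode_u16(input: &[u8], policy: Trailing) -> Result<u16, TrailingError> {
    if input.len() < 2 {
        return Err(TrailingError::Truncated);
    }
    let value = u16::from_be_bytes([input[0], input[1]]);
    let extra = input.len() - 2;
    match policy {
        // Strict mode: leftover bytes mean the message is malformed.
        Trailing::Reject if extra > 0 => Err(TrailingError::TrailingBytes(extra)),
        // Lenient mode (or no leftovers): return the decoded value.
        _ => Ok(value),
    }
}
```

For untrusted input, the strict variant is usually the safer choice, because a message that is longer than its declared contents is a sign that sender and receiver disagree about the format.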

If you allow some self-advertisement: Neodyn exchange is my binary-and-text format. While implementing it, I was specifically thinking about how corruption or maliciously crafted data could lead to logical inconsistency and crashes, and how to prevent it. I don't claim that it can't be attacked, but I do remember putting various hardening features in its implementation (e.g. buffer sizes and claimed container element counts are always checked for consistency). You may want to give it a go and see what it can and can't withstand.

@H2CO3 : Out of curiosity, what is the Neodyn database engine? I can't find it on crates.io.

Unfortunately, that project of mine is not yet ready to publish, so it's not available yet. The serialization component turned out to be a completely independent codebase, so I decided to share it publicly.
