What is a good serialization format for client <-> browser communication, taking into account that the client-sent data might be maliciously constructed?
serde/json: I am not sure about this, but I believe an attacker can construct a heavily nested message to cause a stack overflow.
serde_json seems to have a recursion limit (of 128), which is probably enough to prevent stack overflows? (Note that this does not mean that I would know whether serde_json is generally "safe against malicious data".)
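To make the idea of a nesting limit concrete, here is a rough std-only sketch. It is not how serde_json implements its limit; `within_depth_limit` is a hypothetical helper that pre-scans the text iteratively, so the check itself cannot overflow the stack no matter how deep the input is.

```rust
/// Returns true if bracket/brace nesting in `input` never exceeds `max_depth`.
/// Hypothetical helper for illustration; it does not validate the JSON itself.
fn within_depth_limit(input: &str, max_depth: usize) -> bool {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for b in input.bytes() {
        if in_string {
            // Inside a string literal, brackets do not count as nesting.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'[' | b'{' => {
                depth += 1;
                if depth > max_depth {
                    return false;
                }
            }
            b']' | b'}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    true
}
```

A check like this only bounds nesting depth; it says nothing about whether the rest of the input is well formed.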
Here is my (perhaps irrational) fear with binary formats:
suppose we have:
    // struct name is a placeholder
    pub struct Message {
        x: Vec<u8>,
        b: Foo,
    }
Can an attacker claim that x has 10 bytes but then provide 30 bytes of data, so that the other 20 bytes, when deserialized as a Foo, cause weird things to happen?
Now, I don't have any evidence of such an attack on bincode (or other binary formats); I just don't know whether the de-serializers are paranoid in this regard.
I don't understand the relevance of x in this example. x parses fine from the initial 10 bytes. And then you have 20 bytes that are being de-serialized as Foo. You might as well skip x and the 10 valid bytes in your example.
When you say "maliciously constructed", what exactly are you trying to prevent?
By using a format like JSON or Protobuf, the payload itself shouldn't be a problem because it's just plain old data with no executable component. Compare this to something like Python's pickle format, where the serialized stream can instruct the unpickler to import and call arbitrary callables while deserializing (yay, remote code execution).
It's still possible for the client to generate malicious messages which can trigger legitimate functionality on the server that may have unintended consequences (see the recent log4j/JNDI issue for more), but that's always going to be a possibility unless you validate your inputs.
Another attack vector is denial of service, where the client sends data in a way that overloads the server (e.g. saturates bandwidth, triggers OOM because you used read_to_end() on an unbounded stream, or uses a Slowloris attack to tie up all your connections), but that is usually avoided by configuring a maximum payload size on the server and using appropriate timeouts.
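As a minimal sketch of the payload-size part (timeouts are not covered here), `Read::take` can cap how much `read_to_end` will ever pull in. The function name `read_bounded` and the idea of passing the limit as a parameter are assumptions for illustration, not something a particular framework provides.

```rust
use std::io::{self, Read};

/// Reads the whole payload, but fails instead of allocating more than `max` bytes.
/// Hypothetical helper; pick `max` to suit your application.
fn read_bounded<R: Read>(reader: R, max: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte so "exactly max" can be told apart from "too large".
    reader.take(max.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > max {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "payload exceeds configured limit",
        ));
    }
    Ok(buf)
}
```

Even an endless stream costs at most `max + 1` bytes of memory here before the request is rejected.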
This totally depends on the implementation of your deserializer. However, I would assume that it just reads 10 bytes then leaves the other 20 bytes unread.
It might be that you then try to deserialize the next field and read garbage, but that's no different from someone giving you a malformed JSON object. This is Rust, so in safe code there is no risk of reading out of bounds; however, you can still receive messages that are perfectly valid syntactically but don't make sense in the wider context.
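For the `x: Vec<u8>` scenario, a paranoid decoder would check the claimed length against both a cap and the bytes actually present before touching the data. The sketch below uses a made-up wire layout (a 4-byte little-endian length followed by the bytes); it is not bincode's format, and `read_bytes`, `MAX_LEN` and `DecodeError` are hypothetical names.

```rust
/// Assumed application-specific cap on a single byte field.
const MAX_LEN: usize = 1024;

#[derive(Debug, PartialEq)]
enum DecodeError {
    Truncated, // fewer bytes available than the prefix claims
    TooLong,   // claimed length exceeds MAX_LEN
}

/// Decodes `[len: u32 LE][len bytes]` and returns the bytes plus the unread remainder.
fn read_bytes(input: &[u8]) -> Result<(Vec<u8>, &[u8]), DecodeError> {
    if input.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let (prefix, rest) = input.split_at(4);
    let len = u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
    // Reject oversized claims before allocating anything.
    if len > MAX_LEN {
        return Err(DecodeError::TooLong);
    }
    // Reject claims that go past the end of the input.
    if len > rest.len() {
        return Err(DecodeError::Truncated);
    }
    let (x, rest) = rest.split_at(len);
    Ok((x.to_vec(), rest))
}
```

With a claimed length of 10 and 30 bytes supplied, `x` gets exactly 10 bytes and the remaining 20 are handed to whatever decodes the next field, which then has to pass its own checks.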
This can be handled using timeouts and limits on payload sizes. For example, Rocket has a Limits config for this.
I would solve this by just defining message types so they aren't recursive. It's easier to prevent these things by construction than adding extra stack depth checks to try and error out when it occurs.
Deserializers shouldn't contain unsafe, so this isn't a problem. If your deserializer needs unsafe code then I think you have bigger problems on your hands and you should do a more in-depth audit.
You can create a buffer with a max size (like an array) and only parse it if the message doesn't fill that buffer.
If this is possible, there is something horribly wrong with your deserializer (and if it's serde, it probably would've been noticed already). Bounds are always checked in safe Rust.
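A rough sketch of that fixed-buffer idea, using only std: `BUF_SIZE` and `read_into_fixed` are hypothetical names, and the size is an assumption you would tune yourself.

```rust
use std::io::{self, Read};

/// Assumed maximum message size; anything that fills the buffer is rejected.
const BUF_SIZE: usize = 4096;

/// Returns Ok(Some(n)) if the input ended after n < BUF_SIZE bytes,
/// or Ok(None) if the buffer filled up and the message should not be parsed.
fn read_into_fixed<R: Read>(mut reader: R, buf: &mut [u8; BUF_SIZE]) -> io::Result<Option<usize>> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => return Ok(Some(filled)), // end of input before the buffer filled
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(None) // buffer is full: the message is at least BUF_SIZE bytes long
}
```

On `Ok(Some(n))` you would hand `&buf[..n]` to the deserializer; on `Ok(None)` you drop the connection or return an error without parsing anything.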
I want to clarify one thing in case we have a misunderstanding.
I do not want to write my own deserializer. I want to pick an existing one to use. I don't know which ones were written with paranoia in mind, and which ones were written assuming they would only be used on trusted, cooperating channels.
Yep, I understand that. The responses I gave above should all work regardless of the deserializer and are largely things you have control over.
The note about timeouts/limits is something you configure in whatever code receives these messages, the bit about recursion is handled by structuring your message types so they aren't recursive (which in turn means you won't generate recursive deserializing code), and shellcode exploits are a non-issue in safe Rust.
Extra detail: serde will stop deserializing once it finds something invalid, so if you define your type to not be recursive, then it doesn't matter if the message itself is deeply nested.
If you allow some self-advertisement: Neodyn exchange is my binary-and-text format. While implementing it, I was specifically thinking about how corruption or maliciously crafted data could lead to logical inconsistency and crashes, and how to prevent it. I don't claim that it can't be attacked, but I do remember putting various hardening features in its implementation (e.g. buffer sizes and claimed container element counts are always checked for consistency). You may want to give it a go and see what it can and can't withstand.
Unfortunately, that project of mine is not yet ready to publish, so it's not available yet. The serialization component turned out to be a completely independent codebase, so I decided to share it publicly.