So my question is: how can I ensure that posting weird data will not break my program? Ideally, in cases like this, description should be cast to a string, or vice versa: try to parse a string to a number, or else throw a validation error.
I'm using validation. The problem is the error in this case is thrown BEFORE validation is called. Here is the piece of code where I use the struct to be validated:
```rust
let body: Metadata = req.json().await?; // Error is thrown here
match body.validate() { // Validation is being done here
    Ok(_) => {},
    Err(e) => return Response::error(json!(e).to_string(), 400)
}
```
Why would it? You clearly caught somebody submitting nonsense to your API, because an error is thrown when the description field has the wrong type. Which is far safer than what you want to achieve, namely trying to convert wrongly typed data to the right type.
Validation and deserialization are two distinct steps in your case. If deserialization itself fails, i.e. because the data is the wrong type, validation will never be executed. With validation you can catch stuff you can't with deserialization alone, like making sure a string is actually a valid URL. If you don't like this approach (which I totally understand), move validation into deserialization. Validator for example offers its validations as functions, like validate_required. You can call that inside your implementation of Deserialize, making deserialization fail directly, rather than having a second validation step. Downside is that you can't use the default Deserialize implementation provided by the macro.
I personally would outright reject foolishness like wrongly typed data instead of trying to make sense of it. Passing a number in a field that requires a string is nothing I'd want my API to encourage. serde_json does exactly that, rejecting wrong types. If you want to circumvent this, you could override the default deserialization behaviour of a field with a custom function you pass to serde with the #[serde(deserialize_with = "...")] attribute:
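```rust
use serde::{Deserialize, Serialize};
use validator::Validate;

#[derive(Serialize, Deserialize, Debug, Validate)]
pub struct Metadata {
    #[validate(required(message = "`name` is required"))]
    name: Option<String>,
    #[validate(required(message = "`description` is required"))]
    // `default` keeps a missing key as `None`, so `required` can still report it;
    // with `deserialize_with` alone, a missing key is a deserialization error.
    #[serde(default, deserialize_with = "string_or_number")]
    description: Option<String>,
    #[validate(
        required(message = "`image` is required"),
        url(message = "`image` must be valid URL"),
        contains(pattern = "ipfs://ipfs/", message = "`image` must be an IPFS URL")
    )]
    image: Option<String>,
}

// The function must return the field's type, `Option<String>`, not `String`.
fn string_or_number<'de, D>(deserializer: D) -> Result<Option<String>, D::Error>
where
    D: serde::de::Deserializer<'de>,
{
    // ...
}
```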
The reason I think it makes sense to convert the value in this case is that the field is simply a description and is not really important to the business logic. It makes sense to be forgiving here: don't throw an error, just save the value as a string.
As for the opposite case, one could allow for example accepting a hex string as a number. In this case it would make sense to at least try parsing it to a numerical type and only when that fails throw an error.
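To illustrate the "try parsing, and only fail when that fails" idea, here is a minimal standard-library sketch. The function name and the choice of a `0x` prefix to mark hex are assumptions for the example; a function like this could be called from a custom serde deserializer for the field.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parse text as a `u64`, treating a `0x`/`0X` prefix as hex
/// and anything else as decimal.
fn parse_number_text(text: &str) -> Result<u64, ParseIntError> {
    match text.strip_prefix("0x").or_else(|| text.strip_prefix("0X")) {
        Some(hex) => u64::from_str_radix(hex, 16),
        None => text.parse::<u64>(),
    }
}
```

So `parse_number_text("0xff")` and `parse_number_text("255")` give the same number, and anything that is neither valid hex nor valid decimal comes back as an `Err` that can be turned into a validation error.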
It would also make sense to accept numeric strings for really large numbers, larger than JavaScript's Number.MAX_SAFE_INTEGER; the JS in JSON stands for JavaScript, after all. Granted, I don't need this at this point, but I was merely curious how one would achieve this.
Yes I noticed this is the main cause for the issue here. Is it possible to have both deserialization and validation in the same step? How can I do this? Do I have to write my own deserialization functions for all my variable types or is there a more straightforward approach?
I see. Note that there's the serde_json::Value enum that can represent any valid JSON value. You might want to use that in your struct and only convert the received value once you need it to be in a certain state.
If you need support for complex parsing/validation like you described (i.e. hex strings to numbers, special urls, etc. [1]), I'm afraid so. At least I don't know a crate that combines validation with deserialization whilst being comprehensive and flexible enough for many use-cases. I started implementing one (which uses validator under the hood in some cases), but got distracted.
[1] Except required. Just making your fields have type T instead of Option<T> disables the possibility of omitting the field or passing null.
That's not necessary. JSON as a serialization format doesn't impose any limit on the precision or size of a number. If you want to handle arbitrary-precision ints and floats, then the right solution is for the JSON library to parse JSON numbers into the appropriate, lossless representation in the language (which serde_json can do, e.g. with its arbitrary_precision feature), and not to accept strings where a number was expected.
That was exactly what others suggested you to do above.
Being forgiving with bad input causes nothing but pain: it leads to ambiguity, pointless debates on what is and isn't allowed and which alternative one should produce, and ultimately to bugs and ecosystem splits (and thus, even more bugs).
Just force the sender to send correct data. It's not rocket science.
If you want to be forgiving, but not so forgiving as to go to the extreme of using serde_json::Value, you could create a union of String and f64, for example:
Suffice to say that, as others have pointed out, this is ill-advised. Hopefully you'll understand why everyone is advising against being flexible in the deserialization before it bites you.
While that is true, the JSON RFC (RFC 8259) specifically notes that IEEE 754 double precision is what implementations widely use, and that interoperability is best when no more range or precision than that is expected. Sure, it might work, but if you want to make your API compatible with other languages then you should expect the numbers to be read as doubles, which they are in JS. To ensure precision for huge numbers in JSON, the safest bet is to serialize them to strings and ensure they are handled properly on each side by whatever tooling the language supports (u256 in Rust, BigInt in JS).
I understand the reasoning behind not allowing flexibility, and I agree with it when a variable has a specific purpose in the code. However, the question was about a description variable, which can safely be parsed to a string for the user's convenience.
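To show what the precision concern looks like in practice, here is a small standard-library sketch. It uses `u128` as a stand-in for a big-integer type (Rust has no built-in `u256`), and the function names are made up for the example.

```rust
use std::num::ParseIntError;

/// 2^53 - 1, the value of JavaScript's `Number.MAX_SAFE_INTEGER`.
const MAX_SAFE_INTEGER: u64 = 9_007_199_254_740_991;

/// What an integer looks like after passing through a double,
/// as it would on a peer that reads every JSON number as an f64.
fn round_trip_through_f64(n: u64) -> u64 {
    (n as f64) as u64
}

/// Hypothetical helper: read a large integer that was sent as a decimal string.
fn parse_big_integer(text: &str) -> Result<u128, ParseIntError> {
    text.parse::<u128>()
}
```

`round_trip_through_f64(MAX_SAFE_INTEGER)` returns its input unchanged, while `round_trip_through_f64(MAX_SAFE_INTEGER + 2)` comes back one lower because 2^53 + 1 has no exact double representation. The string form avoids that: `parse_big_integer("9007199254740993")` yields exactly that value.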