Behavior-altering features considered harmful?

Features are typically used to manage dependencies, or to incrementally enable API. Sometimes, however, feature flags are used to modify behavior of APIs otherwise enabled.

Here's why it can be problematic, on the example of time and its feature serde-human-readable which enables human-readable parsing and formatting in the serde impls in case when the Serializer/Deserializer advertises the format as human-readable.

Consider library a crate with this Cargo.toml:

[package]
name = "a"
version = "0.1.0"
edition = "2021"

[dependencies]
time = { version = "0.3", features = ["serde"] }
serde_json = "1.0"

It relies on omission of the serde-human-readable feature to format the output string as a JSON array:

use time::OffsetDateTime;

pub fn json_clock() -> String {
    let t = OffsetDateTime::now_utc();
    serde_json::to_string(&t).unwrap()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn formats_as_array() {
        let json = json_clock();
        let v: serde_json::Value = serde_json::from_str(&json).unwrap();
        assert!(v.is_array());
    }
}

Taken on its own, the crate passes the test. Now add it to a dependency tree of another crate that additionally enables serde-human-readable in time:

[package]
name = "b"
version = "0.1.0"
edition = "2021"

[dependencies]
a = { path = "../a" }
time = { version = "0.3", features = ["serde-human-readable"] }

The binary of this crate prints a JSON string literal:

use a::json_clock;

fn main() {
    println!("{}", json_clock());
}

If these crates are put together in a workspace, the test in a also fails.

This example is a bit contrived, but there are likely other instances where behavior selected by feature omission is liable to be stealthily modified outside of the crate relying on this behavior.

Should such features be considered a footgun? Does it deserve a mention in the API guidelines?

11 Likes

It is generally considered good practice to not have features change behavior.

Cargo doc about this more or less has this as consequence: Features - The Cargo Book “features should be additive”.

Maybe what it doesn't say, but is also a consequence, is that the feature design should be such, that if you depend on and use a crate Foo, then any other crate in the same project should be able to enable features in Foo without that ruining your original use of the crate Foo. It should continue to build and it should continue to work in a compatible way.

8 Likes

In this example, they could instead provide a time::serde_human_readable module with the alternative implementation, like we did with indexmap::serde_seq.

2 Likes

I don’t think that the time crate guarantees anything about the precise way its types are serialized. So it might not even be considered a breaking change, if e.g. the crate was changed so that serde-human-readable is always enabled by default; and a test-case like this might be invalid because it tests unstable implementation details.

At least in general, serialization accesses private fields... so precise serialization behavior of types is quite commonly unspecified and allowed to change over “non-breaking” changes like adding a new private field, or otherwise changing the internals of a serializable type.

2 Likes

Serialization is an ABI of sorts -- I would expect stronger guarantees than that. If someone is serializing over the network or to disk, there's no guarantee that it will be the exact same version deserializing, nor even the same target arch/os/etc. Implementations should try to be backwards compatible (new version reading old data), but ideally the new version should also serialize in a way that past versions can understand. This may mean that the derived implementation is insufficient.

5 Likes

Yes. I would say it's an outright bug.

Unfortunately, this is less convenient than enabling a human-readable format in the default derive (as hardly anyone would like to read a time value from a broken-down tuple). The decision not to have it always on is motivated by code size and dependency management: the parsing and formatting features it depends on bring in some sizable code and a dependency on std (which could be made unnecessary with some rework, but that's another issue).

That sounds more like a Debug concern to me, where I think it would be fair game to make changes. Serialization is not generally for humans.

Why, we've been using serde_json and serde_yaml with configuration structures and other similar data to a great effect. This is why paying regard to is_human_readable is very useful in serde impls. Newtypes and helper mods are helpful, but less convenient.

I am the maintainer of the time crate for those not aware.

This is correct. The only guarantee is that the serialized format can be deserialized by a compatible version of time (0.3 in this case). The actual format is not specified unless you're using #[serde(with = …)], in which case the stability guarantee is placed on the user.

The time crate can deserialize a binary format even when serde-human-readable is enabled. This was an intentional decision to avoid breakage. It will also serialize to a binary format if the serializer is not declared human readable.

But can it deserialize the human format when that feature is not enabled? If not, that's trouble.

If you are able to make that work, then most of the objection goes away. That is, if serialization depends on the feature while deserialization always handles both, then things should just work.

2 Likes

No, it cannot as the underlying features required are not the same. Both serialization and deserialization depend on the feature. How is that trouble? Features are additive and global, so if anyone in the dep tree requires it, it's permitted.

I don't want someone who enables serde to also enable the entirety of the parsing infrastructure, which is quite significant.

Serialization redefines what it means to be "global". e.g. There might be a client/server pair built separately, but only one of them gets the feature added, perhaps indirectly from a dependency.

2 Likes

That's fair. If there is a way to do this without enabling the full parsing infrastructure, I'm all ears. Parsing is stupid difficult to say the least.

I don't know how to make that work automatically, but it would be safe with an explicit opt-in at the source level, like a #[serde(with = "...")] helper as I mentioned in my first reply.

Anything that would work with #[serde(with = "…")] (aside from trivial things like the existing timestamp format) would still require the parsing feature to be enabled. This is the case with the pending pull request that makes those annotations super easy to create.

Right, but that's the opt-in. Anything that doesn't use the attribute will use the binary format regardless of the other features, so it will be compatible between different builds.

Ah I understand what you're getting at now. It would be a breaking change to pull support for this now, but I'll make a note of that for a future breaking release.

2 Likes

Maybe when time-rs/time#400 is resolved, there will be fewer reasons not to roll the serde-human-readable functionality unconditionally into the serde feature?