Can read() overflow a buffer?

I know it's a silly question, but I'm having trouble interpreting the documentation on this. First, it says:

If the return value of this method is Ok(n), then implementations must guarantee that 0 <= n <= buf.len().

I interpret that to mean implementations of read() must not overflow the buffer. Great. But then, a few paragraphs later, it says:

As this trait is safe to implement, callers cannot rely on n <= buf.len() for safety.

I interpret this to mean that I, as the caller, cannot rely on the implementation of read() not overflowing the buffer. In other words, read() might overflow the buffer. Not great.

So which interpretation is wrong?

1 Like

Point is that this -

If the return value of this method is Ok(n), then implementations must guarantee that 0 <= n <= buf.len().

- is a logical requirement, but not a safety requirement. An implementation that fails to conform to it (or, to put it more generally, that returns not the number of bytes read but some other value) is buggy, but it is not unsafe, so its usage must not lead to UB.

This essentially means that if you use something unsafe that expects n not to exceed buf.len(), then your unsafe is to blame if some implementation turns out to misbehave and return something else.

It can't. Well, it can if it uses unsafe and pointer offsets internally, but in safe code it can't, since the bounds checks are in place.

If some implementor of Read overflows a buffer, then they used unsafe, it's unsound, and it's their fault. (Or transitively some dependency of theirs did so and is at fault).

But they can safely return 100000 when your buffer is length 100 without actually overflowing a buffer, and you have to deal with that possibility. You can't rely on them returning a number in range to avoid UB, because it's safe to return a nonsense number.

For example.
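To make that concrete, here is a minimal sketch (the LyingReader type is made up for illustration) of a Read implementation that is written entirely in safe Rust, never touches memory out of bounds, and still returns a nonsense count:

```rust
use std::io::{self, Read};

/// A reader that writes nothing into the buffer but claims to have read
/// far more bytes than the buffer can hold. Buggy, yet 100% safe code.
struct LyingReader;

impl Read for LyingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let _ = buf; // no writes at all
        Ok(100_000) // a value the caller must not blindly trust
    }
}

fn main() -> io::Result<()> {
    let mut buf = [0u8; 100];
    let n = LyingReader.read(&mut buf)?;
    assert!(n > buf.len()); // the "impossible" value really does come back
    // `&buf[..n]` here would panic, not overflow: safe code stays sound.
    Ok(())
}
```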

4 Likes

The implementation can write ten bytes into the buffer and then return Ok(1000).

How would you classify that?

I think you need to read the following blog post to understand what goes on here.

1 Like

So let me see if I understand this correctly. A correct implementation of read() will not attempt to write past the end of a buffer, and will return the number of bytes it wrote. But an implementation of read() may be incorrect. It may be incorrect in that it attempts to write past the end of a buffer, in which case the program will panic. It may also be incorrect in that it does not attempt to write past the end of the buffer, but is dishonest about how many bytes it wrote. In either case, the buffer isn't overflowed as long as both the implementer and the caller avoid unsafe.

Also, what does it mean that the trait is "safe to implement"?

2 Likes

It means you don't need to write unsafe impl Read for MyType, like you have to for e.g. Send and Sync.
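For instance (MyType here is just a placeholder), the syntactic difference looks roughly like this:

```rust
use std::io::{self, Read};

#[allow(dead_code)]
struct MyType {
    handle: *mut u8, // a raw pointer, so the automatic Send/Sync impls don't apply
}

// Read is a safe trait: a plain impl block is all that's needed.
impl Read for MyType {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Ok(0)
    }
}

// Send and Sync are unsafe traits: the `unsafe` keyword marks that the
// implementor, not the compiler, is vouching for the thread-safety contract.
unsafe impl Send for MyType {}
unsafe impl Sync for MyType {}

fn main() {}
```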

Yes.

It may be incorrect in that it attempts to write past the end of a buffer, in which case the program will panic.

Yes, if it uses only safe operations to do that write, then it will panic.

If it uses unsafe operations incorrectly and writes past the end, that's UB and anything could happen. But note that that is not specific to Read::read() — writing past the end of an &mut [T] is UB in any Rust code. The things specific to Read::read() are solely about:

  • which part of the buffer is written, and
  • what the return value is.

Neither of these is an extra rule the implementor must obey to ensure soundness; a safe trait isn't permitted to impose such rules.

It may also be incorrect in that it does not attempt to write past the end of the buffer, but is dishonest about how many bytes it wrote. In either case, the buffer isn't overflowed as long as both the implementer and the caller avoid unsafe.

Yes, but the caller is also free to use unsafe as long as they check that n is in range rather than assuming it is.
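For instance, a caller can use unsafe internally as long as it validates the untrusted n first. A rough sketch (the read_prefix function is invented for illustration):

```rust
use std::io::{self, Read};

/// Reads up to `buf.len()` bytes and returns the filled prefix.
/// Uses unsafe only after validating the untrusted `n`.
fn read_prefix<'a, R: Read>(reader: &mut R, buf: &'a mut [u8]) -> io::Result<&'a [u8]> {
    let n = reader.read(buf)?;
    // `n` comes from an arbitrary safe impl, so it must be checked before
    // being used in any unchecked operation.
    if n > buf.len() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "Read impl returned n > buf.len()",
        ));
    }
    // SAFETY: `n <= buf.len()` was verified above.
    Ok(unsafe { buf.get_unchecked(..n) })
}

fn main() -> io::Result<()> {
    let mut buf = [0u8; 8];
    let data: &[u8] = b"hello";
    let prefix = read_prefix(&mut &data[..], &mut buf)?;
    assert_eq!(prefix, &b"hello"[..]);
    Ok(())
}
```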

1 Like

Here's a similar scenario from the standard library. The HashMap docs say

It is required that the keys implement the Eq and Hash traits, although this can frequently be achieved by using #[derive(PartialEq, Eq, Hash)]. If you implement these yourself, it is important that the following property holds:

k1 == k2 -> hash(k1) == hash(k2)

In other words, if two keys are equal, their hashes must be equal.

And some methods also rely on Borrow, which says

when providing implementations for additional traits, it needs to be considered whether they should behave identically to those of the underlying type as a consequence of acting as a representation of that underlying type. Generic code typically uses Borrow<T> when it relies on the identical behavior of these additional trait implementations. These traits will likely appear as additional trait bounds.

In particular Eq, Ord and Hash must be equivalent for borrowed and owned values: x.borrow() == y.borrow() should give the same result as x == y.

However, one can implement these traits with safe code that does not conform to the logical constraints outlined above.
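For example, this contrived key type (BadKey is invented for illustration) compiles without any unsafe even though it breaks the k1 == k2 -> hash(k1) == hash(k2) property:

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// `Eq` says any two keys are equal, but `Hash` still hashes the inner value,
/// so "equal" keys can hash differently. Safe code, broken logic.
struct BadKey(u32);

impl PartialEq for BadKey {
    fn eq(&self, _other: &Self) -> bool {
        true // every key "equals" every other key
    }
}
impl Eq for BadKey {}

impl Hash for BadKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.hash(state); // but the hash still depends on the value
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(BadKey(1), "one");
    // BadKey(2) is "equal" to BadKey(1), yet this lookup will usually miss,
    // because it probes the bucket for hash(2). Wrong results, but no UB.
    println!("{:?}", map.get(&BadKey(2)));
}
```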

One of the great qualities of Rust is that any unsoundness has to be due to unsafe somewhere [1]. One way to think of this is that if you never use unsafe, you can never be to blame for any unsoundness. If there's unsoundness, it's someone else's fault.

From the opposite perspective, if you rely on these documented logical constraints which cannot be checked by the compiler in your own unsafe, you're making an unsound assumption. You need to instead do something sound, even if it is unspecified, disruptive, illogical, etc. If it's possible for your unsafe to cause UB due to someone else's illogical but safe implementations, it's still your fault.

Which is why the HashMap documentation goes on to say

It is a logic error for a key to be modified in such a way that the key’s hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified, but will be encapsulated to the HashMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

It's possible to write such misbehaving keys with safe code, but HashMap has taken care not to do anything unsound.
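A minimal sketch of such a misbehaving key (the MutableKey type is made up), using Cell so the hash can change while the key sits in the map:

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(PartialEq, Eq)]
struct MutableKey(Cell<u32>);

impl Hash for MutableKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.get().hash(state);
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(MutableKey(Cell::new(1)), "value");

    // Mutate the key through shared access while it is inside the map.
    for k in map.keys() {
        k.0.set(2);
    }

    // The entry still exists but is now effectively unreachable: a logic
    // error, yet HashMap remains sound (no UB, no buffer overflow).
    assert_eq!(map.len(), 1);
    assert!(map.get(&MutableKey(Cell::new(1))).is_none());
}
```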


  1. or a compiler bug or some other niche case I'll ignore ↩︎

1 Like

There are no writes past the buffer, ever. No implementation is ever allowed to write past the buffer you give it. In all cases, all implementations can only write within the buffer and no more.

However, the number of bytes written that an implementation returns can be inaccurate. It is a bad idea, but technically permissible, to lie about the number of bytes written to the buffer. You're not supposed to naively trust this number, and you still have to check that it's no more than the length of the buffer (or, in practice, Rust will check for you when you slice or index the buffer in safe code).

A read() implementation is allowed to read nothing and return 9999999999. Or it's allowed to fill the entire buffer but return 0. These are buggy behaviors, but they're not unsafe: the buffer must be initialized before being passed to read(), and you're not allowed to read past the buffer, so at worst you get garbage, not Undefined Behavior.
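In other words, the safe-code version of that check is just ordinary slicing or get; a short sketch (read_some is an invented helper name):

```rust
use std::io::{self, Read};

fn read_some<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = [0u8; 100];
    let n = reader.read(&mut buf)?;
    // Indexing with `&buf[..n]` would panic on a lying impl; `get` lets us
    // turn that into an explicit error instead of trusting `n`.
    match buf.get(..n) {
        Some(filled) => Ok(filled.to_vec()),
        None => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "read returned n > buf.len()",
        )),
    }
}

fn main() -> io::Result<()> {
    let bytes = read_some(&mut &b"hello"[..])?;
    assert_eq!(bytes, b"hello");
    Ok(())
}
```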

I really wish Rust tutorials would include this information, and people wouldn't need to ask the same question again and again and again. This question is asked on this forum about two dozen times per week.

I know, I know: math. Math is scary; if we even mention that word in a tutorial, people will run away screaming. But still… it's tiring to explain the same thing again and again and again.

Basically: it's impossible to answer any “interesting” question about some random syntactically correct program. Will this program eventually stop? No way to know. Is this program adding two numbers? Who knows. Nothing (except some trivial properties) can be gleaned from the program source.

And that's not a case of “we don't have the money to make it happen”. It's quite literally impossible.

And yet… people usually want to know whether their program is “correct” (for some definition of “correct”), and the negative answer “sorry, we couldn't do that” is usually not considered adequate. Yes, that same infamous: in theory, theory and practice are the same; in practice, they are not.

Rust developers are practical people, so they split this impossible-to-solve problem in two, like Alexander the Great did many years ago. Instead of picking one option (rejecting useful programs which may or may not be correct, like most high-level languages do, or accepting invalid programs, like C, C++, or Zig do), Rust gives you the choice:

  1. In “normal”, “safe” Rust the compiler is in charge and ensures that your program is “correct” (for some definition of “correct”). That, of course, means it will reject some “correct” programs (sorry, no way to avoid that). Like C# or Java, only with more safety guarantees.
  2. If you use that magical word unsafe, then, oh well, you probably know what you are doing: Rust gives you the means to write anything at all (like C, C++, or Zig)… but with great power comes great responsibility: now it's your obligation to ensure that the program is correct. If you fail… not only the code you wrote but also anything outside of your code, maybe millions of lines of code removed from your tiny transgression… may misbehave.

And that is where the issue with read comes from. It's a safe trait, which means that anyone can implement it incorrectly. Without help from unsafe code you couldn't overflow the buffer (the compiler looks after that), but you can return some random gibberish as a return value (the compiler doesn't know that this number is somehow related to the amount of data written into the buffer, so it can't check it). That, in turn, means that if you are writing unsafe code which deals with the Read trait, it can't rely on the fact that read returns a correct value. It would need to do its own sanity checking.

That's it.

Without knowing about that dual nature of Rust, the whole description sounds incredibly strange and hard to understand.

But if you know about these two “dialects” of Rust, then it suddenly makes perfect sense.

1 Like

As this trait is safe to implement, callers cannot rely on n <= buf.len() for safety.

As shown by the OP, this phrasing (that appears throughout the docs) is confusing and appears to contradict what is said earlier.

It could be improved by making it clearer that this is only relevant to users of unsafe and that the comment can be ignored for typical use. Something like:

As this trait is safe to implement, unsafe callers must not rely on n <= buf.len() to provide safety.

1 Like

Well, you should never need to consider safety in safe code, so any comment on safety concerns is about unsafe code anyway.

The point being that this is not obvious to new readers of the docs. They know that Rust is a safe language, but now they see a constraint that they appear to need to follow "for safety".

I think, with a small change, that this can be clearer to new users and not much noisier for experienced users.

3 Likes

You are right. I searched the docs and found that we do specify "unsafe code" in similar places like this. I created a PR for the change.

1 Like

It's hard to judge, because experiments in this area are very slow and tedious. Any “new” user stops being “new” after a visit to the forums, after all.

Yes, it's true that the text as written is not 100% clear to a “brand new” user, but I'm not convinced that anything else would clarify the situation for a user who doesn't understand how Rust works.

If you understand that there are two kinds of Rust and that unsafety can be “exported” into safe code, then what is written there is perfectly understandable.

And if you don't understand that, then a mention of unsafe would send new users in the opposite direction: they would incorrectly assume that the unsafe keyword is some kind of magic, and that if they just wrap it in a “safe” function, everything will magically become correct.

🤔 I think it's fine in this situation though? We are specifying something you should not do in unsafe code. So we are actually limiting unsafe usage instead of encouraging it.

As I have said: it's always tricky to imagine how a novice would read a given piece of documentation. The phrase “must not be trusted to, e.g., omit bounds checks in unsafe code” doesn't sound worse than what is there now, so maybe it's a good idea to change that part and see if people stop misunderstanding things.

I don't really agree that it's only relevant for users of unsafe. Blindly trusting the return value can cause bugs in safe code, even if said bugs can't cause UB.

But I agree it could be improved to better explain how it's not self-contradictory.

This sentence is specifically about safety, and it applies to unsafe code in particular. It makes it clear that the caller is responsible if they assume the constraint is true.

If a trait implementation does not follow the semantics of the trait, then that can cause bugs, but those bugs are clearly the fault of the trait implementation, not the caller.

The reference says:

Trait items that begin with the unsafe keyword indicate that implementing the trait may be unsafe.

And the "Unsafety" section says:

The following language level features cannot be used in the safe subset of Rust:

  • Dereferencing a raw pointer.
  • Reading or writing a mutable or external static variable.
  • Accessing a field of a union, other than to assign to it.
  • Calling an unsafe function (including an intrinsic or foreign function).
  • Implementing an unsafe trait.

So I suppose this means that if a trait is unsafe to implement, the implementor takes on a safety obligation the compiler can't check, and other code may rely on that obligation for safety. And if a trait is safe to implement, there is no such obligation, so the compiler's usual safety guarantees hold (up until you use an unsafe block in your implementation).
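For contrast, here is a sketch of a hypothetical unsafe trait (all names invented for illustration). Because the trait is unsafe, writing the impl is itself a promise to uphold the documented contract, so unsafe callers are allowed to lean on it:

```rust
/// Hypothetical unsafe trait. Safety contract: `len_hint()` must never
/// exceed `data().len()`. Implementing it is a promise to uphold that.
unsafe trait TrustedPrefix {
    fn data(&self) -> &[u8];
    fn len_hint(&self) -> usize;
}

struct Prefix {
    bytes: Vec<u8>,
}

// SAFETY: `len_hint` never returns more than `data().len()`.
unsafe impl TrustedPrefix for Prefix {
    fn data(&self) -> &[u8] {
        &self.bytes
    }
    fn len_hint(&self) -> usize {
        self.bytes.len().min(4)
    }
}

fn first_chunk<T: TrustedPrefix>(t: &T) -> &[u8] {
    // SAFETY: the TrustedPrefix contract guarantees len_hint() <= data().len(),
    // so the caller may skip the bounds check here.
    unsafe { t.data().get_unchecked(..t.len_hint()) }
}

fn main() {
    let p = Prefix { bytes: b"hello world".to_vec() };
    assert_eq!(first_chunk(&p), &b"hell"[..]);
}
```

With Read, which is a safe trait, no such promise exists, which is exactly why an unsafe caller has to re-check n itself.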

I'm still confused about this statement:

As this trait is safe to implement, callers cannot rely on n <= buf.len() for safety.

Why is the trait being safe to implement the reason unsafe callers can't rely on n <= buf.len()? Wouldn't that also be the case if the trait were unsafe to implement?