It's true that the unsafe keyword may make you think more carefully. But writing unsafe Rust correctly can often be even more difficult than writing C/C++ correctly. In general, unsafe Rust code is not safer than C/C++.
thanks!
thanks, very informative
Rust pointer provenance might be harder, but it might not be. C++ has TBAA, and requires you to use union for any type punning. Rust has exclusive &mut references, and requires you to use UnsafeCell for aliased mutable references. I wouldn't feel comfortable claiming that either of them has a safer aliasing model than the other; they're too different.
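To make the C++ half concrete, here's a minimal sketch (the function name is mine, and I'm assuming C++20 for std::bit_cast) of what TBAA forbids and what the sanctioned escape hatches look like:

#include <bit>
#include <cstdint>
#include <cstring>

uint32_t float_bits(float f) {
    // UB under strict aliasing: viewing a float object through a uint32_t lvalue.
    // return *reinterpret_cast<uint32_t*>(&f);

    // Well-defined: copy the object representation.
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;

    // Also well-defined since C++20: return std::bit_cast<uint32_t>(f);
}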
But C++ makes a lot of other things UB that Rust just doesn't:
- In Rust, the shift operators produce arithmetic overflow if the right operand is out-of-range, which means evaluation of the expression may result in a panic, [or] the resulting value of the expression may be truncated. In C++, the behavior is undefined if the right operand is negative, or greater than or equal to the width of the promoted left operand.
- EDIT: There was a proposal to make basic arithmetic less dangerous in C++, but apparently it wasn't adopted? Everyone should pass -fwrapv to their compiler and lobby their vendor to make it the default. MSVC already does the right thing here.
- Rust does not make it UB to overflow a conversion from float to integer. In C++, the behavior is undefined if the truncated value cannot be represented in the destination type.
- Implementation coherence, the specifics of name mangling, and the design of Cargo[1] for Rust prevent you from writing crates that conflict when you link them together. In C++, violating the One Definition Rule is ill-formed (no diagnostic required), and doesn't necessarily require you to do anything outlandish.
- C++ says, on the topic of resolving function names with templates, "If a dependent call ([temp.dep]) would be ill-formed or would find a better match had the lookup for its dependent name considered all the function declarations with external linkage introduced in the associated namespaces in all translation units, not just considering those declarations found in the template definition and template instantiation contexts ([basic.lookup.argdep]), then the program is ill-formed, no diagnostic required."
If a dependent call ... would find a better match had [the matching symbol been #included] ... the program is ill-formed, no diagnostic required?!

I'm pretty sure this is just more 1DR stuff. You can wind up with a template function that generates different code, even though both "definitions" are syntactically identical, because different symbols are #included when they're instantiated. You're supposed to avoid this by not defining symbols in other libraries' namespaces and by defining all the overloads of a function in the same header file, so that you can't import a subset of them.

Still, Rust's module system just doesn't have this problem. It has de-facto overloading through traits and method calls, but because it stores generic code as MIR with all the names resolved (and macros expanded), passing the same generic parameters to the same type or function will always produce the same code, regardless of which dependent compilation unit is doing it.
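To illustrate the include-order trap I'm describing, here's a minimal sketch (all file and function names are made up): each translation unit compiles cleanly on its own, but linking them together is ill-formed, no diagnostic required, because the two instantiations of the "same" template resolve the dependent call differently.

// lib.h -- the dependent call pick(x) is resolved when describe is instantiated
template <class T>
int describe(T x) { return pick(x); }

// a.cpp
int pick(long) { return 1; }           // the only overload visible here
#include "lib.h"
int use_a() { return describe(7); }    // describe<int> binds to pick(long)

// b.cpp
int pick(int) { return 2; }            // a better match, visible only in this TU
#include "lib.h"
int use_b() { return describe(7); }    // the "same" describe<int> binds to pick(int)

// Linking a.o and b.o produces two conflicting definitions of describe<int>:
// ill-formed, no diagnostic required; the linker silently keeps one of them.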
- I think modules are supposed to fix this, but most C++ projects don't use them yet, and switching over to them is not "add -fwrapv to my CXXFLAGS"-easy.
- Rust doesn't have constructors, which are a serious can of worms. C++ does. In particular, constructors can invoke methods on this when the object isn't done being initialized yet (a sketch follows below). This is the sort of thing the C++ Core Guidelines can help you work around, but it's not actually that easy to check, and if you wind up calling a pure virtual, you get UB.
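A minimal sketch of that last bullet (class names are made up): inside a constructor, virtual dispatch stops at the class currently being constructed, and if the call reaches a pure virtual, the behavior is undefined.

#include <iostream>

struct Base {
    Base() {
        // Dispatches to Base::name(), not Derived::name(): the Derived part of
        // the object doesn't exist yet while Base's constructor runs. If name()
        // were pure virtual, reaching this call (even indirectly) would be UB.
        std::cout << name() << '\n';
    }
    virtual const char* name() const { return "Base"; }
    virtual ~Base() = default;
};

struct Derived : Base {
    const char* name() const override { return "Derived"; }
};

int main() {
    Derived d;  // prints "Base", which is rarely what the author meant
}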
This system is complex enough that I would not be terribly surprised if someone found a bug that allowed them to violate the 1DR in safe Rust. To make this happen, you would need to find either a flaw in the coherence logic or a hash conflict in the disambiguator.
Why are you praising the gods for something that hasn't happened? You are linking to p0907r0, but, of course, it was never adopted. What was adopted is p1236r1, a wonderful combo which managed to both declare signed integers as being two's complement and add a very prominent clarification that "overflow for signed arithmetic yields undefined behavior".
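For anyone skimming, a tiny sketch (assuming GCC or Clang, with a function name I made up) of what's at stake and what -fwrapv changes:

#include <climits>

bool still_bigger(int x) {
    // Default build: signed overflow is UB, so the optimizer may assume
    // x + 1 never overflows and fold this whole expression to `true`.
    // With -fwrapv: x + 1 is defined to wrap, so x == INT_MAX gives
    // INT_MIN > INT_MAX, i.e. `false`.
    return x + 1 > x;
}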
Wow, thank you. I badly misunderstood what I was reading...
At least half that should just be "implementation specified" (which presumably would then be "whatever the hardware does" or "linker error" or "one definition selected" or something)
This really should be valid:
#include <cstdint>
#include <optional>

std::optional<uint32_t> left_shift_checked(uint32_t value, uint32_t shift) {
    auto x = value << shift;  // evaluated unconditionally; UB in C++ when shift >= 32
    return shift >= 32 ? std::nullopt : std::optional{x};
}
C++ be crazy, yo.
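For what it's worth, a version that does the range check before shifting sidesteps the problem entirely; a minimal sketch reusing the signature from above:

#include <cstdint>
#include <optional>

std::optional<uint32_t> left_shift_checked(uint32_t value, uint32_t shift) {
    // Check first; the shift is only evaluated when it's in range, so there's
    // no UB (and the equivalent Rust code wouldn't panic either).
    if (shift >= 32) return std::nullopt;
    return value << shift;
}

That's roughly what u32::checked_shl gives you on the Rust side.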
The problem with that is that "whatever the hardware does" is not uniform on the x86-64 platform… and, for better or for worse, that's one of the most important ones.
SHL, the scalar version, uses only the low 5 bits of the shift count, but PSLLx, the SIMD version of the same operation, doesn't do that.
The existing wording allows easy transformation between vector and scalar code… and, as usual for C++, that takes priority over the ability to write code that, you know, works.
True, but AFAICT the distinction between "implementation specified" and "undefined" behavior from the specification's point of view is mostly just that it doesn't invalidate the whole program.
It does say that the behavior must be specified, but even "produces an undefined value" is still "specified". Here the hardware can produce a few different values, so the implementation can either just say that, or declare flags that get you a certain value, or whatever they like, as long as it's some value.
Is either of those feasible?
In a Rust debug build, the equivalent code will panic, because of the overflow, before reaching the ternary. C++ could just declare it to wrap even in debug builds, but that can mask actual bugs.
I don't think it's quite as easy as allowing an arbitrary definition to be chosen. That mess I described in point 4 can cause a struct/class to have a different layout between compilation units, which makes functions that operate on the "same" data type incompatible.
Fair enough, but both those behaviors are better than "it always returns a filled option, and probably a bunch of really terrible other things"
Right, data layout makes this a mess. I guess "pick one" is the current undefined behavior in practice, and to become "unspecified" it could only realistically be done by requiring link time errors, which is better than nothing I suppose but would never happen.
There is no such thing as "implementation specified". And "implementation defined" is exactly what it says on the tin: behavior that depends on the implementation and that each implementation documents.
Documenting either of these two possibilities would mean that either the scalar form would have to be slow (and it's actually used much more than the SIMD version) or people who are using SIMD would whine about the "stupid optimizer that couldn't do a simple thing" (because they are using SIMD to go fast, not to see extraneous operations added by the compiler).
"Undefined behavior" nicely solves that problem: now the developer has to write code that doesn't trigger that behavior… problem solved!
Oh, sure. The only problem is: if an indeterminate value is produced by an evaluation, the behavior is undefined, so that would put us back at square one, except worse: people would see that the value of the shift is "implementation defined" and wouldn't think for a second about the fact that each implementation may now declare your program invalid, individually!
I was actually confusing the name of "unspecified behavior" with "implementation defined".
And it's perfectly OK to define the behavior as producing an unknown but valid value (which looks fairly close to this newfangled "erroneous value"), or it could say "wrapped under -fwrapv, otherwise either wrapped or 0" or whatever else they like. As I understand it, there's very little restriction on unspecified behavior, and implementation-defined behavior has only whatever limits each individual case cares to give.
I have no idea why C++ has "slightly delayed UB values" in the spec, that seems... odd? I suppose it's for references to funky virtual/hardware memory, LLVM's undef, basically?
It's the opposite: LLVM has these to represent "slightly delayed UB values".
Nah, that's perfectly normal. Rust has these, too. It's a bit more formal in Rust, but MaybeUninit is, essentially, this.
That's what you have in a piece of memory before you write something sensible into it. If you define a variable then it should contain some value even if it's not initialized, and C++ doesn't have a Rust-style ownership system which would make it impossible to access a variable that exists but is not yet "ready to use".
Okay, I think I've managed to construct a working test bench for this, but I can't even find a sanitizer that warns on it?
At this point, I'm just curious. Is there a way to turn 1DR violations into warnings?
Sanitizers detect UB at runtime. Here nothing suspicious ever happens at runtime; the violation happens at link time.
I'm pretty sure that if you mixed together object files produced by different versions of the Rust compiler (or even hand-picked ones from a single version, if some data-layout fuzzing were used) you could achieve similar UB in Rust, too.
Only the linker could detect it, in theory, but linkers are, mostly, pretty dumb creatures, and they don't know how to detect language-level UB.
I was under the impression that C++ specs before C++26 didn't have a section equivalent to dcl.attr.indet? E.g. they would simply declare a read of uninitialized memory insta-UB and be done? That doesn't seem to match cppreference though, which doesn't define it conditionally on a spec version, but perhaps that's due to it being a backwards-compatible DR or something?
I might have misread the spec (the wording is pretty squirrely): it initially read to me that only the first read of dynamically allocated memory produces an undef value rather than UB (nice!), but then any read of that would be UB, which is very weird. Reading closer, it seems the intent is at least that any "move" of the value that doesn't perform an operation on the bits simply moves the undef, while any other use is UB, which makes some sense:
#include <cstdint>
#include <cstdlib>
uint32_t example() {  // hypothetical wrapper so the snippet compiles
    // yay, no UB!
    auto p = new uint32_t;           // *p is undef
    auto x = *p;                     // x is undef
    return std::rand() % 2 ? x : 3;  // undef
}
Essentially, you could read it as everything being implicitly MaybeUninit now, with an implicit .assume_init() on operations that actually need the value?
By this reading, it should be fine to return even an indeterminate value from an overly shifted value, even if "erroneous value" feels more natural to me.
Sure, but you have to define "uninitialized memory" somehow. How could they specify that?
Yes. So there was no way to create a variable that can be uninitialized, yet safely accessed. But "indeterminately valued objects" are part of C90, not C++98 or C99.
Yes, "indeterminately valued objects" have been made less toxic in newer versions of the standard. But there is still only a finite list of exceptions where you may access them.
Only if that particular use were added to said finite list. And, apparently, no one cared enough to vouch for that.
Eh, "object which has not been initialized or assigned to" works fine AFAICT when you didn't have a "floating" indeterminate value definition and it's always immediate UB to read an uninitialized object.
That list seems to be about uses of an undef value, not sources, but yes, at the moment the sources are explicitly listed in the definition (uninitialized dynamic memory, using the attribute), so it would have to be added there, or as an equivalent like "arithmetic operations that result in overflow".
My team is writing high-performance simulation software. Our code is mostly Rust but has bindings to highly optimized C code. We have divided our codebase so that the extremely performance-sensitive code is C, while the mathematically complex code is written in Rust. For us that is a sane trade-off, as we can have devs whose main skills are in math and physics work under the protection of Rust, with only a small part of our workforce working in C. That gives us code which is both fast and secure.
I think you cannot have your cake and eat it too. The added security of Rust is not completely free, but I think many people are fooling themselves when they think their application demands that tiny performance edge. In most cases performance can also be bought in the form of hardware, and bugs cost money.