Why are Option variants lang items?

I wanted to make a point to someone who I'm introducing to Rust that "Option isn't special, just a regular type provided by the std library", but when I checked I noticed that Option::None and Option::Some actually are annotated with #[lang = ] items:

https://doc.rust-lang.org/src/core/option.rs.html#161-170

I don't see them on the list of lang items at the bottom of this page:

https://doc.rust-lang.org/unstable-book/language-features/lang-items.html

So, two questions:

  • What is special about Option that requires compiler support / the lang item annotation?
  • Are Option's variants intentionally missing from that list of lang items, or is that a documentation bug?
9 Likes

This must be a new addition, because last time I checked, this was not the case. Anyway, there has been an unfortunate surge in "make everything into lang-items" initiatives recently. It looks like Result::Ok, Result::Err, and even From::from() are also lang items now :frowning:

I see absolutely no good reason for this, and usually the arguments over at IRLO in favor of making even more types/traits/functions into lang items are pretty weak, but I don't specifically remember relevant discussion around Option or From. Unfortunately, the latest commit according to git blame isn't exactly helpful for finding out why and by whom Option's variants were added as lang items.

9 Likes

I think the "blame" points to this: https://github.com/rust-lang/rust/pull/75145

5 Likes

Thanks for digging this up. Gee, they special-cased a bunch of perfectly stand-alone std types as an optimization? That seems… unreasonable, to put it nicely.

3 Likes

Are they really stand-alone if the compiler injects references to them when desugaring some syntax forms?

For instance, the pat_some function in that PR's diff that uses the Some lang item gets used here, in the lowering of for loops.

They are. They don't depend on the compiler; instead, the desugaring depends on them. Previously, the compiler literally injected full paths like std::option::Option::Some, which worked just fine, without Option needing to be self-aware of all of its use sites.

4 Likes

"Option variants are used in the desugaring of some expressions, notably for loops" is sufficient explanation for them being lang items for my purposes I think.

However in that case it sounds like they probably should be added to that page in the unstable book, right?

2 Likes

Lang items were already not limited to items that depend on compiler magic. They also include things that aren't magic in their definition or implementation but that the compiler itself depends on, such as the traits for binary operations, panic_bounds_check and start. I think the compiler always could have looked up such items by path, too, but didn't. The items used in desugaring/AST -> HIR lowering were handled differently because the HIR refers to items in the form of paths, so resolving a hard-coded path was the easiest method. Making them lang items now means that there's a single method that the compiler uses to depend on items in std/core.
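To make the binary-operation case concrete, here is a minimal sketch. The Point type and the two helper functions are made up for illustration. Nothing about the Add impl is magic; the compiler just has to be able to find the trait so that `a + b` can be treated as a call to its `add` method.

```rust
use std::ops::Add;

// Hypothetical example type; any user-defined type works the same way.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// An ordinary trait impl written in surface Rust.
impl Add for Point {
    type Output = Point;

    fn add(self, rhs: Point) -> Point {
        Point {
            x: self.x + rhs.x,
            y: self.y + rhs.y,
        }
    }
}

// Uses the `+` operator; the compiler resolves this through the Add trait.
fn sum_with_operator(a: Point, b: Point) -> Point {
    a + b
}

// Spells out the same call explicitly as a trait method call.
fn sum_with_call(a: Point, b: Point) -> Point {
    Add::add(a, b)
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 10, y: 20 };
    // Both spellings end up in the same `add` implementation.
    assert_eq!(sum_with_operator(a, b), sum_with_call(a, b));
    println!("{:?}", sum_with_operator(a, b));
}
```

The trait is defined in plain Rust, but the operator syntax only works because the compiler knows which trait to look for.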

What, in your opinion, is the cost of making things lang items that isn't overcome here?

There was one issue, because crates in the 2018 edition can change what ::std means.

Admittedly that's not a big issue, and could have been resolved in some other way like forbidding std/core as names for extern crate.

3 Likes

I'm probably just a bit tired, but I can't come up with any examples on my own right now... what kind of syntax desugars to Option, or to Option::Some/Option::None in particular? I think listing one (or all) of those would be a valuable addition to this thread.

for loops call Iterator::next() and match that return value.
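Roughly, and hand-written rather than the compiler's exact output, the desugaring looks like the second function below. The function names are made up for illustration; the point is that the Some and None patterns appear in code the compiler generates, which is why it needs a way to refer to them.

```rust
// The surface syntax.
fn sum_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for x in values {
        total += *x;
    }
    total
}

// An approximation of what the `for` loop is lowered to: convert with
// IntoIterator, then loop and match on the result of Iterator::next.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match Iterator::next(&mut iter) {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}

fn main() {
    let values = [1, 2, 3, 4];
    assert_eq!(sum_for(&values), sum_desugared(&values));
    println!("{}", sum_desugared(&values));
}
```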

1 Like

The arms of the match on the return of Iterator::next in for loops:

https://github.com/rust-lang/rust/blob/b4adc21c4fa245994b4936df5b4f7d94ca633c5d/compiler/rustc_ast_lowering/src/expr.rs#L1692-L1710

That's the only place the Option variant lang items are used that I can find (outside of clippy using them for some lints).

Since it is used in the compiler, I'm not actually opposed to it being a lang item - but I think having more specific lang items would be a better solution, for example #[lang = "for_loop_continue"] and #[lang = "for_loop_finish"]. Option itself is too generic to meaningfully say "this is inherent to the language" about.

3 Likes

In addition to the above replies about Option, From::from is presumably used in the desugar of try (?) expressions.
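As a simplified sketch of that desugaring (the real lowering goes through the unstable Try machinery, so treat this as an approximation), `?` on a Result behaves like a match that returns early with the error converted via From::from. ConfigError and both function names are hypothetical.

```rust
use std::num::ParseIntError;

// Hypothetical error type used only for this illustration.
#[derive(Debug, PartialEq)]
enum ConfigError {
    BadNumber(String),
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e.to_string())
    }
}

// The surface syntax: `?` converts the error and returns early.
fn parse_with_question_mark(s: &str) -> Result<i32, ConfigError> {
    let n = s.trim().parse::<i32>()?;
    Ok(n * 2)
}

// A hand-written approximation of the same control flow, with the
// Ok/Err patterns and the From::from call spelled out.
fn parse_desugared(s: &str) -> Result<i32, ConfigError> {
    let n = match s.trim().parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_with_question_mark("21"), parse_desugared("21"));
    assert_eq!(parse_with_question_mark("nope"), parse_desugared("nope"));
    println!("{:?}", parse_desugared("21"));
}
```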

It's wrong to say Option shouldn't be a lang item. Even when for loops desugared to just "I hope this is the path of Option", Option was still a language item, because the language knew about it.

What might be useful, though, would be to split lang items into three (or more) distinct categories, to mitigate the "now everything's a lang item" fears:

  • "Traditional" lang items, like i32, [T; N], or Box. The language defines some semantics for the item which cannot be expressed in surface Rust, or impls on fundamental built-in types.
  • "Language knowledge" lang items, like Option, IntoIterator, or Iterator. The language interacts with the item in some fashion, but all behavior is defined in surface Rust, the language just knows about and uses the item.
  • "Rustc knowledge" lang items, like Result, maybe? There are no language reasons that the compiler needs to know about the item, but rustc wants to know it exists, for the purpose of (probably) nicer error messages or otherwise.

Lang items are better than the traditional approach of items at expected paths. Instead of the compiler assuming you provided an item, you tell the compiler where the item is, and the compiler can even easily check that the item fits its shape expectations.

21 Likes

There already is a rustc_diagnostic_item attribute. Result's variants are lang items because they're used in the lowering of try/? expressions.

I respectfully disagree. My argument isn't that "X or Y doesn't know about it". Rather, it shouldn't be a lang item because it demonstrably doesn't need to be one.

The problem with unnecessarily marking types as lang items is that it increases coupling and reduces orthogonality. Once something is a lang item, there's no return, and now the compiler gets to make additional assumptions about the type, and it can also replace it with built-in magic partly or entirely.

This in turn makes it impossible to replicate the type from user code, which was basically the most important charm of Option and Result (etc.) just being regular, library-defined types. I've recently encountered some code that replicated Cow so that users could conditionally opt out of the owned variant in a #[no_std] environment. If Cow were a lang item, it would not be possible to just define an equivalent type in order to solve such problems.

So, no, it's not "wrong" to argue these types should not be lang items. What is wrong is to dismiss counter-arguments without proper consideration of long-term effects by deferring to a blanket "it's wrong" statement.

4 Likes

I don't see how to square this with your previous "Previously, the compiler literally injected full paths [...] which worked just fine" statement.

If the compiler's injecting a full path, then there's some language feature that's hard-coded to the thing in core anyway, and thus a different type provided by the user still can't be used instead. For example, even if Option wasn't a lang item, you still can't use something other than Option in the return type of Iterator::next in a for loop.

So if it's still not possible to use an "equivalent type" with the language features, what does it matter if it's marked lang_item or not? You can still make your own Option-like enums and use them (outside of the special syntax that's hard-coded to the one known Option) if you want, just like you can with custom Cows.
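A small sketch of that point, with made-up names (Maybe, Countdown, next_maybe): a user-defined Option-like enum works fine as an ordinary type, you just have to write the loop and match yourself, because the `for` syntax is tied to Iterator::next and the real Option.

```rust
// A user-defined enum with the same shape as Option.
#[derive(Debug, PartialEq)]
enum Maybe<T> {
    Just(T),
    Nothing,
}

// Hypothetical iterator-like type that returns Maybe instead of Option.
struct Countdown(u32);

impl Countdown {
    fn next_maybe(&mut self) -> Maybe<u32> {
        if self.0 == 0 {
            Maybe::Nothing
        } else {
            let current = self.0;
            self.0 -= 1;
            Maybe::Just(current)
        }
    }
}

// `for n in Countdown(3)` would not compile, since Countdown does not
// implement Iterator. A manual loop over the custom enum works fine.
fn collect_countdown(from: u32) -> Vec<u32> {
    let mut countdown = Countdown(from);
    let mut out = Vec::new();
    loop {
        match countdown.next_maybe() {
            Maybe::Just(n) => out.push(n),
            Maybe::Nothing => break,
        }
    }
    out
}

fn main() {
    println!("{:?}", collect_countdown(3));
}
```

This is the same situation whether the compiler finds Option through a hard-coded path or through a lang item.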

11 Likes

For me, at least, the primary concern isn't using custom types for some language (or stdlib) feature. It's ensuring that the types remain "plain ol' Rust types". Making a type a lang item (versus hard-coding a path or otherwise indicating "this should remain a plain ol' Rust type") weakens the assurance that the type won't grow some special, language-level abilities. Such abilities increase the likelihood of one-off behaviours, often leading to bugs in and outside of the compiler, and make the language more complicated in general. One of the benefits beyond charm is that if you understand the language, and you encounter a new feature that just uses desugaring and/or plain-ol-types, you can read the desugaring/types, understand them with your existing knowledge, and say "ah yes, just so".

I don't have the time to track down examples just now, so this won't be the reply I wish it was, but this does come up in issues, e.g. around Box. In lieu of concrete examples, here's an article on how Box is special. I don't want to speak for @H2CO3, but what they may have been getting at is that if I want to create my own Box-like data structure for some reason, I can't today, because I don't have access to things like DerefMove. And quoting the article:

The current status is that Box<T> is still a special type in the compiler. By “special type” I don’t just mean that the compiler treats it a bit differently (this is true for any lang item), I mean that it literally is treated as a completely new kind of type, not as a struct the way it has been defined in liballoc. There’s a TON of cruft in the compiler related to this type, much of which can be removed, but some of which can’t. If we ever do get DerefMove, we should probably try removing it all again.

7 Likes

It's not wrong to say that they shouldn't be "traditional" lang items, where the language can add special powers to the type (as is Box).

But it's (IMHO) incorrect to argue that Option shouldn't be a lang item in that you say "hey compiler, here's my Option type, that I use for all the other times you expect to see Option."

Having #[lang_here_it_is = "Option"] enum Option is strictly better than the status quo of a hardcoded path. Now, if I'm implementing my own core, I can provide my own implementation that's not called Option and isn't at ::core::option::Option, because I can tell the compiler where it is.

It's fine to say that Option should be a weaker lang item than Box is. But if the language knows about it (and it does, because it's used in the desugaring of for), then it should be a lang item, because the item is part of the language.

It can (and probably should) be a weaker lang item where the compiler only has the power to know the path and check that the type has the correct shape, and not to create special behaviors. But it's still a lang item, even if the only lang-item part of it is the language using the plain old surface Rust item.

Or IOW: "lang item" doesn't mean "I'm an item that has special language-defined behavior different from my surface appearance," it means "I'm an item that the language knows about."

You can argue that these should be made more distinct. (I agree!) However, Option clearly is the second type of lang item, and nobody (that I know of) wants it to also be the first.

16 Likes

I also want to add that Box isn't a special compiler implemented kind of type purely due to historical accident. There are still multiple reasons the compiler has special vision into the type:

  • DerefMove, of course. Being able to move out of a dereferenced box is pretty fundamental to it being practical to use, and despite multiple efforts to do so, we have yet to make a suitable language level replacement, due to two very complex intertwined reasons: partial moves and reconstitution.
  • Partial moves. Just like on the stack, if you have a boxed struct (that doesn't impl Drop), you can move out any number of its fields piecewise. The fields you don't move remain in the same place (importantly, including not invalidating references). The box stays as is on the heap, and you can move fields in and out. The compiler tracks which fields it needs to drop from where, just like with stack variables.
  • Reconstitution. If you do partial moves and then move all the fields back in, the box is whole again! Or if you just move the entire contents out, do some other stuff, then move it back in again, the box is fixed up, at the same address, and ready to be packed up and shipped. The box enters a partially moved state (or a fully moved payload state, but still with its storage allocated), and then reënters its fully formed, well-typed state.
  • And all this is without mentioning the mostly-semantics-agnostic but still important nudging we do to try to encourage LLVM to elide copies when creating a value in a box. If Box::new ever elides a copy, it's because the compiler has full control over what box_alloc is doing, and knows that it can alloc before doing the actual Box::new.

Box is special. All of this specialness is required for Box to work "as it should," where it's equivalent to owning a variable on the stack, just in some other owned memory elsewhere. Could we maybe one day make Box no longer special? Possibly. Is it likely? To be honest: not really. There are a lot of tricky peculiarities here that are difficult if not impossible to express with the type system, even with a hypothetical &own.
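For anyone who wants to see the first three behaviours in code, here is a minimal sketch. Pair and the function names are made up for illustration. None of these operations can be written for a user-defined smart pointer that only implements Deref/DerefMut.

```rust
// Hypothetical struct without a Drop impl, so its fields can be moved out.
struct Pair {
    first: String,
    second: String,
}

// Moving the whole payload out of a dereferenced box.
fn move_out(b: Box<String>) -> String {
    *b
}

// Partial moves: each field is moved out of the boxed struct separately.
fn partial_moves(b: Box<Pair>) -> (String, String) {
    let first = b.first;
    let second = b.second;
    (first, second)
}

// Reconstitution: move the payload out, change it, then move it back in.
// The box is usable as a whole again afterwards.
fn reconstitute(mut b: Box<String>) -> Box<String> {
    let mut s = *b;
    s.push('!');
    *b = s;
    b
}

fn main() {
    println!("{}", move_out(Box::new(String::from("hi"))));
    let pair = Box::new(Pair {
        first: String::from("a"),
        second: String::from("b"),
    });
    println!("{:?}", partial_moves(pair));
    println!("{}", reconstitute(Box::new(String::from("hi"))));
}
```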

4 Likes