Why is it called "unit"?

I'm trying to be methodical about grokking Rust, plodding through Rust by Example, and I'm finding it extremely helpful (after 50 years of C!). I've just reached Lesson 4, Variable Bindings, where the unit value is introduced. I've seen it in other tutorials and references, so I know what it is, but now I have to ask: why call it the unit value? I think if I'd been in the room I would have suggested either "null" or "empty tuple", or something. If etymology bores you, I apologize.

1 Like

I don't know the origins either, but Rust certainly wasn't the first to use this name: Unit type - Wikipedia

More directly, Rust has taken a lot of influence from OCaml (the very first Rust compiler was written in it), and judging by the linked article, OCaml has a "unit" type, too. They even name the type unit, and only the value looks like an empty tuple, (). (Other languages, e.g. Haskell, also set a precedent for writing both the type and the value as () and only calling it "unit" when talking about it.)

8 Likes

A "unit type" is any type that has only one valid value, and that value is called the "unit value." Etymologically speaking, these are closely related to "unity," meaning "one".
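As a small illustration of "a type that has only one valid value", here is a minimal sketch. The names `Marker` and `only_value` are made up for this example; the zero-size checks reflect that a single-valued type carries no information to store.

```rust
use std::mem::size_of;

/// A user-defined unit type (hypothetical name): it has exactly one value,
/// which is also written `Marker`.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Marker;

/// Returns the only value this type has (hypothetical helper for illustration).
fn only_value() -> Marker {
    Marker
}

fn main() {
    // Two values of a unit type can never differ.
    assert_eq!(only_value(), Marker);
    // There is nothing to store, so the type occupies zero bytes.
    assert_eq!(size_of::<Marker>(), 0);
    // The built-in unit value and the user-defined one, side by side.
    println!("{:?} {:?}", (), only_value());
}
```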

"Empty tuple" is also correct, and arguably more specific: It's possible to define other unit types in Rust with a declaration like struct MyUnitType;. Because () is the only unit type defined by Rust, it's the default one that gets referred to in text; others will be referred to by name or otherwise made clear in context.

"Null" usually refers not to a unit type, but to a sentinel value that is only one of the valid values of another type, usually a pointer. There's never a need to check the value of a unit type, because it's always the same. A type with a null value, on the other hand, needs to be checked often to handle the special case that the null value is present.

14 Likes

It only has one value. Also, enums are "sum types" and the number of possible values is the sum of possible values for each variant:

enum Example { // name chosen only for illustration
    Byte(u8), // 256 possible values
    UnitVariant, // 1 possible value
}

And adding a unit variant adds one possible value. structs are analogously "product types", and adding a unit field multiplies the possible values by one.

So in some sense it plays the same role as the unit (1) in a mathematical field. An uninhabited type (which cannot be constructed, like ! or an empty enum) plays the role of 0.
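To make the counting concrete, here is a rough sketch that lists every value of a small sum type and a small product type. It uses bool (2 values) instead of u8 so the lists stay short; the type and function names are invented for this example.

```rust
/// Sum type: 2 (from `bool`) + 1 (unit variant) = 3 possible values.
#[derive(Debug, PartialEq)]
enum Sum {
    Flag(bool),
    UnitVariant,
}

/// Product type: 2 (from `bool`) * 1 (from `()`) = 2 possible values.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct Product {
    flag: bool,
    unit: (),
}

/// Every value of `Sum`, written out by hand.
fn all_sum_values() -> Vec<Sum> {
    vec![Sum::Flag(false), Sum::Flag(true), Sum::UnitVariant]
}

/// Every value of `Product`, written out by hand.
fn all_product_values() -> Vec<Product> {
    vec![
        Product { flag: false, unit: () },
        Product { flag: true, unit: () },
    ]
}

fn main() {
    // Adding a unit variant adds one value.
    assert_eq!(all_sum_values().len(), 2 + 1);
    // Adding a unit field multiplies the count by one.
    assert_eq!(all_product_values().len(), 2 * 1);
    println!("{:?}", all_sum_values());
    println!("{:?}", all_product_values());
}
```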

(Gleaned from just seeing the terms thrown around, and not a formal study of type theory. Algebraic data type would perhaps be a place to start looking for something more formal.)

15 Likes

Because it has a unitary (single) value.

The problem with "null" is that it basically means "nothing".

Note that we do have types with no values, like !, which is very different from ().
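A minimal sketch of the difference, assuming an empty enum (here called `Void`, a name chosen for this example) as a stand-in for an uninhabited type on stable Rust. A function returning () hands back a value; a function returning ! never hands back anything.

```rust
/// An uninhabited type: no variants, so no value can ever be constructed.
enum Void {}

/// Returns normally, handing back the single value `()`.
fn returns_unit() -> () {}

/// Never returns to its caller; `!` is the built-in uninhabited type.
/// It is only defined here, not called.
#[allow(dead_code)]
fn never_returns() -> ! {
    panic!("this function does not produce a value")
}

/// The `Err` case cannot occur, so an empty `match` is enough to cover it.
fn unwrap_infallible(r: Result<u8, Void>) -> u8 {
    match r {
        Ok(v) => v,
        Err(impossible) => match impossible {},
    }
}

fn main() {
    // A unit-returning function really does produce a value we can compare.
    assert_eq!(returns_unit(), ());
    // With an uninhabited error type, only the `Ok` side can exist.
    assert_eq!(unwrap_infallible(Ok(7)), 7);
    println!("ok");
}
```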

13 Likes

From what I could find in online dictionaries, the word “unit” has a meaning of “oneness”; so in the context of algebraic data types, where such a type acts like the number one, as a neutral element for products, this name is not all that surprising. Furthermore, one is also the number of elements in this type. Looking into mathematics a bit gives more clues/evidence that it's reasonable to assume that both of these facts can have something to do with why the convention emerged to relate this type conceptually to the number "one":

Comparable constructs to the unit type exist in various mathematical fields. Type theory would be a particularly closely related one (that I know less of than I would like to), but we have something like it just as well in algebra, set theory, or category theory, and probably more, where you'll encounter comparable structures with a single element. Category theory can even be used to strongly relate many of these, as the Wikipedia article on "unit type" points out, too, namely under the concept of a "terminal object".

Following this concept, the analogues in the category of sets, or in the category of (small) categories, are also single-element collections, though in words they are more typically called "singleton set" and "trivial category" respectively, so nothing involving the word "unit". However, at least for the latter, we have the quite common formal notation of just giving the (canonical) trivial category the symbol “1”, relating it even more strongly to the number one than the word "unit" would. IIRC, for categories, this notation is sometimes extended to any number, so that besides a zero-element one “0”, you'd also have a 2-element category “2”, and so on, so it may be motivated simply by the number of elements. But there are also concepts of product and sum categories, and the role of 1 in this context is that of a multiplicative neutral element, too, just like the number 1 for numbers.

For sets, the simplest singleton, i.e. 1-element set, is {∅}. Funnily enough, this is also the encoding of the ordinal number 1, and the cardinal number 1 (finite ordinal numbers and finite cardinal numbers are both essentially the same as natural numbers, but they are rooted in set theory, and extended to include some infinite numbers, too). For cardinal numbers, this again is motivated both by the fact that this set contains 1 element, and additionally, by the concept of sum- and product-sets, which are strongly related to cardinal arithmetic, which in turn has the set {∅}, i.e. the cardinal number 1, acting like the normal natural number 1 in that the equation 1•x = x = x•1 is generally true, i.e. it's a multiplicative neutral element.

1 Like

Yes, the notation almost certainly comes from category theory.

The cardinality (number of elements) of a cartesian product of two sets is the product of cardinalities. The cardinality of their disjoint union is the sum of their cardinalities. These operations lift the product and sum on natural numbers to the category of finite sets:

#(A x B) = #A x #B
#(A + B) = #A + #B

Two sets are declared equal if there is a bijection between them. For finite sets, it means that two sets are equal if and only if they have the same number of elements.

Under these definitions, any set with 1 element {*} acts as a unit (in the sense of algebra) with respect to the cartesian product of sets. An empty set { } acts as a zero with respect to the addition (disjoint union) of sets.

These definitions are generalized to arbitrary categories, where it is possible to talk about products and sums of objects, as well as unit and zero objects, which act as units for the product operation, or zero for the sum one. Roughly, an object A x B is a product of objects A and B if its elements naturally correspond to pairs of elements of A and B.
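In Rust terms, "acts as a unit for the product" can be sketched as a pair of conversions between A and (A, ()) that undo each other, so pairing with () adds no information. The function names below are invented for this illustration.

```rust
/// Pairs a value with the unit value; nothing is added except the `()`.
fn attach_unit<A>(a: A) -> (A, ()) {
    (a, ())
}

/// Drops the unit component again; nothing is lost, since `()` had no content.
fn drop_unit<A>(pair: (A, ())) -> A {
    pair.0
}

fn main() {
    // Going there and back returns the original value...
    assert_eq!(drop_unit(attach_unit(42)), 42);
    // ...and going back and there returns the original pair,
    // so the two functions form a bijection between `A` and `(A, ())`.
    assert_eq!(attach_unit(drop_unit(("x", ()))), ("x", ()));
    println!("{:?}", attach_unit("x"));
}
```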

These definitions were later borrowed by type theory, both because they are convenient and because type theory always had close interactions with category theory (different kinds of categories naturally serve as models of type theories). From type theory, they were borrowed by closely related programming languages, such as ML, and later OCaml and Rust.

8 Likes

I guess that's what I get for choosing engineering for a career instead of mathematics. My gut-level response to "unit" is something like "volt" or "gram". But I'm okay now.

Thanks to all for the very helpful responses. "The Book" says "The community is very welcoming and happy to answer students’ questions"; I expected that to be just marketing, but I was wrong.

16 Likes

Other "unit" entities in math include the unit circle (a circle centered on the origin with radius 1 (also called unit radius)); the unit interval [0, 1]; a unit vector (a vector with length 1).

The meaning of "unit" in physics and engineering is related: it’s the amount of something that corresponds to "one" in the measurement system used. People often use phrases like "mass per unit volume" when talking in general terms.

17 Likes

Daryl - This part of recent languages can seem a bit daunting, but it is worth your while. Almost all current incarnations of languages, from PHP to JS, try to infer type values automatically. It's a very interesting change which moves them more into the space of functional languages. However, it is a very different way of thinking. Rust overlaps in its design to a large degree with the thinking of, say, Haskell in trying to make it type safe. "Unit" is cardinal because it's a defined value for nothing - or the empty set. It's not the same use case as Option(None), which has a value, namely None. Bottom T is needed when you have lazy (call by need) evaluation, e.g. take 10 of Range [1..] is well defined but the last(10) function is not.

The ability to work with lazy evaluation, e.g. infinite structures, is profound when combined with "call-by-need", as it allows you to focus in on a subset of a potentially infinite problem space. If you are interested in this problem space, Richard Bird's book "Pearls of Functional Algorithm Design" made me sit up, for what it is worth.
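For a rough Rust counterpart of "take 10 of [1..]": Rust iterators are lazy in the sense that elements are only produced on demand, which is related to but not the same as Haskell's call-by-need. The helper name `first_ten` is made up for this sketch.

```rust
/// Takes the first ten elements of the unbounded range `1..`.
fn first_ten() -> Vec<u64> {
    // `1u64..` only describes the sequence; `take(10)` stops pulling
    // elements after ten, so the unbounded tail is never visited.
    (1u64..).take(10).collect()
}

fn main() {
    assert_eq!(first_ten(), vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
    // By contrast, `(1u64..).last()` would have to walk the whole range
    // and would not finish in any practical amount of time.
    println!("{:?}", first_ten());
}
```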

These languages don't really have a concept of type inference, as they don't have type checking (they are dynamically typed). You don't get compile errors for mismatching types, and every value carries its own type (to the point that JITs will often specialize and compile functions with often-used combinations of types at runtime). This is radically different from the mainstream of functional languages (i.e., the ML family).

None doesn't have a value. Incidentally, it's called a unit variant for the same reasons. When you instantiate Option::None, the only "value" (piece of information) in there is the type tag of the enum itself, which isn't part of the variant's associated data (which doesn't exist).

To illustrate this, consider how None could be defined equivalently, using an empty struct variant (the empty product being unit, as usual):

enum Option<T> {
    None { /* empty product */ },
    Some(T),
}
6 Likes

In a way, that is true. But in another way, None absolutely has a value. In an Option<T>, None doesn't have a T, but it still takes up as much space as Some(t). The unit value () is a value that fits in zero bytes. In Rust, None is not a type; it is a value of the type Option<T>, and any value of the type Option<T> takes the same amount of space.

The unit value () is of the unit type (), which has only one allowed value, the unit value. As such, no actual memory is needed to represent the value.
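The zero-byte claim can be checked with std::mem::size_of. This is only a small sketch; the helper names are invented for the example.

```rust
use std::mem::{size_of, size_of_val};

/// Size in bytes of the unit type.
fn unit_size() -> usize {
    size_of::<()>()
}

/// Size in bytes of an array of one thousand unit values.
fn unit_array_size() -> usize {
    size_of::<[(); 1000]>()
}

fn main() {
    let unit = ();
    // A single unit value needs no storage...
    assert_eq!(size_of_val(&unit), 0);
    assert_eq!(unit_size(), 0);
    // ...and neither do a thousand of them.
    assert_eq!(unit_array_size(), 0);
    println!("{} {}", unit_size(), unit_array_size());
}
```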

No, that's the wrong comparison.

In an Option that has the value None, the only thing that is needed for representing it is the discriminant. The description of the Option being of a particular variant takes up space (that's the discriminant); there is no space needed for the variant itself. IOW, you are comparing a sum type to a product type, and you are confusing the requirement for a discriminant with the space intrinsically needed for storing associated values.

The reason why an Option::None still takes up as much space as a Some is that the language chooses to represent enums using discriminated unions, i.e., inline. If there were implicit boxing, for example, then None could be smaller in total size than Some. So, it's not really the None occupying the space, but the fact that it's inside an enum.

If you create an enum with unit associated values, you can observe that it takes up exactly the same space as if you declared fieldless variants. You can see this for yourself. The size is only as big as necessary for accommodating the discriminant.
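A sketch of that experiment, with two hypothetical enums that differ only in whether each variant carries a () payload. The exact byte counts are left to the compiler, so the code only compares the two sizes rather than assuming a particular number.

```rust
use std::mem::size_of;

/// Three variants, each carrying a unit value as associated data.
#[allow(dead_code)]
enum WithUnitPayloads {
    A(()),
    B(()),
    C(()),
}

/// The same three variants with no associated data at all.
#[allow(dead_code)]
enum Fieldless {
    A,
    B,
    C,
}

/// Sizes of both enums, in the order (with payloads, fieldless).
fn sizes() -> (usize, usize) {
    (size_of::<WithUnitPayloads>(), size_of::<Fieldless>())
}

fn main() {
    let (with_units, fieldless) = sizes();
    // The unit payloads add nothing; only the discriminant needs room.
    assert_eq!(with_units, fieldless);
    println!("with unit payloads: {with_units} byte(s), fieldless: {fieldless} byte(s)");
}
```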

1 Like

Well, that may be a better way to word it. But still, None in rust isn't something you can use without having the possibility of a Some(...). If you want to return None from a function or store it in a variable, the return type or variable type must be Option<T> for some T. While if you return () the return type must be ().
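In code, the difference shows up in the signatures. This is a minimal sketch with made-up function names: None only type-checks as some Option<T>, while () stands on its own as a value of the type ().

```rust
/// `None` can only be returned because the return type is an `Option<T>`.
fn find_nothing() -> Option<u32> {
    None
}

/// `()` is returned as a value of its own type, `()`.
fn do_nothing() -> () {
    ()
}

fn main() {
    // The annotation (or some other context) has to pin down `T`;
    // a bare `let x = None;` with no further use would not compile.
    let absent: Option<u32> = find_nothing();
    let unit: () = do_nothing();

    // An `Option` has to be inspected to learn which case is present...
    match absent {
        Some(n) => println!("found {n}"),
        None => println!("nothing found"),
    }
    // ...whereas a unit value is always the same, so there is nothing to check.
    assert_eq!(unit, ());
}
```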
