Some question about literals

Hello,
Thank you so much for your reply.
So, which one is non literal?
Can you show me an example of non literal for both integer and string?

1 Like

A literal is something that is literally written that way in the source code. So 81 + 45 is an expression literal, made of two integer literals (81 and 45) and the + operator.

"Hello world" is a string literal; let foo = "Hello World"; makes foo a variable whose value is the same as the string literal "Hello World". However, foo is not a string literal itself, since it's a variable, not literal text from the source code.
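To make that concrete for both an integer and a string, here is a minimal sketch. The function name `non_literal_examples` and the variable names are made up for illustration; the comments mark which pieces are literals and which are not.

```rust
// Returns two values that are computed from other values. Neither result
// is written out as a literal anywhere in this source code.
fn non_literal_examples() -> (i32, String) {
    let n = 81; // `81` is an integer literal; `n` is a variable, not a literal
    let sum = n + 45; // `45` is a literal; `n + 45` and `sum` are not

    let greeting = "Hello"; // `"Hello"` is a string literal; `greeting` is not
    let text = format!("{} world", greeting); // `text` is not a literal either

    (sum, text)
}

fn main() {
    let (sum, text) = non_literal_examples();
    println!("{} / {}", sum, text);
}
```

The number 126 and the text "Hello world" both exist as values when this runs, but neither of them appears verbatim in the source, so neither is a literal.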

1 Like

Hello,
Thank you so much for your reply.
In the following code, is 114 literal or non literal?

fn main ()
{
	print!("Sum is = {}", 80 + 34);
}

114 isn't in the following code at all.

2 Likes

A literal is something that is spelled out verbatim in the code. Any other value resulting from some computation is not a literal.

1 Like

Hello,
Thank you so much for your reply.
Please show me an example of a non literal.

@hack3rcon Read Literal expressions - The Rust Reference first, especially the first line

A literal expression is an expression consisting of a single token, rather than a sequence of tokens,
that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule.

And the syntax of literals:

Syntax
LiteralExpression :
CHAR_LITERAL
| STRING_LITERAL
| RAW_STRING_LITERAL
| BYTE_LITERAL
| BYTE_STRING_LITERAL
| RAW_BYTE_STRING_LITERAL
| INTEGER_LITERAL
| FLOAT_LITERAL
| true | false

1 Like

“literal” is a syntactical term. Nothing more. It talks about Rust syntax. In case you’re not too familiar with syntax, here’s some illustration of how the compiler would start reasoning about the syntax of your program.

fn main ()
{
	print!("Sum is = {}", 80 + 34);
}

As a first step, tokenization breaks up this program into a list of small sections, called “tokens”: essentially the smallest parts of Rust syntax that cannot be broken down much further, and that might have special rules for interpreting input characters, like escaping in strings, or making sure numbers have no more than 1 decimal point etc… Actually, we might have skipped a zeroth step that doesn’t apply here, of removing comments from the code. Anyways… just listing all tokens would look like

fn
main
(
)
{
print
!
(
"Sum is = {}"
,
80
+
34
)
;
}

Rust then gives some basic structure to these tokens, by grouping matching parentheses (and erroring on unmatching ones) resulting in sort of a tree-structure like

fn
main
(
)
{
    print
    !
    (
        "Sum is = {}"
        ,
        80
        +
        34
    )
    ;
}
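One way to observe this grouping from inside Rust itself is a small declarative macro that counts token trees. This is only a sketch: `count_tts` and the three helper functions are hypothetical names, and the macro sees the same grouping idea (a delimited group counts as one unit), not the compiler's internal data structures.

```rust
// Counts the token trees at one nesting level. A `tt` fragment matches either
// a single token or a whole delimited group such as `( ... )` or `{ ... }`.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// Top level: `fn`, `main`, `( )`, `{ ... }`
fn top_level() -> usize {
    count_tts!(fn main () { print!("Sum is = {}", 80 + 34); })
}

// Inside the braces: `print`, `!`, `( ... )`, `;`
fn inside_braces() -> usize {
    count_tts!(print!("Sum is = {}", 80 + 34);)
}

// Inside the parentheses: `"Sum is = {}"`, `,`, `80`, `+`, `34`
fn inside_parens() -> usize {
    count_tts!("Sum is = {}", 80 + 34)
}

fn main() {
    println!("{} {} {}", top_level(), inside_braces(), inside_parens());
}
```

The counts match the three indentation levels of the tree: four entries at the top, four inside the braces, five inside the parentheses.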

Or you might think about it graphically as something like this:

At this point … well really at the point we created the tokens in the first place, every one of these tokens also is some type/kind of token, and we already talk of things like “integer literal token” or “string literal token” or “identifier/keyword” (including things like fn or main) or “+ operator token” (I’m not sure if there’s a standard name for all of these). See also the reference page about tokens.

The next step is turning these tokens into an actual Rust syntax tree, i.e. usually a data structure in memory called an AST (abstract syntax tree). Precisely what this looks like might be a bit of an implementation detail of the language, but the basic structure that Rust syntax itself must follow is documented relatively thoroughly on the various pages of the Rust Reference’s chapters “6. Items”, “7. Attributes”, “8. Statements and expressions”, “9. Patterns”, and their sub-pages. The code in question will e.g. contain a function item, or a macro invocation statement.

Here’s a very rough graphical representation of how you can think of the final syntax tree.

I tried to retain all syntax from the actual source code, but many of these “(”s or “!” are actually not really relevant anymore to make sense of the syntax tree. It’s a matter of taste how you want to illustrate this stuff.

Even at this point, regarding 80, +, and 34, all we have are 3 tokens. They only get grouped together once the built-in macro print interprets these tokens: judging by the ,-delimited arguments, it decides that they shall be interpreted as expressions, and then feeds them into the same mechanism that usually parses expressions in other places like let x = 80 + 34;. Again it’s a matter of taste how you want to represent this. In the above style I could draw such an expression as

[Image: Download (2)]

But following a more detailed approach of annotating what type of expression / literal, etc… the compiler thinks it has and giving up representing everything literally, you could also think of it as something like this, maybe?

[Image: Download (3)]

(The labels here are not intended to reflect any true terminology necessarily. I heavily used ChatGPT for generating these illustrations.[1])


Anyways… integer literals are nothing but these little circles or boxes in a tree structure that exists for some time during compilation. They are often turned into constant values included in the final binary, but that’s not necessarily the case. E.g. the string literal token "Sum is = {}" only gets interpreted at compile-time, and the {} part will never make it into the final program. Similarly, a program containing something like let x = 80 + 34; might be optimized so that the compiler figures out the answer is 114 at compile time, and then that value might become part of the final program instead; or if x is never used, the whole computation and its input values, and thus the values of the literals 80 and 34, may never make it into your program.
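A small sketch of the format-string point, using `format!` (which shares its format-string handling with `print!`) so the result can be inspected. The function name `sum_message` is made up for illustration.

```rust
// The first argument of `format!` (and of `print!`) has to be a string
// literal, because the macro parses the `{}` placeholders during compilation.
fn sum_message() -> String {
    format!("Sum is = {}", 80 + 34)
}

fn main() {
    // The commented-out lines below are rejected by the compiler, because
    // `template` is a variable and not a string literal:
    //
    //     let template = "Sum is = {}";
    //     print!(template, 80 + 34);
    println!("{}", sum_message());
}
```

The resulting text contains 114 where the source code had {}, which illustrates that the placeholder is consumed at compile time and is not part of any run-time string.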


  1. But no chatbots are involved in the writing of any of this answer’s text. ↩︎

9 Likes

Even more: no value is a literal. The value of the literal 80 is not the literal 80 itself, as literals are not values but syntax.

Literals are part of Rust syntax; more precisely, they are a kind of token, as well as a kind of expression. There are other kinds of tokens that are not literals, such as “x” or “main” which are identifiers, or “+” which is punctuation, or an operator, depending on what you want to call it. And there are expressions that are not literals such as … well “x” again (a “path expression”), or “x + y” (a binary operation expression containing two path expressions), or “80 + 34” (another binary operation expression), and e.g. this last expression contains 2 literal expressions.

Talking about integer values, which is a (mostly) run-time concept: An integer value, e.g. of the type i32, is any integer between a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive). These are furthermore often represented in the physical computer using a 2's complement binary representation in 4 bytes of memory, or perhaps in a register. Notably, for every integer value, there also exists an integer literal that you could write into your program that evaluates to this integer value.
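These properties of i32 values can be checked with a few standard-library items. This is a minimal sketch; the function name `i32_facts` is made up for illustration.

```rust
// Collects the range, the size in bytes, and the byte pattern of -1 for `i32`.
fn i32_facts() -> (i32, i32, usize, [u8; 4]) {
    (
        i32::MIN,                    // smallest i32 value
        i32::MAX,                    // largest i32 value
        std::mem::size_of::<i32>(),  // number of bytes an i32 occupies
        (-1i32).to_be_bytes(),       // in two's complement, -1 has all bits set
    )
}

fn main() {
    let (min, max, size, minus_one) = i32_facts();
    println!("min = {}, max = {}, size = {} bytes", min, max, size);
    println!("bytes of -1: {:02X?}", minus_one);
}
```

Note that in `-1i32` only `1i32` is the integer literal token; the `-` in front of it is a separate token.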

3 Likes

Hello,
Thank you so much for your time.
To be honest, I can't understand what is literal and what is non literal? Can you show me a simple code with both of literal and non literal?
I found:
A literal is a source code that represents a fixed value and can be represented in the code without the need for computation.

Literal is a token, literally representing some value. Non-literal is, well, anything else.

3 Likes

In the code:

let foo: usize = 88 + 104;

88 + 104 is a literal expression, consisting of the integer literals 88 and 104, and the literal operation +. You can tell it's a literal, because it's written out in exactly that form in the source code.

When the compiler encounters this, it will turn the literal 88 into the integer value 88, the literal 104 into the integer value 104, and it will determine that the literal + means the addition operator. It may then determine that the variable foo has the integer value 192, but that's not a literal, because I cannot find it written out literally in the source code - it's the result of converting the literals to values, and then operating on those values.

1 Like

As mentioned in my answer above, literal can refer to a kind of token and a kind of expression.

Examples of something that is not a literal could thus be

  • tokens that are not literals
  • expressions that are not literals

Of course you could also discuss other things that are neither tokens nor expressions. For example trees and rivers, thoughts, people, dogs, colors, boats, computers, languages, protons and electrons, Chinese characters (i.e. the concept of them; as char literals can contain Chinese characters), … really this world has a lot to offer of what isn’t a literal :slight_smile:

Tokens that are not literals: Well, let’s look at the list of tokens given above

fn
main
(
)
{
print
!
(
"Sum is = {}"
,
80
+
34
)
;
}

Now as for which of these are literals and which aren’t

fn            <- NOT a literal
main          <- NOT a literal
(             <- NOT a literal
)             <- NOT a literal
{             <- NOT a literal
print         <- NOT a literal
!             <- NOT a literal
(             <- NOT a literal
"Sum is = {}" <- a string literal
,             <- NOT a literal
80            <- an integer literal
+             <- NOT a literal
34            <- an integer literal
)             <- NOT a literal
;             <- NOT a literal
}             <- NOT a literal

So there’s a lot of tokens that are not literals.

As for expressions that are not literals, the example code is probably bad. I gave examples already above, but maybe let’s do another one, based on some new code example. First note that expressions are more complex because expressions can contain other expressions. Listing all expressions in some code feels a bit like these math puzzles where you have to do things like:

Anyways, here we go: For a (partial) code example like

let x = 1;
let y = 2;
let z = 3 + 4;
let a = (x + 1) * 42 - (z + z);

all the expressions are

// in the first statement starting with “let x = ”:
1
// in the second statement starting with “let y = ”:
2
// in the third statement starting with “let z = ”:
3 + 4
3
4
// in the fourth statement starting with “let a = ”:
(x + 1) * 42 - (z + z)
(x + 1) * 42
(x + 1)
x + 1
x
1
42
(z + z)
z + z
z
z

In these, we have

// in the first statement
1                      <- a (integer) literal expression
// in the second statement
2                      <- a (integer) literal expression
// in the third statement
3 + 4                  <- NOT a literal expression
3                      <- a (integer) literal expression
4                      <- a (integer) literal expression
// in the fourth statement
(x + 1) * 42 - (z + z) <- NOT a literal expression
(x + 1) * 42           <- NOT a literal expression
(x + 1)                <- NOT a literal expression
x + 1                  <- NOT a literal expression
x                      <- NOT a literal expression
1                      <- a (integer) literal expression
42                     <- a (integer) literal expression
(z + z)                <- NOT a literal expression
z + z                  <- NOT a literal expression
z                      <- NOT a literal expression
z                      <- NOT a literal expression

So there’s a lot of expressions that are not literals.
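For reference, here is the same partial example wrapped into something runnable, so the values these expressions evaluate to can be checked. The wrapper function `evaluate` and the i32 type are assumptions added for illustration.

```rust
// Evaluates the four `let` statements and returns the resulting values.
fn evaluate() -> (i32, i32, i32, i32) {
    let x = 1; // literal expression `1`
    let y = 2; // literal expression `2`
    let z = 3 + 4; // binary expression made of two literal expressions
    let a = (x + 1) * 42 - (z + z); // only `1` and `42` are literal expressions
    (x, y, z, a)
}

fn main() {
    let (x, y, z, a) = evaluate();
    println!("x = {}, y = {}, z = {}, a = {}", x, y, z, a);
}
```

The values 7 (for z) and 70 (for a) are never spelled out in the source, which is another way to see that the expressions producing them are not literals.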

3 Likes

I would disagree with this wording, as explained in my previous answer(s). Instead:

88 + 104 is a binary operator expression, consisting of the integer literals 88 and 104, and the binary operation + (which is some punctuation token on the token level[1]). You can tell it's not a literal expression, because it’s longer than one token, whereas literal expressions consist of only one token (which must also be a literal, not e.g. punctuation, an identifier, parentheses, or the like…).
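The one-token rule can be probed with a declarative macro, since `macro_rules!` has a `literal` fragment specifier. This is a rough sketch: the macro name `kind`, the function `kinds`, and the label strings are made up, and the macro only distinguishes “matches the `literal` fragment” from “is some other expression”.

```rust
// The first arm only matches when the whole input is a single literal.
// Anything else that parses as an expression falls through to the second arm.
macro_rules! kind {
    ($l:literal) => { "literal expression" };
    ($e:expr) => { "some other expression" };
}

fn kinds() -> [&'static str; 4] {
    [
        kind!(88),       // one integer literal token
        kind!(104),      // one integer literal token
        kind!(88 + 104), // three tokens: literal, punctuation, literal
        kind!(foo),      // an identifier (a path expression), not a literal
    ]
}

fn main() {
    for k in kinds() {
        println!("{}", k);
    }
}
```

`foo` does not need to be defined here, because the macro only inspects the syntax and never evaluates its argument.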


  1. I suppose calling it merely “punctuation” is the most sane approach, due to its multiple use-cases, as it also appears as a delimiter in trait bounds like T: Hash + Eq ↩︎

2 Likes

Going back to the original question, we can conclude we have a little bit of a category error, and imprecise terminology, both of which make the question hard or impossible to answer well. What’s the difference between a friend and an adjective?

I’ve talked about “string literals” and “integer literals” above, but not about anything like a “literal integer” or a “literal string”. What an “integer” is also depends on context. What are we even talking about? If we want to show stuff in a Rust program’s source code, we could discuss expressions. Certain expressions “are” “integers” in a sense, as expressions have a type (though that type might even be generic, and generally it’s also context-dependent… so expressions completely in isolation can also be hard to reason about), and that type of the expression might be an integer type (of which Rust also has quite a few). We could also discuss values. The run-time value, as I mentioned before, of a literal, will typically result in some constant in the program, and then by transfer (and abuse of terminology) one could start talking about “literals” whilst really meaning “compile-time constant”. The problem with such a discussion, besides the potential for abusing terminology, is also that it’s often irrelevant. In the presence of compiler optimizations, it’s typically irrelevant whether you have 3 or 1+1+1; except for cases where syntax is relevant, which will – besides discussing the Rust reference / grammar / and perhaps compiler – come up in practice mainly only with macros, hence the discussion about print and its format string being a “string literal”[1].


  1. except when it’s not… as you also saw discussed above ↩︎

2 Likes

I think I sense a cause of some confusion in that statement.

If I write this in my source code:

let x = 2 + 3;

Then I have two literals in the source code the "2" and the "3". I guess these are called "integer literals".

We might expect a simple compiler to actually put those numbers into the binary executable it generates, along with an instruction to add them together and then set x to the result, 5.

However, it is almost certain that the compiler will notice that this is the addition of two numbers that never change. So the compiler can do the addition at compile time, yielding the result 5, and then put that 5 into the binary executable along with only one instruction to initialise x with it. This will likely save on instructions and execution time.

We have two integer literals in our source code but neither of them end up in the executable. But we have a "literal expression" (the "2 + 3") that can be evaluated at compile time and the result then appears in the executable.
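Whether an optimizer folds `let x = 2 + 3;` is up to the compiler, but there are places where Rust requires the value to be known during compilation, such as `const` items and array lengths. A minimal sketch, with the names `SUM` and `buffer` made up for illustration:

```rust
// A `const` item is evaluated during compilation, so `2 + 3` is computed
// by the compiler here, not by the running program.
const SUM: i32 = 2 + 3;

// An array length must also be known at compile time, and `2 + 3` is accepted.
fn buffer() -> [u8; 2 + 3] {
    [0; 2 + 3]
}

fn main() {
    println!("SUM = {}, buffer length = {}", SUM, buffer().len());
}
```

This shows compile-time evaluation that the language guarantees, which is a separate mechanism from the optimization described above.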

2 Likes

Its technical name is not really “literal expression” AFAIK; anyways… Rust’s stance on the fact that this can be evaluated at compile time however is in fact so strong that even code like this compiles:

fn f() -> &'static i32 {
    &(2 + 3)
}

though discussing the mechanisms at play here would be a bit off-topic :slight_smile:

In a sense, 2 + 3 (more or less) fits a broader intuitive (and different from this definition) notion of “literal expression” that would especially include things like (), None, or Some(42), [1, 2, 3], etc… even though syntactically, those are technically called a “tuple expression”, “path expression”, “call expression” and “array expression”, respectively.
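One way to see what these examples have in common is that all of them are accepted as initializers of `const` items, even though none of them is a literal in the syntactic sense. A small sketch; the constant names are made up for illustration.

```rust
// None of these initializers is a literal token, yet all of them can be
// evaluated during compilation.
const UNIT: () = (); // tuple expression
const NOTHING: Option<i32> = None; // path expression
const ANSWER: Option<i32> = Some(42); // call expression containing the literal `42`
const TRIPLE: [i32; 3] = [1, 2, 3]; // array expression containing three literals

fn main() {
    println!("{:?} {:?} {:?} {:?}", UNIT, NOTHING, ANSWER, TRIPLE);
}
```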

FYI it's easy to know the token tree of your code via syn which is a lib to parse Rust source code from a stream of tokens into a syntax tree. Luckily there is an online tool for convenience: Rust AST Explorer

Copy & paste print!("{} + {} = {}", 34, 80, 80 + 34); into the input box on the website, and you'll see exactly which tokens are called Literal

tokens: TokenStream [
    Literal {
        lit: "{} + {} = {}",
    },
    Punct {
        char: ',',
        spacing: Alone,
    },
    Literal {
        lit: 34,
    },
    Punct {
        char: ',',
        spacing: Alone,
    },
    Literal {
        lit: 80,
    },
    Punct {
        char: ',',
        spacing: Alone,
    },
    Literal {
        lit: 80,
    },
    Punct {
        char: '+',
        spacing: Alone,
    },
    Literal {
        lit: 34,
    },
]

There are two kinds of Token, Literal and Punct, in "{} + {} = {}", 34, 80, 80 + 34.

`print` is an Ident, but syn shows a complicated structure which you can just ignore for now
                mac: Macro {
                    path: Path {
                        leading_colon: None,
                        segments: [
                            PathSegment {
                                ident: Ident(
                                    print,
                                ),
                                arguments: None,
                            },
                        ],
                    },
2 Likes

But the data structure in syn is usually complicated.
So for literals only, you can use a declarative macro instead for simplicity. Rust Playground

macro_rules! literals {
    ($($literal:literal)*) => ();
}

literals! { // append tokens that you think are literals by line
    -1
    "hello world"
    2.3
    b'b'
    true
}

If you use non-literal tokens, it won't compile

error: no rules expected the token `,`
  --> src/lib.rs:11:5
   |
1  | macro_rules! literals {
   | --------------------- when calling this macro
...
11 |     ,
   |     ^ no rules expected this token in macro call
   |
   = note: while trying to match sequence end
2 Likes
