Const vs let in fn

If I have a fn that will run over and over again, comparing an input &str against an array of strings, which of the following options is more efficient, performance-wise? Does using let inside the fn body allocate or copy each time?

Also, what about putting const HEADERS inside the fn body? Does that make a difference?

I'm not actually experiencing perf issues, I'm just trying to learn the implications of each.

// let version
fn is_header(line: &str) -> bool {
    let headers = [
        "Timestamp\tFilename\tError\tMessage",
        "νƒ€μž„ μŠ€νƒ¬ν”„	파일 이름	였λ₯˜	λ©”μ‹œμ§€",
    ];
    headers.iter().any(|h| line.ends_with(h))
}

// const version
const HEADERS: [&str; 2] = [
    "Timestamp\tFilename\tError\tMessage",
    "νƒ€μž„ μŠ€νƒ¬ν”„	파일 이름	였λ₯˜	λ©”μ‹œμ§€",
];
fn is_header(line: &str) -> bool {
    HEADERS.iter().any(|h| line.ends_with(h))
}

I found this related topic, but I didn't grok a definitive answer.

With optimization enabled (cargo run --release) they will almost certainly be identical.

Does using let inside the fn body allocate or copy each time?

Without optimization, both versions would definitely allocate the 2-element array for headers, on the stack, each time it is run. However, all that means is copying the 2 pointers and lengths for the string literals, not their text.
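
As an aside, a &str is just a pointer plus a length, which is why that copy is so cheap; a quick way to confirm the size:

fn main() {
    // a &str is a fat pointer: one data pointer plus one length (16 bytes on 64-bit targets)
    assert_eq!(std::mem::size_of::<&str>(), 2 * std::mem::size_of::<usize>());
}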

With optimization, the compiler will almost certainly transform the first code to be identical to the second. Beyond that, the array and the iteration will also go away, so that the result is just two comparison routines and no loop.
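
Roughly speaking, the optimized result behaves as if you had written the checks out by hand (a sketch of the effect, not the literal compiler output):

fn is_header(line: &str) -> bool {
    // the array and the iterator are gone; only the two suffix checks remain
    line.ends_with("Timestamp\tFilename\tError\tMessage")
        || line.ends_with("νƒ€μž„ μŠ€νƒ¬ν”„\t파일 이름\t였λ₯˜\tλ©”μ‹œμ§€")
}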

You can see the results for yourself by using the "ASM" option in the menu in the Rust Playground, or https://rust.godbolt.org/. When I did this, I saw that the compiler actually de-duplicated the code: declaring that the two versions of is_header are two names for the same code.

Also, what about putting const HEADERS inside the fn body? Does that make a difference?

No, that just changes where the const is visible. Scope, variable names, etc. make no difference to how the program is compiled.
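
That is, this version compiles exactly the same way; the const is simply not visible outside the function:

fn is_header(line: &str) -> bool {
    // same const as before, just scoped to this function
    const HEADERS: [&str; 2] = [
        "Timestamp\tFilename\tError\tMessage",
        "νƒ€μž„ μŠ€νƒ¬ν”„\t파일 이름\t였λ₯˜\tλ©”μ‹œμ§€",
    ];
    HEADERS.iter().any(|h| line.ends_with(h))
}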


Those two are effectively identical. Read here about the difference between const and static.

But I also wouldn't expect any real difference from static in this case. All you're potentially changing is how the pointers to the str data are stored.
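
For comparison, a static variant would look like this; it only changes where that array of pointers lives:

static HEADERS: [&str; 2] = [
    "Timestamp\tFilename\tError\tMessage",
    "νƒ€μž„ μŠ€νƒ¬ν”„\t파일 이름\t였λ₯˜\tλ©”μ‹œμ§€",
];

fn is_header(line: &str) -> bool {
    // reads from one shared array in static memory rather than an inlined const
    HEADERS.iter().any(|h| line.ends_with(h))
}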


Thank you for the insight. I just tried rust.godbolt.org too and it looks like a great tool for seeing how the compiler does its job!

That's amazing that the compiler can figure that kind of stuff out. 🤯

I'll definitely read up on that. Thanks.

wow, this paragraph from your link just made things click so much better for me (I still have only a basic understanding, but I get this):

To put it simply, constants are inlined wherever they're used, making using them identical to simply replacing the name of the const with its value. Static variables, on the other hand, point to a single location in memory, which all accesses share. This means that, unlike with constants, they can't have destructors, and act as a single value across the entire codebase.
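
If I'm understanding that right, the difference is something like this (my own toy example, not from the linked page):

const LIMIT: u32 = 10;           // inlined: every use behaves like writing the literal 10
static GREETING: &str = "hello"; // one fixed memory location, shared by every access

fn main() {
    let n = LIMIT * 2;              // compiles as if written 10 * 2
    let s: &'static str = GREETING; // borrows the single static location
    println!("{n} {s}");
}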

If all "header" strings are known at compile time, you might be able to squeeze more performance out of it using a non-cryptographic hash function. is_header can hash the input once, and then it's a simple integer comparison against all known header hashes with a match [1].

I wrote a proc macro for this, but never published it. [2] The "hash function" in this case just returns the first 8 bytes of the string, padded with zeroes if necessary. A hasher like FxHash might be more suitable for you. Maybe it can give you some ideas, anyway.
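
Roughly, the idea written out by hand (just a sketch of the approach, not the actual macro output):

// "Hash" a string by taking its first 8 bytes, zero-padded; usable at compile time.
const fn prefix_hash(s: &str) -> u64 {
    let bytes = s.as_bytes();
    let mut out = [0u8; 8];
    let mut i = 0;
    while i < bytes.len() && i < 8 {
        out[i] = bytes[i];
        i += 1;
    }
    u64::from_le_bytes(out)
}

const H_EN: u64 = prefix_hash("Timestamp\tFilename\tError\tMessage");
const H_KO: u64 = prefix_hash("νƒ€μž„ μŠ€νƒ¬ν”„\t파일 이름\t였λ₯˜\tλ©”μ‹œμ§€");

fn is_header(line: &str) -> bool {
    match prefix_hash(line) {
        // different strings can share a prefix, so confirm with a real equality check
        H_EN => line == "Timestamp\tFilename\tError\tMessage",
        H_KO => line == "νƒ€μž„ μŠ€νƒ¬ν”„\t파일 이름\t였λ₯˜\tλ©”μ‹œμ§€",
        _ => false,
    }
}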


  1. Which LLVM may be able to optimize to a trie for comparisons.

  2. The benchmarks test string comparisons against assembly mnemonics (67 in total) and the inputs are the top-10 mnemonics by use in a sample project, plus 3 mnemonics selected from the beginning, middle, and end of the list. On my machine, the naive string comparisons are 1.78x faster in the best case (3 out of 13 tests) and 15.3x slower in the worst case (sample at the end of the list) with the median sample being 7.97x slower. The performance with the macro is much more predictable, with 12 out of 13 tests running within 0.5 ns of one another. The lone outlier is only 2.4x slower.


Would you be willing to show me an example of what you mean, maybe using my example as a template? I assume this would only work for whole strings, and not partial checks like "ends_with()"?

That sounds really cool but a bit out of my wheelhouse just yet.

EDIT: I tried the version below, and I don't see any performance benefit over str_vec_of_headers.contains(&line):

// This doesn't appear faster than the above
use rustc_hash::FxHashSet;

fn is_header(line: &str, set: &FxHashSet<&str>) -> bool {
    set.contains(line)
}

fn main() {
    let set = FxHashSet::from_iter(vec![
        "Timestamp\tFilename\tError\tMessage",
        "νƒ€μž„ μŠ€νƒ¬ν”„\t파일 이름\t였λ₯˜\tλ©”μ‹œμ§€",
    ]);

    let line = "Timestamp\tFilename\tError\tMessage"; // example input

    // the part that repeats:
    let result = is_header(line, &set);
    println!("{result}");
}

The bad news is that hashing does not exactly match what your original code was doing (substring comparisons with ends_with()). If you want to do equality matches, and the list of string comparisons is larger than just 2, you can begin to see some benefit from comparing hashes instead of strings.

I put together a proof of concept with a benchmark: GitHub - parasyte/headercmp-example. These are the results on my laptop:

$ cargo bench

     Running benches/my_benchmark.rs (target/release/deps/my_benchmark-3c1cd4c74ed895c3)
is_header hash version  time:   [11.751 ns 11.785 ns 11.825 ns]

is_header const version time:   [19.987 ns 20.040 ns 20.096 ns]

It's a modest improvement, but it is quite sensitive to wordlist changes. And the way I'm computing the hashes at compile time (and include!ing) is not ideal.


Bonus: If you do enough digging, you can find some degenerate cases where the string comparisons are really bad. This run had a lot of long strings with the same prefixes (e.g. only the last 5-10 characters were different in a few dozen long strings):

$ cargo bench

     Running benches/my_benchmark.rs (target/release/deps/my_benchmark-3c1cd4c74ed895c3)
is_header hash version  time:   [10.997 ns 11.065 ns 11.143 ns]

is_header const version time:   [755.98 ns 816.51 ns 885.57 ns]

Thanks. I cloned your repo and will play around with it. Much appreciated. I think I could get away with the equivalent of == rather than .ends_with(). The reason I used ends_with() in the first place is that the files I'm working with start with a BOM, but I can probably find a workaround for that.
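
Something like this would probably handle the BOM so I can compare with == (untested sketch):

fn strip_bom(line: &str) -> &str {
    // '\u{FEFF}' is the byte order mark; drop it if present, otherwise return the line unchanged
    line.strip_prefix('\u{feff}').unwrap_or(line)
}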

And if not, this is still cool insight I'll tuck away for the future.

J


Well, these are 30-year-old optimizations, at least. Everything in your code (the relevant parts you are asking about, anyway) is known at compile time: you have string literals in an array literal. There's nothing to "figure out"; it's basically trivial.

There are much more interesting and complex optimizations that compilers can perform. The bottom line is that you basically NEVER need to worry about low-level decisions like this.


you basically NEVER need to worry about low-level decisions like this

That's awesome.

There's nothing to "figure out", it's basically trivial.

I probably used the wrong phrasing. I should have said, "that's super convenient that the compiler..." rather than "that's amazing that the compiler..." I'm loving this zero cost abstraction business. It's certainly not the case in some other languages I've used.
