Further optimizing a char boundary cache

I'm writing a compiler for yet another programming language. The code can be read at my GitHub repo. I'm working (in a PR) on optimizing char sequence operations. cargo-flamegraph indicates that there are some serious bottlenecks. The following figure is its output (with RUST_LOG="warn" CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph -p origlang-cli -F 10000 -- execute --input-file ./compile/positive/perf/very_long_string_literals_printout.origlang >/dev/null in repo root):

I'd like to ask for help with optimizing the "char boundary cache", specifically:

  1. Should I avoid using Vec<char>, for both memory efficiency and CPU efficiency?
  2. Should I cache the char boundaries if there are a lot of calls to str.chars().nth(_), to avoid the overhead of core::str::validation::next_code_point?
  3. How can I optimize origlang_compiler::chars::boundary::MultiByteBoundaryAwareString (and its constructor)?
  4. Can I avoid those page faults (the profile was taken on a Linux machine)? If so, how?

Thank you :slight_smile:

What I would do is avoid using char indices and use byte indices instead. This lets you index directly into the &str in O(1), without the slowdown of iterating over the whole &str with str.chars().nth(_) or the slowdown of storing all the offsets in a Vec.
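A minimal sketch of what I mean, assuming a lexer-style cursor. ByteCursor and nth_char_slow are hypothetical names for illustration, not types from your repo. The only invariant to maintain is that the byte offset always sits on a char boundary, which holds as long as you only ever advance by len_utf8() of the char you just read:

```rust
/// Hypothetical cursor over source text that tracks a byte offset
/// instead of a char index (not taken from the origlang code base).
struct ByteCursor<'a> {
    src: &'a str,
    /// Byte offset into `src`; always kept on a char boundary.
    pos: usize,
}

impl<'a> ByteCursor<'a> {
    fn new(src: &'a str) -> Self {
        Self { src, pos: 0 }
    }

    /// Look at the current char without consuming it.
    /// Slicing at a byte offset is O(1), and decoding a single
    /// code point does not depend on the length of the string.
    fn peek(&self) -> Option<char> {
        self.src[self.pos..].chars().next()
    }

    /// Consume the current char and advance by its UTF-8 length,
    /// so `pos` stays on a char boundary.
    fn advance(&mut self) -> Option<char> {
        let c = self.peek()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}

/// Char-index lookup, shown only for comparison: it decodes the
/// string from the start on every call, so it is O(n) per lookup.
fn nth_char_slow(src: &str, n: usize) -> Option<char> {
    src.chars().nth(n)
}
```

With this shape, a token's span can be stored as a pair of byte offsets and recovered later with &src[start..end], so no Vec<char> or boundary table is needed.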


Hmm, I'll give it a try...

Thanks for your suggestion! It avoids most of the page faults, and the entire execution is twice as fast!
Here's the new flamegraph: