Sorry if the title reads like clickbait; the number is taken from the `index_many::linear_singleton` benchmark. Other `*_many_*` benchmarks involving small vectors also showed improvements ranging from 12% to 37%. Notably, the other length-setting benchmarks are mostly either statistically slower than or statistically equivalent to `SmallVec`. See the repo readme for a more detailed analysis.
If you're optimizing for small vectors, it might be better to make the length's LSB 0 for an inline vector and 1 for a heap vector. That way small inline vectors may need fewer operations, since you don't have to `| 1` every time you write the length.
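A minimal sketch of the suggested encoding, assuming the length is stored shifted left by one with the inline/heap tag in the low bit. `PackedLen`, `HEAP_TAG`, and all method names here are hypothetical, not the crate's actual API:

```rust
/// Hypothetical packed length field: bit 0 is the tag, the remaining bits hold the length.
/// Assumption (as suggested above): tag 0 = inline, tag 1 = heap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedLen(usize);

const HEAP_TAG: usize = 1;

impl PackedLen {
    /// Inline encoding: a plain shift, no `| 1` needed because the tag is 0.
    fn inline(len: usize) -> Self {
        PackedLen(len << 1)
    }

    /// Heap encoding: shift, then set the tag bit.
    fn heap(len: usize) -> Self {
        PackedLen((len << 1) | HEAP_TAG)
    }

    fn is_heap(self) -> bool {
        (self.0 & HEAP_TAG) != 0
    }

    fn len(self) -> usize {
        self.0 >> 1
    }

    /// Generic setter: must preserve whatever tag is currently stored.
    fn set_len(&mut self, len: usize) {
        self.0 = (len << 1) | (self.0 & HEAP_TAG);
    }

    /// Inline fast path: since the tag is 0, the stored value is just `len << 1`.
    fn set_len_inline(&mut self, len: usize) {
        debug_assert!(!self.is_heap());
        self.0 = len << 1;
    }
}

fn main() {
    // Inline vector: writing the length is a single shift.
    let mut a = PackedLen::inline(3);
    assert!(!a.is_heap());
    assert_eq!(a.len(), 3);
    a.set_len_inline(4);
    assert_eq!(a.0, 8);

    // Heap vector: writing the length has to keep the tag bit set.
    let mut b = PackedLen::heap(3);
    assert!(b.is_heap());
    b.set_len(10);
    assert_eq!(b.len(), 10);
    assert_eq!(b.0, 21);
}
```

Incrementing the length by adding 2 to the raw field works under either tag assignment; the saving only shows up when a fresh length is written, where the inline path becomes a bare shift.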