I'd recommend writing a small program that runs each option in a tight loop for 10, 100, 1,000, and 10,000 iterations, times each run, and prints the results. Compile it in both Debug and Release mode, with and without Thin-LTO or full LTO, and compare the results.
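Something like the following would do as a starting point. It is only a rough sketch, assuming Rust and the standard library; `option_a` and `option_b` are hypothetical stand-ins (not the options from the question), so swap in the real candidates. `std::hint::black_box` is there to make it less likely that the optimizer deletes the loop body in Release mode.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in: chained `ends_with` calls.
fn option_a(s: &str) -> bool {
    s.ends_with(".png") || s.ends_with(".jpg") || s.ends_with(".gif")
}

// Hypothetical stand-in: the same check driven by a list of suffixes.
fn option_b(s: &str) -> bool {
    const SUFFIXES: [&str; 3] = [".png", ".jpg", ".gif"];
    SUFFIXES.iter().any(|suffix| s.ends_with(*suffix))
}

// Runs `f` on `input` `iters` times and returns the total elapsed time.
fn time_it(iters: u32, input: &str, f: fn(&str) -> bool) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box hides the input and the result from the optimizer.
        black_box(f(black_box(input)));
    }
    start.elapsed()
}

fn main() {
    let input = "holiday_photo.jpg";
    for iters in [10, 100, 1_000, 10_000] {
        let a = time_it(iters, input, option_a);
        let b = time_it(iters, input, option_b);
        println!("{iters:>6} iterations: option_a {a:?}, option_b {b:?}");
    }
}
```

If this is a Cargo project, run it once with `cargo run` and once with `cargo run --release`, then repeat the release run with `lto = "thin"` and `lto = "fat"` under `[profile.release]` in `Cargo.toml`. The small iteration counts will be noisy, so treat single runs with suspicion.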
EDIT: If it matters enough to ask which is more efficient, it is probably worth taking the time to benchmark. If it isn't worth the time to benchmark, it probably isn't worth the time to ask the question. That's how I always treat these things, anyway.
Yes, benchmark it. In particular, this is going to be strongly dependent on the number of suffixes you're checking and how big your haystacks are. In the specific example you have here, I'd actually expect (1) to be faster than either (2) or (3), since your data sizes are so small. But at some point, (2)/(3) should do better than (1). Where that crossover point is can only be found by benchmarking.
I would generally expect (2) and (3) to perform the same, but I've been hilariously wrong about such things before!
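To hunt for a crossover point, one approach is to sweep the number of suffixes and the haystack size and time each candidate at every combination. The sketch below assumes Rust and the standard library only. `linear_scan` and `set_lookup` are hypothetical stand-ins, not the (1)/(2)/(3) from the question, and the generated `.extN` suffixes and `.none` haystacks are made-up test data. With these particular stand-ins the haystack length may matter very little; it is swept anyway so the harness carries over to the real candidates.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for a simple approach: try every suffix in turn.
fn linear_scan(haystack: &str, suffixes: &[String]) -> bool {
    suffixes.iter().any(|s| haystack.ends_with(s.as_str()))
}

// Hypothetical stand-in for a precomputed approach:
// look up everything from the last '.' onward in a set.
fn set_lookup(haystack: &str, set: &HashSet<String>) -> bool {
    match haystack.rfind('.') {
        Some(i) => set.contains(&haystack[i..]),
        None => false,
    }
}

// Calls `f` `iters` times and returns the total elapsed time.
fn time_it(iters: u32, mut f: impl FnMut() -> bool) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    for n in [1usize, 4, 16, 64, 256] {
        // Made-up suffixes: ".ext0", ".ext1", ...
        let suffixes: Vec<String> = (0..n).map(|i| format!(".ext{i}")).collect();
        let set: HashSet<String> = suffixes.iter().cloned().collect();
        for len in [16usize, 1_024] {
            // A haystack that matches nothing, so the scan visits every suffix.
            let haystack = format!("{}.none", "a".repeat(len));
            let scan = time_it(10_000, || linear_scan(black_box(haystack.as_str()), &suffixes));
            let lookup = time_it(10_000, || set_lookup(black_box(haystack.as_str()), &set));
            println!("{n:>4} suffixes, {len:>5}-byte haystack: scan {scan:?}, lookup {lookup:?}");
        }
    }
}
```

Whichever row is the first where the second column beats the first is a rough estimate of the crossover for that input shape, and it will likely move between machines and compiler settings.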