"Should" in the sense of "I'd certainly expect it to be" and "it's good practice". Not in the sense that "it probably is" or "it can't possibly be misaligned".
Of course you can't force any arbitrary byte buffer to be aligned to 2-byte boundaries. However, I was arguing that when producing a buffer intended to hold UTF-16 data, one should ensure that it is in fact 2-byte aligned, because its semantics and the likely uses of its contents either require that alignment or at least work best with it.
It's not hard to do, either: in the worst case, by allocating one more byte than necessary, it's always possible to slice the resulting allocation so that its starting address is 2-aligned. In practice, most allocators already return 8- or even 16-byte-aligned buffers anyway.
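To make the "one more byte" trick concrete, here is a minimal Rust sketch. The function names `alloc_with_slack` and `two_aligned_window` are made up for illustration, and the code assumes the caller really does have at least one spare byte available.

```rust
/// Allocates room for `units` UTF-16 code units plus one spare byte.
/// (Hypothetical helper, not part of any library.)
fn alloc_with_slack(units: usize) -> Vec<u8> {
    vec![0u8; units * 2 + 1]
}

/// Returns a `len`-byte window of `buf` whose first byte sits at an even address.
/// (Hypothetical helper; panics if `buf` is shorter than the window it needs.)
fn two_aligned_window(buf: &mut [u8], len: usize) -> &mut [u8] {
    // Skip one byte if the buffer happens to start at an odd address.
    let skip = (buf.as_ptr() as usize) % 2;
    &mut buf[skip..skip + len]
}
```

Calling `two_aligned_window(&mut raw, units * 2)` on a buffer from `alloc_with_slack(units)` yields a slice that is safe to treat as 2-aligned storage. If you control the allocation yourself, a `Vec<u16>` is simpler still, since it is always aligned for `u16`.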
I don't follow your argument about the UTF-8 encoded header with an odd length. A UTF-8 byte sequence is not a valid UTF-16 sequence of 16-bit integers. When one reinterprets b"xy" as a (single-element) sequence of 16-bit integers, one does not obtain the UTF-16 encoded representation of the string