Now that you have summoned me, I'll try to at least give some sort of answer. If you had one thousand state machines, and each state machine had a function called poll that does a little bit of work and then returns, then you could put them in a vector, iterate through the vector, poll all of them, and suddenly you're doing one thousand things on a single thread concurrently (but not in parallel).
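As a rough sketch of that loop (not how any real executor is written): the names `Step`, `Counter`, and `run_all` below are made up for illustration. Real Rust futures go through `Future::poll`, which also takes a `Context` and returns `Poll<T>`, but the shape of the idea is the same.

```rust
// Hypothetical names throughout: `Step`, `Counter`, `run_all` are not from any library.

/// Result of one call to `poll`.
#[derive(Debug, PartialEq)]
enum Step {
    Pending,
    Done,
}

/// A tiny state machine that needs `remaining` polls to finish.
struct Counter {
    remaining: u32,
}

impl Counter {
    /// Do a little bit of work, then return immediately.
    fn poll(&mut self) -> Step {
        if self.remaining == 0 {
            return Step::Done;
        }
        self.remaining -= 1;
        if self.remaining == 0 {
            Step::Done
        } else {
            Step::Pending
        }
    }
}

/// Poll every machine, round after round, until all are done.
/// Returns the total number of `poll` calls made.
fn run_all(mut machines: Vec<Counter>) -> usize {
    let mut polls = 0;
    while !machines.is_empty() {
        // Keep only the machines that are still pending.
        machines.retain_mut(|m| {
            polls += 1;
            m.poll() == Step::Pending
        });
    }
    polls
}

fn main() {
    let machines: Vec<Counter> = (0..1000).map(|_| Counter { remaining: 3 }).collect();
    let polls = run_all(machines);
    println!("finished 1000 state machines with {polls} polls on one thread");
}
```

Every machine makes progress in each pass over the vector, so all one thousand are "in flight" at once even though only one `poll` call runs at any moment.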
So the way non-blocking IO helps is this: when your state machine needs to do some IO, it can start the operation and return from the poll function right away. While the IO operation is pending, the loop can spend the time polling other futures.
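To make the IO part a bit more concrete, here is a minimal sketch where the IO is simulated. `FakeIo`, `State`, `Task`, and `run` are all hypothetical names, and `FakeIo` just counts down instead of touching a real socket; the point is only that `poll` never waits.

```rust
// All names here (`FakeIo`, `State`, `Task`, `run`) are hypothetical.

/// Stand-in for a non-blocking IO operation: it never blocks,
/// it only reports whether the result is ready yet.
struct FakeIo {
    ticks_until_ready: u32,
}

impl FakeIo {
    /// Like a non-blocking read: returns `None` ("would block") until ready.
    fn try_complete(&mut self) -> Option<&'static str> {
        if self.ticks_until_ready == 0 {
            Some("data")
        } else {
            self.ticks_until_ready -= 1;
            None
        }
    }
}

enum State {
    NotStarted { latency: u32 },
    WaitingOnIo(FakeIo),
    Done,
}

struct Task {
    id: usize,
    state: State,
}

impl Task {
    /// Returns `true` once the task has finished.
    fn poll(&mut self, log: &mut Vec<String>) -> bool {
        match &mut self.state {
            State::NotStarted { latency } => {
                // Start the operation and return right away instead of waiting.
                let io = FakeIo { ticks_until_ready: *latency };
                log.push(format!("task {} started IO", self.id));
                self.state = State::WaitingOnIo(io);
                false
            }
            State::WaitingOnIo(io) => match io.try_complete() {
                Some(data) => {
                    log.push(format!("task {} got {}", self.id, data));
                    self.state = State::Done;
                    true
                }
                // Still pending: give the loop a chance to poll other tasks.
                None => false,
            },
            State::Done => true,
        }
    }
}

/// Poll all tasks in rounds until every one has finished; returns the event log.
fn run(mut tasks: Vec<Task>) -> Vec<String> {
    let mut log = Vec::new();
    while !tasks.is_empty() {
        tasks.retain_mut(|t| !t.poll(&mut log));
    }
    log
}

fn main() {
    let tasks = vec![
        Task { id: 0, state: State::NotStarted { latency: 2 } },
        Task { id: 1, state: State::NotStarted { latency: 0 } },
    ];
    for line in run(tasks) {
        println!("{line}");
    }
}
```

Task 1 finishes before task 0 even though it comes later in the vector, because the loop keeps polling it while task 0's slower "IO" is still pending. A real executor would also avoid re-polling tasks that have nothing new to do, which this sketch does not attempt.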