I assume the reason you're using two threads for decompression is that you want to dedicate two cores to that work, when multiple cores are available. If the consumer does a very small amount of work, using a mutex for that may be fine.
But if the consumer does significant work or blocks, the mutex will block one or both of the two threads for a significant amount of time. In that case using a queue and a separate consumer thread instead can prevent that blocking.
In other words, three threads and a queue can potentially (depending on how many cores are available and what the consumer is doing) do more parallel work than just the two threads.
The Vec containing the decompressed data can be moved into the queue without an extra copy. Allocations can be minimized by using a pool of Vecs.
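Here's a minimal sketch of that shape using a `std::sync::mpsc` channel: two decompression threads send their output `Vec<u8>`s to a dedicated consumer thread. The `decompress_chunk` function and the input data are placeholders, not your actual code.

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for the real decompression work; returns one decompressed block.
fn decompress_chunk(compressed: &[u8]) -> Vec<u8> {
    compressed.to_vec() // placeholder
}

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // Two decompression threads, each with its own clone of the sender.
    let producers: Vec<_> = (0..2)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                let input = vec![id as u8; 16]; // placeholder compressed data
                let decompressed = decompress_chunk(&input);
                // The Vec is moved into the channel; its contents are not copied.
                tx.send(decompressed).unwrap();
            })
        })
        .collect();
    drop(tx); // so the channel closes once all producers finish

    // The consumer runs on its own thread, so slow consumption
    // doesn't block the decompressors mid-work.
    let consumer = thread::spawn(move || {
        for block in rx {
            println!("consumed {} bytes", block.len());
        }
    });

    for p in producers {
        p.join().unwrap();
    }
    consumer.join().unwrap();
}
```

If the consumer can fall behind and memory is a concern, `mpsc::sync_channel` gives you a bounded queue that applies backpressure instead of growing without limit.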
It is not "mutability checking" that has a cost in this scenario, and I'm not sure what you mean by that. It is blocking by the mutex that reduces concurrency and potentially parallelism.
When you're done with a Vec you can clear it and put it somewhere, e.g. a Mutex<Vec<Vec<u8>>> or a channel, instead of dropping it. The start of your processing pipeline can then try pulling an empty Vec with some non-zero capacity out of that pool. That way the allocation and deallocation work is avoided and replaced by the cost of briefly locking.
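A rough sketch of such a pool, assuming the Mutex<Vec<Vec<u8>>> variant (the helper names are just for illustration):

```rust
use std::sync::{Arc, Mutex};

// Finished Vecs are cleared and pushed back here; the start of the
// pipeline pops one instead of allocating.
type Pool = Arc<Mutex<Vec<Vec<u8>>>>;

fn get_buffer(pool: &Pool) -> Vec<u8> {
    // Reuse a pooled Vec (its capacity is retained) or start with an empty one.
    pool.lock().unwrap().pop().unwrap_or_default()
}

fn return_buffer(pool: &Pool, mut buf: Vec<u8>) {
    buf.clear(); // drops the contents but keeps the allocation
    pool.lock().unwrap().push(buf);
}

fn main() {
    let pool: Pool = Arc::new(Mutex::new(Vec::new()));

    let mut buf = get_buffer(&pool);
    buf.extend_from_slice(b"decompressed data");
    // ... hand buf through the pipeline ...
    return_buffer(&pool, buf);

    // The next request reuses the same allocation.
    let reused = get_buffer(&pool);
    assert!(reused.capacity() >= 17);
}
```

The pool lock is only held for a push or pop, so contention stays low even with several threads sharing it.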
This can be worth it if the allocator in use is slow or the allocation cost is significant relative to the amount of work being done by the threads.
The downside is some added complexity, and it can increase the memory footprint of your process if you're keeping more large allocations around than necessary. For tiny Vecs the overhead of pooling can also be higher than the cost of a good allocator.