Thanks for your insight!
However, I think serial decoding can work too, although it certainly won't be as efficient as parallel decoding.
Consider that you can combine the decoders' information after translating each beam in a batch, so you have the information from all of your models when deciding which translation is best for a given beam. You would also be able to (somehow) feed this information back to your decoders for the next beam search step.
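To make the idea a bit more concrete, here is a rough sketch of one serial beam search step. It is only an illustration under my own assumptions: each model is a hypothetical callable that takes a prefix (list of token ids) and returns next-token logits, and I combine the models by averaging their probabilities, which is just one possible choice. The names `combine`, `beam_step`, `toy_model_a` and `toy_model_b` are made up for this example.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def combine(per_model_log_probs):
    # Assumption: average the models' probabilities per token (done in log space).
    n_models = len(per_model_log_probs)
    combined = []
    for token_scores in zip(*per_model_log_probs):
        m = max(token_scores)
        log_sum = m + math.log(sum(math.exp(x - m) for x in token_scores))
        combined.append(log_sum - math.log(n_models))
    return combined

def beam_step(models, beams, beam_size):
    # beams is a list of (prefix, cumulative_log_prob) pairs.
    candidates = []
    for prefix, score in beams:
        # Serial decoding: query the models one after the other.
        per_model = [log_softmax(model(prefix)) for model in models]
        for token, log_prob in enumerate(combine(per_model)):
            candidates.append((prefix + [token], score + log_prob))
    # Keep the best extensions; every model sees these prefixes at the next step.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]

# Hypothetical toy "models" with a 3-token vocabulary, ignoring the prefix.
def toy_model_a(prefix):
    return [1.0, 2.0, 0.5]

def toy_model_b(prefix):
    return [0.2, 0.1, 3.0]

beams = [([], 0.0)]
for _ in range(2):
    beams = beam_step([toy_model_a, toy_model_b], beams, beam_size=2)
```

The point is that the surviving prefixes are chosen from the combined scores, so all decoders continue from the same hypotheses at the next step.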
By parallelizing the process we gain speed, because each model translates the beam at the same time instead of one after the other. But my guess is that we would combine the information from the models in much the same way as in sequential decoding.
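As a small sketch of what I mean by "the same combination, just scored in parallel": below, the serial and parallel versions differ only in how the per-model scores are collected. Everything here is an assumption for illustration (the callable model interface, averaging as the combination rule, and the names `score_serial` / `score_parallel`). Threads would only give a real speedup if the models release the GIL or run on separate devices; this just shows the structure.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def average_in_log_space(per_model_log_probs):
    # Assumed combination rule: mean of the models' probabilities per token.
    n_models = len(per_model_log_probs)
    return [
        math.log(sum(math.exp(x) for x in token_scores) / n_models)
        for token_scores in zip(*per_model_log_probs)
    ]

def score_serial(models, prefix):
    # One model after the other.
    per_model = [log_softmax(model(prefix)) for model in models]
    return average_in_log_space(per_model)

def score_parallel(models, prefix, pool):
    # All models are submitted at once; map() returns results in model order.
    per_model = list(pool.map(lambda model: log_softmax(model(prefix)), models))
    return average_in_log_space(per_model)

# Hypothetical toy models returning fixed next-token logits.
models = [lambda prefix: [1.0, 2.0, 0.5], lambda prefix: [0.2, 0.1, 3.0]]

serial_scores = score_serial(models, [])
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    parallel_scores = score_parallel(models, [], pool)
```

Both calls should produce the same combined scores, since only the scheduling of the model calls changes.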
As I said, I haven't implemented this yet, but this is the idea I have in mind; I am still working out how to do it.