My understanding is that this feature requires all the tokens from the beginning of the output up to the token we wish to change. What if we want to change a token at or near the end of the output? Then we are effectively providing the correct output ourselves and just asking the engine to reproduce it. Is there another mechanism or strategy for updating one or more tokens at any position?
The target prefix is a simple and effective way to specify which tokens should be force-decoded and where unconstrained decoding should start.
Also, I don’t see how you could change multiple positions in one request, because the first unconstrained token may completely change the rest of the translation.
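
To make the point concrete, here is a minimal toy sketch of prefix-constrained greedy decoding. It is not the engine's actual implementation: `decode_with_prefix`, `toy_next_token`, and the `BIGRAMS` table are all hypothetical names invented for illustration, and the "model" is just a lookup on the previous token.

```python
from typing import Callable, List, Optional, Sequence

EOS = "</s>"

def decode_with_prefix(
    next_token: Callable[[Sequence[str]], str],
    target_prefix: Sequence[str],
    max_length: int = 20,
) -> List[str]:
    """Greedy decoding where the tokens in target_prefix are forced."""
    output = list(target_prefix)  # forced tokens are copied as-is
    while len(output) < max_length:
        token = next_token(output)  # unconstrained: the model chooses
        if token == EOS:
            break
        output.append(token)
    return output

# Hypothetical toy "model": the next token depends only on the previous one.
BIGRAMS = {
    None: "the",
    "the": "cat",
    "cat": "sleeps",
    "sleeps": EOS,
    "a": "dog",
    "dog": "barks",
    "barks": EOS,
}

def toy_next_token(history: Sequence[str]) -> str:
    previous: Optional[str] = history[-1] if history else None
    return BIGRAMS.get(previous, EOS)

# No prefix: the model decodes freely.
free = decode_with_prefix(toy_next_token, [])

# Forcing a different first token changes everything after it.
forced = decode_with_prefix(toy_next_token, ["a"])

# A prefix that already covers the whole output is simply reproduced.
full = decode_with_prefix(toy_next_token, ["the", "cat", "runs"])

print(free, forced, full)
```

In this toy, forcing only the first token already yields a completely different continuation, which is why editing several positions in a single request is not well defined: any later edit would refer to positions in a translation that no longer exists.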