Constrained generation when inferring

Andrewlesson · July 6, 2022, 12:25pm

Could you support constrained generation at inference time? For example, I know that the final output must follow this format:
Format: 我们将要面临着 MASK 的考 MASK 与评价
If the whole sentence is regenerated, the final result may contain errors. I only need to generate the MASK parts above, because I think the rest is already correct, and I think generating only those parts may also speed up decoding.
Target: 我们将要面临着神的考验与评价

guillaumekln · July 6, 2022, 3:55pm

If MASK is always a single token, this can be done using a target prefix (option --tgt_prefix in OpenNMT-py and target_prefix in CTranslate2). For example:

Pattern: 我们将要面临着 MASK 的考 MASK 与评价

Translate with the prefix 我们将要面临着 and generate only one additional token:

=> 我们将要面临着神

Translate with the prefix 我们将要面临着神的考 and generate only one additional token:

=> 我们将要面临着神的考验

Concatenate the remaining suffix tokens:

=> 我们将要面临着神的考验与评价

You can probably generalize an algorithm from this example.
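A minimal sketch of that generalization, assuming each MASK stands for exactly one token and, purely for illustration, that tokens are single characters (real models usually use subwords, so `split_pattern` would need to match your tokenizer). `split_pattern`, `fill_masks`, and the `generate_next(prefix)` hook are hypothetical names: the hook is whatever call returns the single token the model produces after `prefix`. Known tokens are copied through, and the decoder runs once per MASK.

```python
from typing import Callable, List

MASK = "MASK"


def split_pattern(pattern: str) -> List[str]:
    """Split a pattern into tokens (assumption: character-level tokens).

    A whitespace-separated chunk equal to MASK becomes one placeholder;
    every other chunk is split into its individual characters.
    """
    tokens: List[str] = []
    for chunk in pattern.split():
        if chunk == MASK:
            tokens.append(MASK)
        else:
            tokens.extend(chunk)
    return tokens


def fill_masks(pattern: List[str],
               generate_next: Callable[[List[str]], str]) -> List[str]:
    """Fill each MASK with one generated token, left to right.

    generate_next (hypothetical hook) receives every token fixed so far
    and must return the single next token predicted by the model.
    """
    output: List[str] = []
    for token in pattern:
        if token == MASK:
            # Decode exactly one token, using everything so far as the target prefix.
            output.append(generate_next(list(output)))
        else:
            # Known token: append it without running the decoder.
            output.append(token)
    return output
```

Tokens after the last MASK are simply appended, which is the "concatenate the suffix" step. With a real model, `"".join(fill_masks(split_pattern(pattern), hook))` would rebuild the sentence.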

Note that to generate only one token, you can configure max_decoding_length in CTranslate2 and set it to len(target_prefix) + 1.
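A sketch of the one-token step with the CTranslate2 Python API, assuming `Translator.translate_batch` accepts `target_prefix`, `max_decoding_length`, and `beam_size`, and that the returned hypothesis includes the prefix tokens. Following the note above, it also assumes `max_decoding_length` counts the prefix; check this against your CTranslate2 version. `next_token_ct2` is a hypothetical helper name, and `translator` is any object with that `translate_batch` method.

```python
from typing import List, Sequence


def next_token_ct2(translator, source_tokens: Sequence[str],
                   prefix: Sequence[str]) -> str:
    """Return the single token the model generates right after `prefix`."""
    results = translator.translate_batch(
        [list(source_tokens)],
        # Force the decoder to start from the known target tokens.
        target_prefix=[list(prefix)] if prefix else None,
        # Assumption (from the post): the limit includes the prefix, so +1 means one new token.
        max_decoding_length=len(prefix) + 1,
        beam_size=1,
    )
    hypothesis: List[str] = results[0].hypotheses[0]
    if len(hypothesis) <= len(prefix):
        # The model ended the sentence instead of producing a token for this slot.
        raise ValueError("decoder stopped before generating a new token")
    # Keep only the first token after the prefix.
    return hypothesis[len(prefix)]
```

With a real model you would create `translator = ctranslate2.Translator("model_dir")` (placeholder path) and bind it with `functools.partial(next_token_ct2, translator, source_tokens)` to get a function of the prefix alone. Reading index `len(prefix)` keeps the result correct even if your version excludes the prefix from the length limit; the model would then just decode a few extra tokens.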