Constrained generation at inference time

Is constrained generation supported at inference time? For example, I know that the final generated result has the following format:
Format: 我 们 将 要 面 临 着 MASK 的 考 MASK 与 评 价
If the whole sentence is regenerated, the final result may contain errors. I only need to generate the MASK positions above, since I consider the rest of the sentence correct. I think this approach may also speed up generation.
Target: 我 们 将 要 面 临 着 神 的 考 验 与 评 价

If MASK is always a single token, this can be done using a target prefix (option --tgt_prefix in OpenNMT-py and target_prefix in CTranslate2). For example:

Pattern: 我 们 将 要 面 临 着 MASK 的 考 MASK 与 评 价

  • translate with prefix 我 们 将 要 面 临 着 and generate only 1 additional token:

=> 我 们 将 要 面 临 着 神

  • translate with prefix 我 们 将 要 面 临 着 神 的 考 and generate only 1 additional token:

=> 我 们 将 要 面 临 着 神 的 考 验

  • concatenate the suffix tokens:

=> 我 们 将 要 面 临 着 神 的 考 验 与 评 价

You can probably generalize an algorithm from this example.

Note that to generate only 1 token, you can configure max_decoding_length in CTranslate2 (you can set it to len(target_prefix) + 1).
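
The steps above could be generalized roughly as follows. This is only a sketch: fill_masks, make_ctranslate2_step and toy_next_token are hypothetical names introduced here, and the adapter assumes that the CTranslate2 hypothesis starts with the target prefix, so the newly generated token sits at index len(prefix). It has not been run against a real model.

```python
from typing import Callable, List

MASK = "MASK"


def fill_masks(pattern: List[str],
               next_token: Callable[[List[str]], str]) -> List[str]:
    """Fill each MASK left to right, using all tokens before it as the prefix."""
    output: List[str] = []
    for token in pattern:
        if token == MASK:
            # Ask the model for exactly one token after the current prefix.
            output.append(next_token(list(output)))
        else:
            # Known tokens are copied through unchanged.
            output.append(token)
    return output


def make_ctranslate2_step(translator, source_tokens: List[str]):
    """Build a next_token callback around a ctranslate2.Translator.

    Assumption: the returned hypothesis includes the prefix tokens,
    so the generated token is the one at index len(prefix).
    """
    def next_token(prefix: List[str]) -> str:
        results = translator.translate_batch(
            [source_tokens],
            target_prefix=[prefix],
            max_decoding_length=len(prefix) + 1,  # only 1 additional token
        )
        return results[0].hypotheses[0][len(prefix)]
    return next_token


def toy_next_token(prefix: List[str]) -> str:
    """Stand-in for a real model so the example runs without one."""
    return "神" if prefix[-1] == "着" else "验"


pattern = "我 们 将 要 面 临 着 MASK 的 考 MASK 与 评 价".split()
filled = fill_masks(pattern, toy_next_token)
print(" ".join(filled))
```

With a real model you would pass make_ctranslate2_step(translator, source_tokens) instead of toy_next_token. This makes one translation call per MASK, and because decoding runs left to right, the model does not see the tokens that follow a MASK when it fills it.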
