Can you support constrained generation at inference time? For example, I know that the format of the final output is as follows:

format: 我 们 将 要 面 临 着 MASK 的 考 MASK 与 评 价
target: 我 们 将 要 面 临 着 神 的 考 验 与 评 价

If the whole sentence is regenerated, the final result may contain errors. I only need to generate the MASK positions above; I believe the rest of the tokens are already correct. I also think this approach could speed up generation.
If MASK is always a single token, this can be done using a target prefix (option --tgt_prefix in OpenNMT-py and target_prefix in CTranslate2). For example:
Pattern: 我 们 将 要 面 临 着 MASK 的 考 MASK 与 评 价
- Translate with the prefix 我 们 将 要 面 临 着 and generate only 1 additional token:
  => 我 们 将 要 面 临 着 神
- Translate with the prefix 我 们 将 要 面 临 着 神 的 考 and generate only 1 additional token:
  => 我 们 将 要 面 临 着 神 的 考 验
- Concatenate the remaining suffix tokens from the pattern:
  => 我 们 将 要 面 临 着 神 的 考 验 与 评 价
You can probably generalize an algorithm from this example.
Note that to generate only 1 token, you can configure max_decoding_length in CTranslate2 (set it to len(target_prefix) + 1).