Can I set "replace_unknown_target" when I use CTranslate2?

zealfory · December 7, 2020, 9:44am

Hi, I get <unk> in the output when I run inference with CTranslate2. Can I replace that symbol, the way “params:replace_unknown_target” does?

CTranslate2 example: “测试 ℘ 特殊符号” --> " Test <unk> special symbols"

Thanks a lot.

zealfory · December 7, 2020, 9:45am

@guillaumekln

guillaumekln · December 7, 2020, 9:46am

Hi,

This parameter is not available in CTranslate2.

zealfory · December 7, 2020, 9:53am

Does that mean I have to replace the <unk> symbol myself? When I run inference with the original checkpoint, there is no <unk>.

Original checkpoint example: “测试 ℘ 特殊符号” --> " Test ℘ special symbols"

This inconsistency is somewhat inconvenient.

guillaumekln · December 7, 2020, 9:58am

Yes, you can. The attention vectors are returned as part of the translation API, so this feature should be very easy to implement on your side.

It is not implemented because CTranslate2 only supports Transformer models, which are typically not compatible with this parameter unless they were trained with guided alignment.
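As a rough illustration of the user-side approach described above, here is a minimal sketch that replaces each <unk> in the output with the source token that received the highest attention weight. It works on plain Python lists only. The function name, the "<unk>" default, and the attention values are assumptions made up for this example, and it assumes you have already obtained one attention vector per target token from the translation API. How well this works depends on the attention being a usable alignment, which is the guided alignment caveat mentioned above.

```python
from typing import List, Sequence


def replace_unknowns(
    source_tokens: Sequence[str],
    target_tokens: Sequence[str],
    attention: Sequence[Sequence[float]],
    unk_token: str = "<unk>",
) -> List[str]:
    """Replace each unknown target token by the source token it attends to most.

    `attention` holds one vector per target token, with one weight per
    source position (hypothetical layout, adapt it to what your API returns).
    """
    output = []
    for token, weights in zip(target_tokens, attention):
        if token == unk_token and len(weights) > 0:
            # Index of the source position with the highest attention weight.
            best = max(range(len(weights)), key=weights.__getitem__)
            # The vector may cover extra positions (e.g. an end token),
            # so only copy when the index maps to a real source token.
            if best < len(source_tokens):
                token = source_tokens[best]
        output.append(token)
    return output


# Illustrative data only: the attention weights below are invented.
source = ["测试", "℘", "特殊", "符号"]
target = ["Test", "<unk>", "special", "symbols"]
attention = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.10, 0.60, 0.20],
    [0.05, 0.05, 0.20, 0.70],
]

result = replace_unknowns(source, target, attention)
print(" ".join(result))
```

If the most attended position falls outside the source sentence, the sketch leaves <unk> in place rather than guessing.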

zealfory · December 7, 2020, 10:00am

OK, I see. Thank you very much.

guillaumekln · January 22, 2021, 4:56pm

For reference, the option “replace_unknowns” was added in CTranslate2 1.17:

zealfory · January 26, 2021, 8:52am

Thanks