Hi, I have trained a Transformer model using OpenNMT-tf V1. As always, I used SentencePiece for tokenization. This time, when applying SentencePiece, I used the `--output_format=id` option for encoding (in the past I always used `--output_format=piece`).
For file translation via onmt-main I apply SentencePiece encoding & decoding in pre- and post-processing and that works fine.
However, when serving my model with nmt-wizard-docker, the server output is numeric rather than text, and it does not even resemble a sequence of SentencePiece IDs.
Since other models for which I used `--output_format=piece` have been served correctly, my question is whether the SentencePiece encoder/decoder working behind the scenes in the nmt-wizard is able to handle these numerical IDs.
If I understand correctly, you trained your model on IDs instead of pieces? If so, that will not work: nmt-wizard-docker always encodes to pieces.
Thanks @guillaumekln. I'll retrain with pieces. I suspected that was the problem, but now I know not to do it again.