Wrong output from exported OpenNMT-tf models

Hi,

I’m training a TransformerBigSharedEmbeddings model using auto_config. I built a SentencePiece model of 64k tokens with default settings and full character coverage. Training converges nicely, and validation scores are around 50 BLEU. I save validation predictions during training, and they look good (no unks, and most translations seem of good quality). The training data is in the order of millions of sentence pairs, so volume should not be an issue.
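For reference, this is essentially how the SentencePiece model was trained (file names and the model prefix are placeholders; I'm assuming a single shared model for source and target, which TransformerBigSharedEmbeddings implies):

```python
import sentencepiece as spm

# Train a shared 64k SentencePiece model on the concatenated
# source + target training data (file name is a placeholder).
spm.SentencePieceTrainer.train(
    input="train.src-tgt.txt",
    model_prefix="spm64k",
    vocab_size=64000,
    character_coverage=1.0,  # full character coverage
)
```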

However, something goes wrong at export time. I tried several ways to export: as an OpenNMT-tf SavedModel, as a CTranslate2 model with different quantization settings, and also via the automatic CTranslate2 export on the best BLEU checkpoint. None of them produces a meaningful translation: the output is either one random token repeated dozens of times, or just a long series of parentheses.
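As a sanity check on the SavedModel export, I call the serving signature directly on pre-tokenised input (the export path and the example tokens are placeholders; the `tokens`/`length` input names are what OpenNMT-tf SavedModel exports expect, as far as I understand):

```python
import tensorflow as tf

# Load the exported SavedModel (path is a placeholder).
imported = tf.saved_model.load("export/saved_model")
translate_fn = imported.signatures["serving_default"]

# The exported model takes pre-tokenised input as a string
# "tokens" tensor and an int32 "length" tensor.
tokens = tf.constant([["▁Hello", "▁world", "▁!"]])
length = tf.constant([3], dtype=tf.int32)

outputs = translate_fn(tokens=tokens, length=length)
print(outputs["tokens"])  # predicted target tokens
```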

I made sure the vocabulary and the tokenisation were exactly the same as those used to prepare the training data. I even tried to translate tokenised sentences taken straight from the training set, just to rule out tokenisation as the problem, and the output was still nonsense (even before detokenisation). I don’t get any error or warning in the logs, so it’s difficult to identify the cause, and I have really run out of ideas. Could the export of the TransformerBigSharedEmbeddings model in particular be the problem? Are there any other clues or aspects I should consider?
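This is the minimal reproduction I’m testing with for the CTranslate2 export (paths and the example sentence are placeholders); even on training-set sentences the output is garbage:

```python
import ctranslate2
import sentencepiece as spm

# Paths are placeholders for my actual model locations.
sp = spm.SentencePieceProcessor(model_file="spm64k.model")
translator = ctranslate2.Translator("export/ctranslate2", device="cpu")

# Tokenise with the exact same SentencePiece model used for training.
source = "This sentence comes straight from the training set."
tokens = sp.encode(source, out_type=str)

results = translator.translate_batch([tokens])
output_tokens = results[0].hypotheses[0]
print(output_tokens)             # raw target tokens
print(sp.decode(output_tokens))  # detokenised translation
```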

The versions I’m using are OpenNMT-tf 2.32.0, TensorFlow 2.13.1 and cuDNN 8.6.0.