Hi, although I am generally very impressed with the results of my SentencePiece-encoded Transformer model, I have an issue where some of my sentences produce no prediction at all, even though the model converged well.
My workflow was to pre-tokenize my raw data with Moses, then split it into train/validate/test and train a SentencePiece model on the training split. Is that correct? I also seem to have some extraneous tokens such as &apos; and escaped quotation marks, which I will try to fix by not using Moses.
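To make the question concrete, here is a rough sketch of that workflow (file names and the language code are just placeholders; if I understand correctly, the Moses tokenizer's -no-escape flag is what avoids those HTML-style entities):

    # tokenize the raw training split with Moses, without HTML-style escaping
    perl tokenizer.perl -l en -no-escape < train.raw > train.tok
    # train SentencePiece on the tokenized training data only
    spm_train --input=train.tok --model_prefix=spm --vocab_size=32000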
Have you tried enabling the <unk> token (in both SentencePiece and OpenNMT)? When words are rare, the model will often prefer <unk>, which comes out as “nothing” if that option is not enabled… So that could explain a blank prediction.
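As a quick sanity check (just a sketch, assuming an spm.model/spm.vocab pair and the spm_encode command-line tool), you can verify that <unk> is in the vocabulary and that unseen symbols map to it rather than to nothing:

    # <unk> should appear at the top of the vocab (it is reserved at id 0 by default)
    head -n 3 spm.vocab
    # a symbol the model has never seen should map to the unk id (0), not be dropped
    echo "☃" | spm_encode --model=spm.model --output_format=id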
Hi, my Linux box is out of reach right now, but you can search the forum: someone asked a similar question (no translation at all), and the answer was to specify a minimum length when translating with translate.py. If I am not wrong, it is
--min_length nnn
where nnn can be, for instance, 2.
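Something like this (model and file names are just placeholders, and I am writing the translate.py call from memory):

    # force the decoder to emit at least 2 tokens so it cannot return an empty line
    python translate.py -model model.pt -src test.tok -output pred.txt --min_length 2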
In my case, this solved the problem (only very rare instances of blank translations after that).
I guess this is what you were asking and that it is also valid for your translation flavor… if not, excuse me.
Have a nice day!
Miquel