Blank output in translation prediction

JptoEn · July 25, 2021, 8:44pm

Hi, although I am generally very impressed with the results of my SentencePiece-encoded Transformer model, I have an issue where some of my sentences lack any prediction even though the model converged well.

My workflow was to pre-tokenize my raw data with Moses, then split it into train/validate/test sets and train a SentencePiece model on the training data. Is that correct? Also, I seem to have some extraneous tokens like &apos and &quot, which I will try to fix by not using Moses.
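If Moses is kept in the pipeline, the escaped tokens can also be turned back into plain characters before SentencePiece training. The snippet below is only a minimal sketch: it assumes the extra tokens are the standard Moses escapes (written with a trailing semicolon, e.g. &apos; and &quot;), and the helper name unescape_moses is made up for illustration.

```python
import html


def unescape_moses(line: str) -> str:
    """Turn Moses-style escapes such as &apos; and &quot; back into plain characters."""
    # html.unescape also handles numeric escapes, e.g. &#124; for "|".
    return html.unescape(line)


if __name__ == "__main__":
    print(unescape_moses("It &apos;s a &quot; test &quot; ."))
```

Running this over the tokenized files before training SentencePiece should keep the entity fragments out of the subword vocabulary.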

Best Regards,
Matt

SamuelLacombe · July 26, 2021, 3:05am

Have you tried turning on the option to have <unk> (in both SentencePiece and OpenNMT)? Often, when words are rare, the model will prioritize <unk>, which comes out as “nothing” if you don’t have the option enabled. So that could explain a blank prediction.
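To see whether the blanks really line up with rare words, it can help to list the source sentences whose prediction came back empty. This is a rough sketch that assumes the source and prediction files are line-aligned; find_blank_predictions is a hypothetical helper, not part of OpenNMT.

```python
from typing import Iterable, List, Tuple


def find_blank_predictions(
    sources: Iterable[str], predictions: Iterable[str]
) -> List[Tuple[int, str]]:
    """Return (1-based line number, source sentence) for every empty prediction."""
    blanks = []
    for line_no, (src, pred) in enumerate(zip(sources, predictions), start=1):
        if not pred.strip():  # whitespace-only lines count as blank
            blanks.append((line_no, src.rstrip("\n")))
    return blanks


if __name__ == "__main__":
    src = ["こんにちは\n", "珍しい単語\n", "ありがとう\n"]
    pred = ["Hello\n", "\n", "Thank you\n"]
    for line_no, sentence in find_blank_predictions(src, pred):
        print(line_no, sentence)
```

With real data, the two arguments could simply be the open source and prediction files, since iterating a file yields its lines.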

JptoEn · July 26, 2021, 6:29pm

Thanks Samuel, I will check that out.

miguelknals · July 29, 2021, 10:56am

Hi, my Linux box is out of my reach right now. But you can search the forum: someone asked a similar question (no translation), and the answer was to specify a minimum length when you translate with translate.py. If I am not wrong, it is

--min_length nnn

where nnn can be, for instance, 2.
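As an illustration of why a minimum length removes blank outputs: decoders typically enforce it by making the end-of-sentence token unselectable until enough tokens have been produced. The toy greedy decoder below is a sketch of that idea only. It is not OpenNMT's code, and the token names and scores are invented.

```python
from typing import Dict, List

EOS = "</s>"
NEG_INF = float("-inf")


def greedy_decode(step_scores: List[Dict[str, float]], min_length: int = 0) -> List[str]:
    """Pick the best token per step; EOS is blocked until min_length tokens exist."""
    output: List[str] = []
    for scores in step_scores:
        scores = dict(scores)  # copy so the caller's data is left untouched
        if len(output) < min_length:
            scores[EOS] = NEG_INF  # end-of-sentence cannot win yet
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        output.append(best)
    return output


if __name__ == "__main__":
    toy = [
        {EOS: -0.1, "hello": -2.5},
        {EOS: -0.2, "world": -1.9},
        {EOS: -0.1, "again": -3.0},
    ]
    print(greedy_decode(toy))                # [] (EOS wins at the first step)
    print(greedy_decode(toy, min_length=2))  # ['hello', 'world']
```

Without the constraint the model ends the sentence immediately, which is exactly a blank prediction; with min_length=2 it has to emit two tokens first.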

In my case, this solved the problem (only very rare instances of blank translation after that).
I guess this is what you were asking, and that it is also valid for your translation flavor… If not, excuse me.
Have a nice day!
Miquel

JptoEn · August 3, 2021, 12:05pm

Hi, I assumed that parameter was unsupported in PyTorch, so I moved my whole project over to TensorFlow, but I still get this error:

onmt-main: error: unrecognized arguments: --min_length 2

However, I don’t seem to have any missing predictions with TF, and my validation perplexity is also better.

miguelknals · August 3, 2021, 8:53pm

A little bit weird. Here is the command I use inside a shell script:

 onmt_translate --replace_unk --min_length 2   -gpu 0 -mode $var \
         -src $src \
         -output $output

Again, not sure why you are getting this error.

JptoEn · August 3, 2021, 10:22pm

I thought replace_unk was only valid for LSTM models? I’m using Transformer base.

francoishernandez · August 5, 2021, 1:33pm

You can check the docs for available options: Translate — OpenNMT-py documentation

JptoEn · August 6, 2021, 10:43pm

I was missing the underscore! Stupid mistake…

Which framework is recommended for building production systems?

francoishernandez · August 23, 2021, 10:27am

Both. But for inference you might want to have a look at CTranslate2.