OpenNMT Forum

Transformer model generates empty lines when using a SentencePiece model

I used a SentencePiece model as the tokenizer. When I encoded the source text file with the SPM encoder, everything was fine.

spm_encode --model=eng.model --output_format=piece input.txt --output input_tok.txt --extra_options=bos:eos

However, when I run the translation command with the trained model, a few lines are translated as empty. These are not short one-word phrases; they are complete sentences.

onmt_translate -model -src input_tok.txt -output fr_output_tok.txt -replace_unk -verbose

I am working on English-French machine translation with the Europarl dataset.
Any suggestions on how to resolve this?
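To collect concrete examples of inputs that come back empty, one option is to pair each line of the tokenized source with the corresponding line of the translation output and report the blank ones. The sketch below assumes onmt_translate was run with the default -n_best of 1, so the two files are line-aligned; the function names are hypothetical helpers, not part of OpenNMT-py.

```python
from typing import Iterable, List, Tuple


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def find_empty_translations(
    src_lines: Iterable[str], out_lines: Iterable[str]
) -> List[Tuple[int, str]]:
    """Return (1-based line number, source line) for every blank translation."""
    srcs, outs = list(src_lines), list(out_lines)
    if len(srcs) != len(outs):
        # Assumes one output line per input line (default n_best of 1);
        # a different count means the files are not aligned.
        raise ValueError(
            f"line count mismatch: {len(srcs)} source vs {len(outs)} output"
        )
    return [
        (i, src)
        for i, (src, out) in enumerate(zip(srcs, outs), start=1)
        if not out.strip()  # treat whitespace-only output as empty
    ]
```

For example, find_empty_translations(read_lines("input_tok.txt"), read_lines("fr_output_tok.txt")) would list the failing source lines, which can then be posted as examples.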

Can you show an example input (tokenized) that gives you an empty translation?

@BramVanroy Here is the input file after tokenization.

And this is the output after translation.

@guillaumekln please help.

I suggest not using these options (--extra_options=bos:eos) when training with OpenNMT-py. The framework already injects these tokens into the data.
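The simplest fix is to re-run spm_encode without --extra_options=bos:eos. If that is inconvenient, a minimal sketch like the one below could strip the BOS/EOS pieces from files that were already encoded. It assumes the model uses SentencePiece's default piece strings <s> and </s> (a model trained with custom --bos_piece/--eos_piece values would need those instead) and that files are in --output_format=piece form, one space-separated line of pieces per sentence. The function names are hypothetical. Since the current model was trained on data containing these pieces, it would presumably need to be retrained on the cleaned files.

```python
from typing import Iterable, Iterator

# Assumption: SentencePiece's default BOS/EOS piece strings.
SPECIAL_PIECES = frozenset({"<s>", "</s>"})


def strip_bos_eos(line: str, specials: frozenset = SPECIAL_PIECES) -> str:
    """Drop standalone BOS/EOS pieces from one space-separated line of pieces."""
    return " ".join(tok for tok in line.split() if tok not in specials)


def strip_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line (trailing newline removed) with BOS/EOS pieces stripped."""
    for line in lines:
        yield strip_bos_eos(line.rstrip("\n"))


def strip_encoded_file(in_path: str, out_path: str) -> None:
    """Write a cleaned copy of an spm_encode piece-format file."""
    with open(in_path, encoding="utf-8") as fin, open(
        out_path, "w", encoding="utf-8"
    ) as fout:
        for line in strip_lines(fin):
            fout.write(line + "\n")
```

For instance, strip_encoded_file("input_tok.txt", "input_tok.clean.txt") would write a cleaned copy; the output filename here is only illustrative.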

I’m experiencing the same issue with Transformer models. In my case I’m using BPE, without BOS or EOS tokens, and some lines are still translated as empty. Any ideas?