Translate.py with -replace_unk option and the transformer model

miguelknals · April 16, 2019, 10:20pm

Hi

In several places on the forum and around the internet, the “transformer” model has been praised as better than the default model. I have run src/tgt files, plain tokenized (no aggressive mode, no joiner), against the default model and the transformer model (http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model-do-you-support-multi-gpu). But I am still not sure whether the transformer model is better, as the BLEU score drops due to an unknown-word replacement problem.

After running the same translate command

python ../translate.py -model MODEL.model_step_xxx.pt -src src_verify.atok -output tgt.xxx.atok.MT -replace_unk -verbose -gpu 0

we are facing an issue with the unk.

Transformer model

Aportación de la corporación local : . ptas .
Nombre : Joan . Camps .

Default model

Aportación de la corporación local : 213.566 ptas .
Nombre : Joan Bestard Camps .

(I assume 213.566 and Bestard are UNK.) It looks like the default model replaces the UNK tokens but the transformer does not. Any idea why?

Sorry if this is off topic, but besides “try it yourself”, does anyone have a reference that explains how the models should be tuned? The FAQ comment stating that “The transformer model is very sensitive to hyperparameters” is a little scary. Some references I found by googling are not very promising, as it looks like brute force and thousands of GPU hours are the only way to improve a model (e.g. Massive Exploration of Neural Machine Translation Architectures).

Maybe I am naive, but even seen from some distance it looks strange that there are still no clear rules, even for very common language pairs (for instance, how to detokenize or how to size the parameters).

Thanks!

Have a nice day
Miguel

guillaumekln · April 17, 2019, 7:54am

Hi,

The replace unk mechanism requires a model with a single attention head so that it can be interpreted as an alignment vector. This does not apply to the Transformer model so you should not use this option.
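
To make the point about alignment concrete, here is a minimal sketch of the idea behind unknown-word replacement. It is not OpenNMT-py's actual code: the function name `replace_unk`, the `<unk>` symbol, the source tokens, and the attention weights are all invented for illustration. It assumes one attention distribution over the source per target token, which is what a single attention head provides.

```python
def replace_unk(src_tokens, pred_tokens, attention, unk="<unk>"):
    """Copy the most-attended source token wherever the model emitted unk.

    attention[i][j] is the weight target position i puts on source position j.
    This only makes sense if there is ONE such distribution per target token.
    """
    out = []
    for i, tok in enumerate(pred_tokens):
        if tok == unk:
            weights = attention[i]
            # argmax over source positions: treat attention as an alignment
            j = max(range(len(weights)), key=weights.__getitem__)
            out.append(src_tokens[j])
        else:
            out.append(tok)
    return out


# Hypothetical source sentence and model output (made up for this sketch).
src = ["Nom", ":", "Joan", "Bestard", "Camps", "."]
pred = ["Nombre", ":", "Joan", "<unk>", "Camps", "."]

# Invented attention weights, one row per target token, peaked on the diagonal.
attention = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.02, 0.90, 0.02, 0.02, 0.02, 0.02],
    [0.02, 0.02, 0.90, 0.02, 0.02, 0.02],
    [0.02, 0.02, 0.05, 0.85, 0.04, 0.02],
    [0.02, 0.02, 0.02, 0.02, 0.90, 0.02],
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.90],
]

result = replace_unk(src, pred, attention)
print(" ".join(result))  # Nombre : Joan Bestard Camps .
```

With several attention heads there is no single row of weights per target token to take the argmax over, so the lookup above has nothing reliable to work with.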

Regarding the sensitivity to hyperparameters: that’s why the FAQ entry exists and comes with the recommended training parameters.

When in doubt, start with something like SentencePiece on your raw data:

tel34 · April 17, 2019, 10:41am

Hi, I have just run your test sentences through my Transformer model and the UNKs are handled completely, including Joan Bestard Camps. I used SentencePiece when training this model.
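
A rough illustration of why subword segmentation tends to avoid UNKs in the first place: a rare word is split into smaller pieces that are in the vocabulary. The sketch below is a toy greedy longest-match segmenter, not SentencePiece's actual algorithm or API, and the vocabulary is invented. It also leaves out the word-boundary marker that SentencePiece adds to its pieces.

```python
def segment(word, vocab, unk="<unk>"):
    """Split a word into the longest pieces found in vocab, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Not even the single character is known: only then emit unk.
            pieces.append(unk)
            i += 1
    return pieces


# Hypothetical subword vocabulary: a few multi-character pieces plus
# single characters as a fallback.
vocab = {"Best", "ard", "213", "566", "."}
vocab |= set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

name_pieces = segment("Bestard", vocab)
number_pieces = segment("213.566", vocab)
print(name_pieces)    # ['Best', 'ard']
print(number_pieces)  # ['213', '.', '566']
```

Since every piece is in the vocabulary, the model never sees an unknown token for these words, so nothing needs to be replaced after translation.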

miguelknals · April 17, 2019, 2:19pm

Hi @guillaumekln and @tel34

Thank you both for your feedback and fast response. I will have a look.

Things change so fast that it is difficult to keep up with them.

Again, thanks a lot!
Have a nice day
Miguel