In several places on this forum and around the internet, the “transformer” model has been praised as better than the default model. I have run src/tgt files, plain tokenized (no aggressive mode, no joiner annotation), against both the default model and the transformer model (http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model-do-you-support-multi-gpu). But I am still not sure whether the transformer is actually better, because its BLEU score drops due to an unknown-word replacement problem.
After running the same translate command for both models:
python ../translate.py -model MODEL.model_step_xxx.pt -src src_verify.atok -output tgt.xxx.atok.MT -replace_unk -verbose -gpu 0
we are facing an issue with the &lt;unk&gt; tokens. For example:
Aportación de la corporación local : . ptas . Nombre : Joan . Camps .
Aportación de la corporación local : 213.566 ptas . Nombre : Joan Bestard Camps .
(I assume 213.566 and Bestard are the UNKs.) It looks like the default model replaces the &lt;unk&gt; tokens, but the transformer does not. Any idea why?
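For context, my understanding of what `-replace_unk` is supposed to do (a minimal sketch, not OpenNMT-py's actual code): each &lt;unk&gt; in the output is replaced by the source token that received the highest attention weight at that decoding step. If the transformer's attention is not being exposed to this step in the same way as the default RNN model's, that could explain why replacement silently fails:

```python
# Sketch of attention-based <unk> replacement: for each <unk> in the
# predicted tokens, copy the source token with the highest attention
# weight at that target position. This illustrates the idea behind
# -replace_unk; it is not the actual OpenNMT-py implementation.
def replace_unk(pred_tokens, src_tokens, attn):
    """attn[i][j] = attention weight on src_tokens[j] when emitting pred_tokens[i]."""
    out = []
    for i, tok in enumerate(pred_tokens):
        if tok == "<unk>":
            # pick the source position the decoder attended to most
            best = max(range(len(src_tokens)), key=lambda j: attn[i][j])
            out.append(src_tokens[best])
        else:
            out.append(tok)
    return out

print(replace_unk(["Nombre", ":", "<unk>"],
                  ["Nombre", ":", "Bestard"],
                  [[0.8, 0.1, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]]))
# → ['Nombre', ':', 'Bestard']
```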
Sorry if this is off topic, but besides “try it yourself”, does anyone have a reference explaining how these models should be tuned? The FAQ comment that “The transformer model is very sensitive to hyperparameters” is a little bit scary. Some Googled references are not very promising, as it looks like brute force and thousands of GPU hours are the only options to improve a model (e.g. “Massive Exploration of Neural Machine Translation Architectures”).
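For what it’s worth, the FAQ page I linked above does give one concrete recipe that is said to replicate the base Transformer setup. Something along these lines (treat the exact flags as a sketch; they may differ between OpenNMT-py versions, so check the FAQ for your version):

```shell
# Transformer training recipe roughly as given in the OpenNMT-py FAQ.
# Flag names and defaults may vary by version; -data/-save_model paths
# here are placeholders.
python train.py -data data/demo -save_model demo-transformer \
    -encoder_type transformer -decoder_type transformer -position_encoding \
    -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
    -train_steps 200000 -dropout 0.1 -label_smoothing 0.1 \
    -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 \
    -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 \
    -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot
```

The Noam learning-rate schedule with warmup, label smoothing, and token-based batching seem to be the parts the model is most sensitive to, which may be what the “very sensitive to hyperparameters” warning refers to.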
Maybe I am naive, but even looking from a distance, it seems strange that there are still no clear rules, at least for very common language pairs (for instance, how to detokenize, or how to size the parameters).
Have a nice day