Some strange translation errors: is this a bug?

I have found that NMT sometimes produces some strange translation errors. The frequency is very low, but why does this happen?

Example 1:
src:
is angular seed the de facto empty project to start with ?
result:
& Is angular& Is angular& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is& Is Is

Example 2:
src:
what does save file mean in npm install grunt save dev
result:
what save install & what what 在 install 安装 & what save save & what dev 中 的 含义
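
For reference, outputs like these can be spotted automatically, because the translation has collapsed into a short loop that repeats a small n-gram many times. Below is a minimal, hypothetical Python sketch (not an OpenNMT feature) that flags such outputs by counting repeated bigrams; the threshold is an assumption you would tune for your own data.

```python
# Hypothetical post-processing check (not part of OpenNMT): flag a translation
# that has collapsed into a repetition loop, like the examples above, by
# counting how often any bigram repeats.
from collections import Counter

def looks_degenerate(translation: str, max_bigram_repeats: int = 4) -> bool:
    """Return True if any bigram occurs more than max_bigram_repeats times."""
    tokens = translation.split()
    bigrams = Counter(zip(tokens, tokens[1:]))
    return any(count > max_bigram_repeats for count in bigrams.values())

# The degenerate output above keeps looping on "& Is Is", so it gets flagged;
# a normal sentence does not.
sample = "& Is angular" + "& Is Is" * 20
print(looks_degenerate(sample))                                        # True
print(looks_degenerate("is angular seed the de facto empty project"))  # False
```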

I had this happen recently as well, translating with v0.5 using a model trained with v0.4.

I think this is a typical issue for NMT systems.
I have also seen this behaviour with other toolkits, not only with OpenNMT.

If I am not mistaken, this word repetition has more to do with the model itself and its training parameters than with the toolkit you used to build it.

Using OpenNMT, training with the bi-encoder option for around 20-25 epochs controlled this phenomenon in my experiments.

Regarding version compatibility, I can tell you that so far I have had no problems using models from other versions of OpenNMT (from v0.3 to v0.5) with the current one.

good luck! :slight_smile:


20-25 epochs is a very long training… may I ask what kind of learning rate / decay strategy you use for such a training?

My systems mostly follow the default configuration.
I used the default setting for the learning rate: it starts at 1 and begins decaying at epoch 9, with SGD as the optimization method.
However, you are right: the best perplexities are achieved between epochs 10 and 15, so 25 epochs does sound like a long training. I was just starting to play with the toolkit when I built those models :wink:
For the record, I was using training sets of ~500,000 lines, so training didn’t take too long for me (~20 hours using 2 GPUs for a bi-encoder model with an 800-unit hidden layer).
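
To make the schedule concrete, here is a small Python sketch of the decay described above: the learning rate stays at 1 until epoch 9 and is then multiplied by a fixed decay factor each epoch. The decay factor of 0.7 is my assumption; check the default of the OpenNMT version you are using.

```python
# Sketch of the SGD learning-rate schedule described above.
# Assumptions: initial rate 1.0, decay starting at epoch 9, and a decay
# factor of 0.7 per epoch (the factor is assumed, not taken from the post).

def learning_rate(epoch: int, initial_lr: float = 1.0,
                  start_decay_at: int = 9, decay: float = 0.7) -> float:
    """Learning rate used during the given (1-indexed) epoch."""
    if epoch < start_decay_at:
        return initial_lr
    # One multiplicative decay per epoch from start_decay_at onwards.
    return initial_lr * decay ** (epoch - start_decay_at + 1)

for epoch in (1, 9, 15, 25):
    print(f"epoch {epoch:2d}: lr = {learning_rate(epoch):.4f}")
```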

Yes, based on my previous experience I would expect this from the first 1-4 epochs. Output like this from a large enough data set trained for 10+ epochs is worth mentioning, I think.