I reduced the NMTMedium embedding size to 128, and my tokens are individual characters (so fewer than 40 of them), but the model still doesn't learn to simply output the vanilla input, even after a few hours of training. I thought this would be fairly easy to learn (the input is at most 5 tokens long), but it isn't, so my understanding must be incorrect. What am I missing?
What is your training configuration?
A ready-made NMTMedium with a 128-dimensional embedding.
Maybe there is not enough data to train an NMTMedium model. Or the learning rate is decaying too fast.
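As a sanity check that the copy task itself is easy to learn, here is a hypothetical minimal sketch (plain PyTorch, not the OpenNMT NMTMedium setup): a tiny GRU encoder-decoder trained on the identity mapping over a 40-symbol character vocabulary with 5-token inputs. All names and sizes here are illustrative, not taken from your configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, SEQ_LEN, HIDDEN = 40, 5, 64  # character-sized vocab, short inputs

class CopySeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, src):
        # Encode the source, then decode with teacher forcing on the
        # shifted source, since the target equals the input (copy task).
        _, h = self.encoder(self.emb(src))
        bos = torch.zeros_like(src[:, :1])           # token 0 serves as <bos>
        dec_in = torch.cat([bos, src[:, :-1]], dim=1)
        dec_out, _ = self.decoder(self.emb(dec_in), h)
        return self.out(dec_out)

model = CopySeq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # constant LR, no decay
loss_fn = nn.CrossEntropyLoss()

for step in range(600):
    batch = torch.randint(1, VOCAB, (64, SEQ_LEN))   # random 5-char "sentences"
    logits = model(batch)
    loss = loss_fn(logits.reshape(-1, VOCAB), batch.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    test = torch.randint(1, VOCAB, (256, SEQ_LEN))
    acc = (model(test).argmax(-1) == test).float().mean().item()
print(f"copy accuracy after 600 steps: {acc:.2%}")
```

A model this small learns the copy mapping in seconds with a constant learning rate, which is why a schedule that decays too quickly (or too little training data for a model as large as NMTMedium) is a plausible culprit in your setup.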