I’ve been playing around with both opennmt-py and opennmt-tf. I have been getting really good results in opennmt-py using SGD with the default LSTM model on extremely small datasets… below 200k sentences. But with the Transformer in opennmt-tf the results are extremely bad.
Is the Transformer expected to perform this poorly on extremely small datasets? Or is there a parameter I could fine-tune to help? I’m currently using the defaults.
I get an average prediction score (log-likelihood, so closer to zero is better) of -6 with SGD versus -20 with the Transformer.
First, a 100-200k sentence dataset is not “extremely” small. It is just small, and there are approaches to make it work. You might want to check the following resources:
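In the meantime, the usual first step on this much data is to make the Transformer smaller and more heavily regularized than the base configuration. Here is a minimal sketch, assuming OpenNMT-tf 2.x and its custom model definition file; all sizes and dropout values below are illustrative starting points, not documented recommendations:

```python
# small_transformer.py - a reduced Transformer for small datasets.
# Sketch assuming OpenNMT-tf 2.x; sizes are illustrative, not tuned.
import opennmt

def model():
    return opennmt.models.Transformer(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=256),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=256),
        num_layers=3,        # base Transformer uses 6
        num_units=256,       # base uses 512
        num_heads=4,         # base uses 8
        ffn_inner_dim=1024,  # base uses 2048
        dropout=0.3,         # stronger than the 0.1 default, to fight overfitting
        attention_dropout=0.3,
        ffn_dropout=0.3,
    )
```

Training can then point at this file with something like `onmt-main --model small_transformer.py --config data.yml --auto_config train`, where `data.yml` is a placeholder for your data configuration.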
I will have a look into that, but the reason I said “extremely small” is that I have some languages with absolutely nothing available on the internet, for which I only have about 15k to 50k sentences (before data augmentation). So using an existing model is not an option. Yet I was able to train decent LSTM models that gave me fair-to-good translations about 30% of the time.
I might not be able to reach that level of accuracy with the Transformer for these languages. I will go through the documentation you posted and see what I can come up with!