I’ve been playing around with both OpenNMT-py and OpenNMT-tf. I have been getting really good results with SGD in OpenNMT-py on an extremely small dataset (below 200k). But with the Transformer in OpenNMT-tf the results are extremely bad.
Is the Transformer expected to perform this badly on an extremely small dataset? Or is there any parameter I could tune to help? I’m currently using the default ones.
I get an average prediction score of -6 with SGD and an average of -20 with the Transformer.