Hi… I was wondering if any of the researchers here has experience training opennmt-tf Transformer models on very small parallel corpora (~30k sentence pairs). Until now I have only worked with seq2seq models, and I have been getting moderate performance on the same dataset.
I tried training an opennmt-tf Transformer model with the default settings and the Adam optimizer. The model doesn't converge: the loss stays more or less flat even after 5,000 training steps, and the generated output clearly shows that the model hasn't learned anything in my case.
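For reference, my setup follows the opennmt-tf quickstart; it looks roughly like this (the file and directory names below are placeholders, not my real paths):

```yaml
# data.yml -- minimal opennmt-tf run configuration (placeholder file names).
# Launched with the quickstart command:
#   onmt-main --model_type Transformer --config data.yml --auto_config train --with_eval
model_dir: run/

data:
  train_features_file: src-train.txt   # ~30k tokenized source sentences
  train_labels_file: tgt-train.txt     # ~30k tokenized target sentences
  eval_features_file: src-val.txt
  eval_labels_file: tgt-val.txt
  source_vocabulary: src-vocab.txt
  target_vocabulary: tgt-vocab.txt
```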
Could the developers or researchers here share any tips or tweaks for using the model properly in low-resource cases? I would really appreciate the help. Thanks