Transformer with an extra-small dataset


I’ve been playing around with both OpenNMT-py and OpenNMT-tf. I have been getting really good results with SGD in OpenNMT-py on an extremely small dataset (below 200k sentences), but with the Transformer in OpenNMT-tf the results are extremely bad.

Is the Transformer expected to perform poorly on an extremely small dataset? Or are there parameters I could tune to help? I’m currently using the default configuration.

I get an average prediction score of -6 with SGD and an average of -20 with the Transformer.

Thank you!

SGD is an optimizer. Do you mean you have good results with a small LSTM compared to a Transformer model?

If yes, then the result is expected. The Transformer with its default configuration requires large datasets.
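To put “requires large datasets” in perspective, here is a rough back-of-the-envelope parameter count. It assumes the standard Transformer base sizes (6 layers, 512 units, 2048 feed-forward units), a hypothetical 8k-token vocabulary on each side, and a hypothetical smaller 3-layer, 256-unit variant. It ignores implementation details such as embedding sharing, so treat the numbers as estimates rather than what OpenNMT-tf will actually report.

```python
def transformer_params(num_layers: int, d: int, ffn: int, vocab: int) -> int:
    """Rough parameter count of an encoder-decoder Transformer (biases included)."""
    attn = 4 * (d * d + d)                    # query, key, value and output projections
    ffn_block = d * ffn + ffn + ffn * d + d   # two dense layers of the feed-forward block
    norm = 2 * d                              # layer norm scale and offset
    enc_layer = attn + ffn_block + 2 * norm
    dec_layer = 2 * attn + ffn_block + 3 * norm  # self-attention + encoder-decoder attention
    embeddings = 2 * vocab * d                # separate source and target embeddings
    output = d * vocab + vocab                # softmax projection over the target vocabulary
    final_norms = 2 * norm                    # one after each stack (pre-norm layout)
    return num_layers * (enc_layer + dec_layer) + embeddings + output + final_norms


VOCAB = 8000  # hypothetical vocabulary size per side

base = transformer_params(num_layers=6, d=512, ffn=2048, vocab=VOCAB)
small = transformer_params(num_layers=3, d=256, ffn=1024, vocab=VOCAB)  # hypothetical variant

print(f"base:  {base:,}")   # base:  56,436,544
print(f"small: {small:,}")  # small: 11,682,624
```

With 15k to 50k sentence pairs, the base size works out to over a thousand parameters per training sentence, and even the smaller variant has hundreds, which is one reason people usually shrink the model and raise dropout for low-resource pairs.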

Sorry, yes, you’re right. I meant LSTM. I will probably have to change the optimizer for a small dataset…

If anyone has suggestions for starting-point parameters for a small dataset, let me know. It would surely help 👌
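As one possible starting point (a sketch, not tuned values), here is an OpenNMT-tf 2.x run configuration written as a Python dict with the same structure as the usual YAML file. All file names and numbers are hypothetical; the ideas are a smaller token batch, a warmup of 4000 steps (the value from the original Transformer paper), label smoothing, and checkpoint averaging. The `noam_lr` helper reproduces the textbook Noam schedule to show how `learning_rate`, `model_dim` and `warmup_steps` interact; OpenNMT-tf's `NoamDecay` may differ in details such as the step offset.

```python
NUM_UNITS = 256  # hypothetical; must equal the num_units of the model you train

config = {
    "model_dir": "run_small/",  # hypothetical paths throughout
    "data": {
        "train_features_file": "src-train.txt",
        "train_labels_file": "tgt-train.txt",
        "eval_features_file": "src-val.txt",
        "eval_labels_file": "tgt-val.txt",
        "source_vocabulary": "src-vocab.txt",
        "target_vocabulary": "tgt-vocab.txt",
    },
    "params": {
        "optimizer": "Adam",
        "optimizer_params": {"beta_1": 0.9, "beta_2": 0.998},
        "learning_rate": 2.0,  # used as the scale factor of the Noam schedule
        "decay_type": "NoamDecay",
        "decay_params": {"model_dim": NUM_UNITS, "warmup_steps": 4000},
        "label_smoothing": 0.1,
        "beam_width": 4,
    },
    "train": {
        "batch_size": 2048,             # tokens per batch (illustrative)
        "batch_type": "tokens",
        "effective_batch_size": 8192,   # reached through gradient accumulation
        "save_checkpoints_steps": 1000,
        "average_last_checkpoints": 5,
    },
}


def noam_lr(step: int, scale: float, model_dim: int, warmup_steps: int) -> float:
    """Textbook Noam schedule: linear warmup, then inverse square-root decay."""
    step = max(step, 1)  # avoid dividing by zero at step 0
    return scale * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


params = config["params"]
# The schedule peaks exactly at the end of warmup.
peak = noam_lr(params["decay_params"]["warmup_steps"], params["learning_rate"],
               **params["decay_params"])
print(f"peak learning rate: {peak:.6f}")  # peak learning rate: 0.001976
```

The dict can be passed as `opennmt.Runner(model, config, auto_config=True)` followed by `runner.train()`, or dumped to YAML for `onmt-main`; values set explicitly in the config take precedence over the automatic ones.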

If anyone faces the same issue, here is some documentation I found about it:

Dear Samuel,

First, by definition, a 100-200k dataset is not “extremely” small. It is just small, and there are approaches to make it work. You might want to check the following resources:

Kind regards,


Thank you, Yasmin.

I will look into that, but the reason I said “extremely small” is that some of my languages have absolutely nothing available on the internet, and I have only about 15k to 50k sentences (before data augmentation). So using an existing model is not an option. Still, I was able to train a decent LSTM model that gives me fair-to-good translations 30% of the time.

I might not be able to reach that level of accuracy with the Transformer for these languages. I will go through the documentation you posted and see what I can come up with!

If anyone happens to come across this post, here is some additional documentation I found that helped me out:


I noticed that there is a TransformerTiny model in the OpenNMT-tf catalogue of Transformer models. Is there something missing to use it?


This model is not included in a released version yet.
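Until then, one workaround is to define a smaller Transformer yourself. As far as I understand the custom-model docs, OpenNMT-tf 2.x can load a model from a Python file passed with `--model`, as long as that file defines a `model` callable returning the model. The sketch below is a hypothetical stand-in, not the actual TransformerTiny definition, and the sizes are illustrative guesses. The `opennmt` import sits inside `model()` only so the hyperparameter dict can be inspected (or tested) without TensorFlow installed.

```python
"""small_transformer.py: hypothetical reduced Transformer for OpenNMT-tf 2.x.

Not the unreleased TransformerTiny; the sizes below are illustrative guesses.
"""

# Hyperparameters kept in a plain dict so they can be read without TensorFlow.
SMALL_TRANSFORMER = {
    "num_layers": 3,
    "num_units": 256,
    "num_heads": 4,            # num_units must be divisible by num_heads
    "ffn_inner_dim": 1024,
    "dropout": 0.3,            # higher than the usual 0.1 to limit overfitting
    "attention_dropout": 0.1,
    "ffn_dropout": 0.1,
}


def model():
    """Callable that OpenNMT-tf invokes to build the model from this file."""
    import opennmt  # imported lazily so the dict above is usable without OpenNMT-tf

    units = SMALL_TRANSFORMER["num_units"]
    return opennmt.models.Transformer(
        source_inputter=opennmt.inputters.WordEmbedder(embedding_size=units),
        target_inputter=opennmt.inputters.WordEmbedder(embedding_size=units),
        **SMALL_TRANSFORMER,
    )
```

Something like `onmt-main --model small_transformer.py --config data.yml --auto_config train --with_eval` should then train it; if you rely on `--auto_config`, double-check that `decay_params.model_dim` ends up matching `num_units`.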