Transformer with a really extra-small dataset

SamuelLacombe · July 9, 2021, 3:21pm

Hello,

I’ve been playing around with both OpenNMT-py and OpenNMT-tf. I have been getting really good results with SGD in OpenNMT-py on an extremely small dataset (below 200k), but with the Transformer in OpenNMT-tf the results are extremely bad.

Is the Transformer expected to behave pretty badly on extremely small datasets? Or is there any parameter I could tune to help? I’m currently using the default ones.

I have an average predict score of -6 with SGD and an average predict score of -20 with the Transformer.

Thank you!

guillaumekln · July 9, 2021, 3:33pm

SGD is an optimizer. Do you mean you have good results with a small LSTM compared to a Transformer model?

If yes, then the result is expected. The Transformer with its default configuration requires large datasets.
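To make the size gap concrete, here is a rough sketch that estimates how many weights a Transformer has for a given configuration. It assumes the default corresponds to the base configuration from the original Transformer paper (6 layers, 512 units, 2048 feed-forward units); the vocabulary sizes and the smaller configuration are hypothetical values chosen for illustration, not recommended settings. Biases and layer normalization weights are ignored, so treat the numbers as approximations.

```python
def estimate_transformer_params(num_layers, num_units, ffn_inner_dim,
                                src_vocab, tgt_vocab):
    """Approximate weight count of an encoder-decoder Transformer.

    Ignores biases and layer normalization, and assumes separate
    (non-shared) source, target, and output embeddings.
    """
    # Query, key, value, and output projections of one attention block.
    attention = 4 * num_units * num_units
    # Two linear layers of the position-wise feed-forward block.
    ffn = 2 * num_units * ffn_inner_dim

    # Encoder layer: self-attention + feed-forward.
    encoder = num_layers * (attention + ffn)
    # Decoder layer: self-attention + cross-attention + feed-forward.
    decoder = num_layers * (2 * attention + ffn)

    embeddings = (src_vocab + tgt_vocab) * num_units
    output_projection = num_units * tgt_vocab
    return encoder + decoder + embeddings + output_projection


# Assumed base configuration with a hypothetical 32k vocabulary per side.
base = estimate_transformer_params(6, 512, 2048, 32000, 32000)
# Hypothetical reduced configuration with a hypothetical 8k vocabulary.
small = estimate_transformer_params(3, 256, 1024, 8000, 8000)

print(f"base:  {base:,} weights")
print(f"small: {small:,} weights")
print(f"ratio: {base / small:.1f}x")
```

Under these assumptions the base model has several times more weights than the reduced one, which gives a rough sense of why the default setup tends to need much more training data than a small LSTM.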

SamuelLacombe · July 9, 2021, 5:50pm

Sorry, yes, you’re right. I meant LSTM. I will probably have to change the optimizer for a small dataset…

If anyone has a suggestion for starting parameters for a small dataset, let me know. It will surely help out 👌

SamuelLacombe · July 10, 2021, 5:04pm

If anyone faces the same issue… Here is some documentation I found about it.

https://datascience.stackexchange.com/questions/80483/based-on-transformer-how-to-improve-the-text-generation-results

ymoslem · July 10, 2021, 9:14pm

Dear Samuel,

First, by definition, a 100-200k dataset is not “extremely” small. It is just small, and there are approaches to make it work. You might want to check the following resources:

Kind regards,
Yasmin

SamuelLacombe · July 11, 2021, 11:55pm

Thank you, Yasmin.

I will have a look into that, but the reason I said “extremely small” is that I have some languages which have absolutely nothing on the internet, and I have only about 15k to 50k sentences (before data augmentation). So using an existing model is not an option. Yet, I was able to generate a decent model that would give me fair to good translations 30% of the time with LSTM.

I might not be able to accomplish that level of accuracy with Transformer for these languages. I will go through the documentation you posted and see what I can come up with!

SamuelLacombe · July 19, 2021, 2:34am

If anyone happens to come upon this post… here is some additional documentation I found that helped me out:

https://aclanthology.org/2020.coling-main.304.pdf

SamuelLacombe · July 20, 2021, 12:05pm

I noticed that there is a TransformerTiny in the catalogue of Transformer models… Is something else needed to use it?

guillaumekln · July 20, 2021, 12:23pm

This model is not included in a released version yet.