Is there any way to fine-tune pretrained language models in OpenNMT?

lockder · December 13, 2019, 9:50am

Right now the state of the art for NLP comes from fine-tuning huge pretrained language models like
BERT, RoBERTa, XLNet, and ERNIE 2.0.
Is there any way to fine-tune a custom model that loads these pretrained models?

Bachstelze · December 14, 2019, 8:07am

Hey lockder!
What do you mean by fine-tuning a language model? OpenNMT is a toolkit for seq2seq models, so fine-tuning isn’t as straightforward as for other NLP tasks, and the discussion about BERT has gone quiet. You could use the contextual embeddings as an additional input feature, or use the language model for ranking and data augmentation.
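As a rough illustration of the ranking idea, here is a minimal sketch that re-scores an n-best list from an NMT system with an external language model score. Everything in it is an assumption for illustration: `rerank_nbest`, `toy_lm_score`, and the `lm_weight` interpolation are hypothetical names and choices, not part of OpenNMT, and the toy scorer only stands in for a real pretrained language model.

```python
from typing import Callable, List, Tuple


def rerank_nbest(
    nbest: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.3,
) -> List[Tuple[str, float]]:
    """Re-score (hypothesis, nmt_log_prob) pairs with an external LM score.

    The combined score is a linear interpolation of the NMT score and the
    LM score; the list is returned sorted from best to worst.
    """
    rescored = [
        (hyp, (1.0 - lm_weight) * nmt_lp + lm_weight * lm_score(hyp))
        for hyp, nmt_lp in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm_score(sentence: str) -> float:
    """Hypothetical stand-in for a pretrained LM: penalise repeated words.

    A real setup would return a log-probability from the language model.
    """
    tokens = sentence.split()
    return -float(len(tokens) - len(set(tokens)))


# Two hypotheses with made-up NMT log-probabilities.
nbest = [("the the cat sat", -1.0), ("the cat sat", -1.2)]
best_hypothesis, best_score = rerank_nbest(nbest, toy_lm_score)[0]
```

With these made-up scores the NMT system alone prefers the hypothesis with the repeated word, while the interpolated score prefers the one without it. How much weight to give the language model would have to be tuned on held-out data.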

There is also MASS pretraining for seq2seq models. We could try to fine-tune those pretrained models with OpenNMT-py.

lockder · December 16, 2019, 11:23am

Thanks for the links.
What I mean is that, in the papers you can read right now, BERT (or any other pretrained contextual model) can also be used for text classification or NER, so you have to retrain the model and fine-tune it for your current objective. That means loading the model and then training it again on your objective.
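To make "load the model, then train it again" concrete, here is a minimal PyTorch sketch of that fine-tuning pattern for text classification. It is only a sketch under assumptions: `TinyEncoder` is a hypothetical stand-in for a pretrained encoder such as BERT, the data is random, and none of the class names come from OpenNMT.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained contextual encoder."""

    def __init__(self, vocab_size: int = 100, hidden: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.layer = nn.Linear(hidden, hidden)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool the token vectors into one vector per sentence.
        return torch.tanh(self.layer(self.embed(token_ids))).mean(dim=1)


class Classifier(nn.Module):
    """Encoder plus a freshly initialised classification head."""

    def __init__(self, encoder: nn.Module, hidden: int = 16, num_labels: int = 2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(token_ids))


torch.manual_seed(0)
encoder = TinyEncoder()  # in practice, pretrained weights would be loaded here
model = Classifier(encoder)

# Fine-tuning updates the encoder and the new head together.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Random toy batch: 8 sentences of 5 token ids, with binary labels.
token_ids = torch.randint(0, 100, (8, 5))
labels = torch.randint(0, 2, (8,))

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(token_ids), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The only task-specific part is the head; the encoder is reused as-is and simply keeps training on the new objective. For NER the head would produce one prediction per token instead of one per sentence.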

lockder · December 16, 2019, 11:27am

Right now Google has published a better pretrained contextual language model.

Bachstelze · December 16, 2019, 12:17pm

I think that ALBERT is only trained for English. PyTorch released the pretrained XLM-R last week, which was initially tested with (unsupervised) NMT. But I don’t know if the model definition is usable in OpenNMT.

Bachstelze · December 18, 2019, 1:11pm

There is a general paper on pretraining for seq2seq tasks. In most NMT cases, using the pretrained encoder gives good results. To initialize the decoder, we could try copied monolingual data.

In which language pairs and domains are you interested?

lockder · December 18, 2019, 2:17pm

Well, right now I’m working with English and Spanish, but I’m focused on text classification and named entity recognition. Maybe by replicating the model we could load the pretrained weights directly? I’m not sure that is a solution.
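A minimal PyTorch sketch of that "replicate the model, then load the weights" idea, under stated assumptions: `make_encoder` and `Tagger` are hypothetical toy modules, and the in-memory `pretrained_state` stands in for a checkpoint that would normally be read from disk. It only works when the replicated layers have the same parameter names and shapes as the pretrained ones.

```python
import torch
import torch.nn as nn


def make_encoder() -> nn.Sequential:
    """Hypothetical encoder architecture shared by both models."""
    return nn.Sequential(nn.Embedding(50, 8), nn.Linear(8, 8))


# Stand-in for a pretrained checkpoint (normally loaded from a file).
pretrained = make_encoder()
pretrained_state = pretrained.state_dict()


class Tagger(nn.Module):
    """Replicated encoder plus a new per-token tagging head (e.g. for NER)."""

    def __init__(self, num_tags: int = 5):
        super().__init__()
        self.encoder = make_encoder()
        self.tag_head = nn.Linear(8, num_tags)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.tag_head(self.encoder(token_ids))


model = Tagger()

# Rename the checkpoint keys so they match the submodule name in Tagger.
prefixed_state = {f"encoder.{name}": tensor for name, tensor in pretrained_state.items()}

# strict=False tolerates parameters absent from the checkpoint (the new head)
# and reports them instead of raising an error.
result = model.load_state_dict(prefixed_state, strict=False)
missing = sorted(result.missing_keys)
unexpected = list(result.unexpected_keys)

# One tag distribution per token: shape (batch, sequence length, num_tags).
logits = model(torch.randint(0, 50, (2, 3)))
```

Checking `missing` and `unexpected` after loading is a cheap way to confirm that only the new head was left randomly initialised and that no pretrained tensor was silently dropped because of a name mismatch.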