Hi, I have built a translation model on a largish set of WMT data and it was OK.
I now want to recreate an experiment I did on 42k sentences of Gale data, but that training set is too small to get good results. (My SMT system got 19 BLEU, while the NMT model is getting around 13.)
Is it possible to take the WMT model and train it further on the Gale data, to try to get better results on the Gale data?
I did try the train_from option, but the final model was much worse, so I have a feeling my method wasn't right.
Your approach of specializing the WMT model to the Gale data with the 'train_from' option is the usual one.
However, there are some things you have to be careful with:
Do you preprocess the Gale data with the WMT dictionaries? Do you know how much the vocabularies of the two data sets overlap?
Perhaps the Gale data is too far from the WMT data, and the specialization process cannot improve the translations.
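To get a rough idea of the overlap, you can compare the vocabularies of the two tokenized training corpora directly. Here is a minimal Python sketch; the file names are placeholders, and it assumes whitespace-tokenized text with one sentence per line:

```python
from collections import Counter

def vocab(path, min_count=1):
    """Collect the set of token types seen in a whitespace-tokenized corpus."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())
    return {w for w, c in counts.items() if c >= min_count}

# Placeholder names: your tokenized source-side training files.
wmt_vocab = vocab("wmt.train.tok.src")
gale_vocab = vocab("gale.train.tok.src")

overlap = gale_vocab & wmt_vocab
print(f"Gale types: {len(gale_vocab)}")
print(f"Gale types also in WMT: {len(overlap)} "
      f"({100 * len(overlap) / len(gale_vocab):.1f}%)")
```

If a large share of the Gale tokens falls outside the WMT vocabulary, the fine-tuned model will only see them as unknown words, and the specialization has little to work with.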
How many epochs do you train on the Gale data?
Maybe the problem is that your model overfits quickly, since the Gale data set is not that big.
In this case, you can lower the learning rate (if you are using SGD as the optimizer, for instance), or save a checkpoint every fewer iterations so that you can keep a model from before the overfitting happens (for example in the middle of an epoch: -save_every 325, since your 42k sentences in batches of 64 come to ~657 iterations per epoch).
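Just to make the arithmetic behind those numbers explicit, here is a small sketch; adjust the sentence count and batch size to your actual setup:

```python
import math

num_sentences = 42_000   # size of the Gale training set
batch_size = 64          # batch size assumed above

iters_per_epoch = math.ceil(num_sentences / batch_size)
mid_epoch = iters_per_epoch // 2

print(f"iterations per epoch: {iters_per_epoch}")       # ~657
print(f"mid-epoch checkpoint interval: ~{mid_epoch}")    # ~328, so -save_every 325 gives roughly two checkpoints per epoch
```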