Problem with incremental/in-domain training

Hi @cservan,

The difference (and @guillaumekln or @srush, please correct me if I’m wrong) is that if you use -train_from alone, you are starting a “new” training from the weights, embeddings, and vocab of the model (effectively resetting start_decay_at and learning_rate_decay). If you also use -continue, you are continuing that training, even if you’re doing it with new data (and picking up the decay settings from your first run).
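To make the difference concrete, here is a minimal sketch of the two launch styles (the paths and file names are hypothetical, and the flags should be double-checked against your OpenNMT version):

#!/bin/ksh
# Fresh run initialised from an existing checkpoint: weights, embeddings and
# vocab are reused, but the learning-rate schedule starts over.
th train.lua -data data/demo-train.t7 -save_model demo_specialised \
  -train_from models/model_epoch13.t7 -gpuid 1

# Resume the original run: -continue also restores the training state,
# so learning_rate and start_decay_at carry on from where the first run stopped.
th train.lua -data data/demo-train.t7 -save_model demo_specialised \
  -train_from models/model_epoch13.t7 -continue -gpuid 1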


Thank you both @cservan and @dbl for your explanation!

It is clearer to me now how -train_from works :blush:

You’re welcome @emartinezVic :wink:

@cservan
In this paper: https://arxiv.org/pdf/1612.06141.pdf
what was the learning rate used for the additional epochs (for the in-domain data)?
Thanks

Hello Vincent,
The learning rate is fixed to 1; then a decay of 0.7 is applied.

FYI: http://www.statmt.org/wmt17/pdf/WMT13.pdf
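For reference, a minimal sketch of how those settings would be passed to train.lua when launching the additional in-domain epochs (the paths are hypothetical; please double-check the flags against your OpenNMT version):

#!/bin/ksh
# Additional in-domain epochs: learning rate reset to 1, decayed by 0.7.
th train.lua -data data/indomain-train.t7 -save_model indomain_model \
  -train_from models/generic_epoch13.t7 \
  -learning_rate 1 -learning_rate_decay 0.7 \
  -gpuid 1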


I thought I would report, in case it’s useful to others, that I’ve been specialising some models over the last few days. I have been launching with -train_from from Epoch 13 of my general model, using -update_vocab. It has generally taken a further 13 epochs to get my model to change its beliefs and translate with the “new terminology” contained in my additional training material. An example would be “aanbestedingsdocument”, which was previously translated (not wholly incorrectly) as “tender document” and with the “retrained” model is now translated with the preferred translation of “procurement document”. The successful retraining session, which took 52,000 “new” sentences, lasted just under 2 hours on a machine with a GTX 1080Ti GPU.


Hi Terence,
This sounds great. Out of curiosity, what learning rate did you use during the additional 13 epochs?
Cheers.

Hi Vincent,
Been away from my machines! Just got back, looked at the training log and see:
epoch 14: LR = 0.08751, decaying to
epoch 26: LR = 0.00050
Retraining had started from Epoch 1 with LR = 1.0, submitting only the new material, and a decay of 0.65 had started at Epoch 4. I used -train_from without -continue but was surprised to see training start from Epoch 1. However, I am pleased with the result, as I now have an effective way of getting post-edited translations from memoQ into the model in a reasonably short time.
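For anyone puzzled by how the LR gets from 1.0 down to those values, here is a rough sketch of the schedule if a 0.65 decay were applied once per epoch from the start-decay epoch onward (an assumption on my part; OpenNMT can also tie the decay to validation perplexity, so the logged numbers above won’t match this exactly):

awk 'BEGIN {
  lr = 1.0
  for (e = 1; e <= 26; e++) {
    if (e >= 4) lr = lr * 0.65   # decay kicks in at the start-decay epoch
    printf "epoch %2d  LR = %.5f\n", e, lr
  }
}'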
Cheers,
Terence

Hello Terence, and thank you for your interesting report on the specialization issue.
I would like to ask:
(1) What option did you use for -update_vocab, i.e. merge or replace?
(2) Are there any single-word sentences in your “new” sentences?
(3) After the retraining with the new data, did you still use a phrase_table (which you mentioned in a previous post) when you tested translations?
Thanks in advance!

Hello Anna,

  1. I used the “merge” option, but I suppose that if you want to train a highly specialised model, “replace” would be the logical option. If you are thinking of specialising with a few thousand sentences, this should be a very quick process, so you can see what gives you your desired outcome.
  2. There have been no single-word sentences; I have tended to specialise with “project tmx” files from real jobs, and these have generally been from technical documents.
  3. I am still using the back-off phrase_table. It is only consulted for Unknown Tokens after the inference process, so if the specialisation has worked well it will not come into play. Logically I could omit it :slight_smile:
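To show what I mean, translation with the back-off table looks roughly like this (a minimal sketch with hypothetical file names; as far as I know -phrase_table is used together with -replace_unk):

#!/bin/ksh
th translate.lua -model models/retrained_model_epoch26.t7 \
  -src data/test_source.txt.tok \
  -output data/test_pred.txt \
  -replace_unk -phrase_table data/backoff_phrase_table.txt \
  -gpuid 1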

With all these issues it’s a question of seeing what works best for you.

Terence

Hi Terence,

Thanks for your reply.
Another thing I would like to clarify is this: for re-training an existing engine with new data, when this data is small compared to the existing engine’s (e.g. 1-2K segments compared to 400K), we first run the preprocess.lua command, which produces new (small) *.dict files and the new data package (*.t7). From what I have read, while retraining we have to use the same *.dict files that the existing model was trained with, together with the -update_vocab option.
So what action do we have to take, practically? In the retrain command the *.dict files are not mentioned, but in the log file I notice that it is actually the small ones that are considered. I hope I did not confuse you…
Anna

Hi Anna,
Here is the precise command I used to successfully “retrain” an existing model. The “ned2eng_further-train.t7” package contains the new vocab. The existing vocab is contained in the model and there is no other reference to it. This works for me.
#!/bin/ksh
th train.lua -config ~/generic.txt -data /home/miguel/OpenNMT/data/dutch_data/ned2eng_further-train.t7 \
  -train_from /home/miguel/OpenNMT/models/v7_models/model_epoch13_4.51.t7 \
  -update_vocab merge \
  -log_file ~/training.log \
  -save_model /home/miguel/OpenNMT/models/v7_models/retrained_model_construction \
  -gpuid 1

echo "Done!"
exit 0

Hi Terence,
Thanks so much for your prompt reply!
OK, the train command is clear; this is how I also use it (although without the -config parameter this time, as I use the default settings).
Could you please also share the preprocess.lua command? From your previous comments I guess that you don’t use the -src_vocab and -tgt_vocab parameters to force the use of the vocabs from the training of the existing model, right?
Thanks,
Anna

Hi Terence,
just wanted to add that in my experiments, having preprocessed with the new (small) data and vocab (i.e. the *.t7 contained the new vocab) and trained with a command similar to yours, when I translated a test file I got the same repeated (irrelevant) translation for some segments (approx. 35 out of 1000), which was weird and does not happen with the existing model.
Anna

Anna, here is the precise preprocess.lua command I used. The “missing*” files contain the source text and a target text corrected by a translator with the correct terminology. I ran the retraining for a further 13 epochs. My aim with this is to train specialised models which I would then only use for well-defined domains:
#!/bin/ksh
th preprocess.lua -train_src data/dutch_data/missing_source.txt.tok \
  -train_tgt data/dutch_data/missing_target.txt.tok \
  -save_data data/dutch_data/ned2eng_further
echo "Done!"
exit 0
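In case it helps with your earlier question about the *.dict files: if you wanted to force the existing model’s vocabularies instead of letting preprocess.lua build new ones, it would look roughly like the sketch below. The *.dict paths are hypothetical and I have not tested this variant myself, so treat it as an assumption rather than a recipe.

#!/bin/ksh
th preprocess.lua -train_src data/dutch_data/missing_source.txt.tok \
  -train_tgt data/dutch_data/missing_target.txt.tok \
  -src_vocab data/dutch_data/ned2eng.src.dict \
  -tgt_vocab data/dutch_data/ned2eng.tgt.dict \
  -save_data data/dutch_data/ned2eng_further_fixedvocab
echo "Done!"
exit 0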

Many thanks, Terence.

Hi Terence,

I know it has been a while since you posted this message, but I’d like to ask whether you now know what the problem was. The thing is that I get exactly the same error… Or perhaps you could suggest a workaround :slight_smile:

Many thanks

Hi Anna, I can’t remember what my original problem was. The commands I gave here were exactly those I used to retrain my baseline model. My two GPU machines are busy for the next few days, but when they finish I will do a new retraining and make line-by-line notes of what I am doing. The commands I reproduced here worked for me for a training of a further 13 epochs using only the -train_from and -update_vocab merge options. I know it sounds like a silly question, but are you invoking your “new” model when running translate.lua?

Terence

Hi Terence,
sorry for my late reply, but I was away most of the time.
I would very much appreciate your sharing the retrain workflow with me.

So far, I have not been successful with the results of the retrained engine: although it does translate words that were unknown before (which means it has learned from the vocabulary of the new data), I noticed that unknown single-word segments get the very same (irrelevant) translation, while other segments get totally messed up. The BLEU score of the retrained engine drops compared to the baseline engine or to a new engine trained on the baseline data plus the new data (i.e. the same set used for the re-training).

One reason I can think of for this issue is that the volume of new data used for the re-training is very small compared with the baseline data. I have made an attempt to re-train another engine by mixing this new data with part of the baseline data, but it did not work well (i.e. BLEU was even lower).

Regarding your question, yes, I do invoke the new model when running translate.lua, for example:
-model ~/data-model_epoch13_0.00.t7 (this is the 13th epoch run on top of the existing data-model_epoch13_1.58.t7, but with no validation file). Btw, how else can I run translate on the re-trained engine?
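To be concrete, the full invocation is along these lines (the -src and -output file names below are just placeholders for my test set):

#!/bin/ksh
th translate.lua -model ~/data-model_epoch13_0.00.t7 \
  -src data/test_source.txt.tok \
  -output data/test_pred.txt \
  -gpuid 1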

Cheers,
Anna