Lots of unknowns, can't increase vocabulary size

I’m trying to reduce the number of unknowns in the engine’s suggestions. I tried adding -src_vocab_size 0 or -src_vocab_size 100000, but train.lua still reports a vocabulary size of 50004, ignoring the flag. How come? This is the Torch version.

Also, is there a way to save models more often? Since training occupies the whole system, I need to shut it down once in a while. Currently a checkpoint is saved only every 4 hours or so, which is far from ideal when you want to resume training.

Did you add -src_vocab_size to preprocess.lua?
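The vocabulary size has to be set at the preprocessing stage, since train.lua only consumes the dictionaries that preprocess.lua already built. A typical command might look something like this (file names here are placeholders for your own data):

th preprocess.lua -train_src src-train.txt -train_tgt tgt-train.txt -valid_src src-val.txt -valid_tgt tgt-val.txt -save_data demo -src_vocab_size 100000 -tgt_vocab_size 100000

After re-running preprocessing with the larger vocabulary, point train.lua at the newly generated demo-train.t7 data file.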

For more frequent saves, see the documentation for the -save_every option.
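As a rough sketch (data and model names are placeholders), -save_every is passed to train.lua and takes the number of iterations between intermediate checkpoints:

th train.lua -data demo-train.t7 -save_model demo-model -save_every 50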


Ah, thank you very much for your help!

The -save_every option works. However, when training is resumed, OpenNMT starts from the very beginning of the epoch it was working on, instead of the nth step that was saved with -save_every. Is this inevitable?

Hi @Loek,

See here for resuming training: http://opennmt.net/OpenNMT/training/retraining/#resuming-a-stopped-training

In particular, you need to pass the intermediate checkpoint file instead of the epoch model when resuming:

-save_every 50 -train_from demo_checkpoint.t7 -continue
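In context, the full resume command would look something like this (the data and model names are placeholders, and demo_checkpoint.t7 stands for the checkpoint file written by -save_every):

th train.lua -data demo-train.t7 -save_model demo-model -save_every 50 -train_from demo_checkpoint.t7 -continue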


Thank you, that worked!