Hi, I am currently training seq2seq models using Transformer, RNN, and ConvS2S.
While the Transformer and RNN runs early-stop around step 3000, the ConvS2S run doesn't stop even after step 100000 (100k). Looking at the training log, though, the validation accuracy is still slowly increasing.
So my question is: is early stopping an automatic option when training ConvS2S? If not, should I manually pick the early-stop point based on the validation accuracies? (How can I early-stop ConvS2S training?)
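In case it helps clarify what I mean by picking the stop point manually: since checkpoints are saved every 50 steps here, one could scan the validation accuracies logged at each -valid_steps interval and keep the checkpoint before accuracy stops improving for some patience window. A minimal sketch (the function name, patience value, and accuracy numbers are all made up for illustration, not from any OpenNMT-py API):

```python
def find_early_stop(val_accs, patience=3):
    """Return the index of the best checkpoint, stopping the scan after
    `patience` consecutive evaluations without improvement."""
    best_idx = 0
    best_acc = float("-inf")
    bad_evals = 0
    for i, acc in enumerate(val_accs):
        if acc > best_acc:
            best_acc, best_idx = acc, i
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break  # accuracy plateaued; stop here
    return best_idx

# Hypothetical validation accuracies, one per -valid_steps interval:
accs = [10.0, 20.0, 30.0, 35.0, 34.9, 35.0, 34.8]
print(find_early_stop(accs, patience=3))  # index of the checkpoint to keep
```

The returned index times the -valid_steps value would then give the training step of the checkpoint to keep.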
My command for training is: "CUDA_VISIBLE_DEVICES=0,1 python3 train.py -data conv_jamo_underbar/ -save_model conv_jamo_underbar/ -encoder_type cnn -decoder_type cnn -valid_steps 50 -save_checkpoint_steps 50 -world_size 2 -gpu_ranks 0 1".
I disabled the '-early_stopping' option for this run, because with it enabled training stops after about 50 steps. That problem has been discussed in "RuntimeError: value cannot be converted to type float without overflow".