I’m very interested in some OpenNMT features that tensorflow version has and I try to experiment with this version. I’m previously using OpenNMT-py and I tried to setup model in OpenNMT-tf as close to python version as possible.
I train my Chinese - Vietnamese model with 32k sentence pairs on train set, about 2k sentence pairs on dev and test set.
Sadly, when I’m using the default Rnnsmall autoconfig setting, the model get overfitted really quickly. I tried changing the params and use custom model with no success.
Whatever I tried, the model seems slowly increasing evaluation loss, sometimes immediately on the next evaluation after the first evaluation.
This is the best run I can get so far. I use BiRNN model with BLEU score of 30.87 on step 65k
After step 15k, eval loss slowly increase so I stopped training eventually.
This is the config I use:
model_dir: model data: eval_features_file: cv.dev.cn eval_labels_file: cv.dev.vn source_words_vocabulary: src-vocab.txt target_words_vocabulary: tgt-vocab.txt train_features_file: cv.train.cn train_labels_file: cv.train.vn eval: batch_size: 32 eval_delay: 0 exporters: last infer: batch_size: 32 bucket_width: 0 params: average_loss_in_time: true beam_width: 5 learning_rate: 2.0 # The scale constant. clip_gradients: null decay_step_duration: 8 # 1 decay step is 8 training steps. decay_type: noam_decay_v2 decay_params: model_dim: 512 warmup_steps: 2000 # (= 16000 training steps). start_decay_steps: 0 label_smoothing: 0.1 length_penalty: 0 gradients_accum: 1 optimizer: AdamOptimizer optimizer_params: beta1: 0.9 beta2: 0.998 score: batch_size: 64 train: average_last_checkpoints: 8 batch_size: 4096 batch_type: tokens keep_checkpoint_max: 8 maximum_features_length: 50 maximum_labels_length: 50 sample_buffer_size: -1 save_checkpoints_steps: 5000 save_summary_steps: 100 train_steps: 200000
This is my custom model
def model(): return onmt.models.SequenceToSequence( source_inputter=onmt.inputters.WordEmbedder( vocabulary_file_key="source_words_vocabulary", embedding_size=512, dtype=tf.float16 ), target_inputter=onmt.inputters.WordEmbedder( vocabulary_file_key="target_words_vocabulary", embedding_size=512, dtype=tf.float16 ), encoder=onmt.encoders.BidirectionalRNNEncoder( num_layers=2, num_units=500, reducer=onmt.layers.ConcatReducer(), cell_class=tf.nn.rnn_cell.LSTMCell, dropout=0.2, residual_connections=False ), decoder=onmt.decoders.AttentionalRNNDecoder( num_layers=2, num_units=500, bridge=onmt.layers.CopyBridge(), attention_mechanism_class=tf.contrib.seq2seq.LuongAttention, cell_class=tf.contrib.rnn.LSTMCell, dropout=0.2, residual_connections=False))
Compared to OpenNMT-py, I got BLEU score of 37.83 with the following config
!CUDA_VISIBLE_DEVICES=0 python train.py -data data/test.atok.low -save_model demo_model -gpu_ranks 0 -optim adam -learning_rate 0.001 -encoder_type brnn \ -dropout 0.2 -word_vec_size 512 -train_steps 200000\ -batch_type tokens -normalization tokens \ -label_smoothing 0.1
I hope I can get some help, why the model on tensorflow get overfitted so easily and the translation quality is worse than the py version despite I tried setting them up as close to each other as possible?