Hi, I'm seeing some very strange behavior with a bidirectional RNN network. I trained my translation model on 100k sentence pairs, and after some number of steps the evaluation loss starts increasing.
However, when I test the model on my test dataset at checkpoint step 40k, I get a BLEU of 16.54. Doing the same with the checkpoint at step 100k, I get a BLEU of 18.25.
To my knowledge, an increasing evaluation loss means the model is overfitting and should perform worse when it meets new sentences. Yet in this experiment the model performs even better although the evaluation loss keeps increasing.
Is this behavior normal?
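One thing I keep coming back to (a minimal sketch of my own, not taken from the actual run): cross-entropy loss depends on how confident the model is in the reference token, while BLEU only depends on which tokens the decoder actually outputs, so in principle the two can move in opposite directions.

```python
# Toy illustration (my own sketch, not from the actual run): the loss can
# rise because the model becomes less confident, while the argmax token,
# and therefore the decoded sentence and its BLEU, stays the same.
import numpy as np

def cross_entropy(probs, target_id):
    """Negative log-likelihood of the reference token."""
    return -np.log(probs[target_id])

target_id = 0
early = np.array([0.90, 0.05, 0.05])  # confident checkpoint
late = np.array([0.40, 0.30, 0.30])   # less confident checkpoint

print(cross_entropy(early, target_id))       # ~0.105
print(cross_entropy(late, target_id))        # ~0.916 -> eval loss went up
print(np.argmax(early) == np.argmax(late))   # True -> same prediction, same BLEU
```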
This is the config file for my experiment:
```yaml
model_dir: model

data:
  eval_features_file: cv.dev.cn
  eval_labels_file: cv.dev.vn
  source_words_vocabulary: src-vocab.txt
  target_words_vocabulary: tgt-vocab.txt
  train_features_file: cv.train.cn
  train_labels_file: cv.train.vn

eval:
  batch_size: 32
  eval_delay: 0
  exporters: last

infer:
  batch_size: 32
  bucket_width: 0

params:
  average_loss_in_time: true
  beam_width: 5
  learning_rate: 0.001
  clip_gradients: 5
  label_smoothing: 0.1
  length_penalty: 0
  gradients_accum: 1
  optimizer: AdamOptimizer
  optimizer_params:
    beta1: 0.9
    beta2: 0.999
  decay_type: exponential_decay
  decay_params:
    decay_rate: 0.5
    decay_steps: 10000
    staircase: true
  decay_step_duration: 1
  start_decay_steps: 50000

score:
  batch_size: 64

train:
  average_last_checkpoints: 8
  batch_size: 4096
  batch_type: tokens
  keep_checkpoint_max: 8
  maximum_features_length: 50
  maximum_labels_length: 50
  sample_buffer_size: -1
  save_checkpoints_steps: 5000
  save_summary_steps: 100
  train_steps: 200000
```
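If I read the decay options correctly, the 40k checkpoint was still trained at the full learning rate, while by step 100k it had been halved five times. A quick sanity check of my own (this mirrors the config values under the assumption of a staircased exponential decay that starts counting at `start_decay_steps`; I have not traced the OpenNMT-tf internals line by line):

```python
# Staircase exponential decay, applied only after start_decay_steps
# (my own reading of the config above, not OpenNMT-tf source code).
def learning_rate(step, lr=0.001, decay_rate=0.5, decay_steps=10000, start=50000):
    if step < start:
        return lr
    return lr * decay_rate ** ((step - start) // decay_steps)

print(learning_rate(40000))   # 0.001 (decay has not started yet)
print(learning_rate(100000))  # 0.001 * 0.5**5 = 3.125e-05
```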
And this is my model definition:
```python
import tensorflow as tf
import opennmt as onmt

def model():
    return onmt.models.SequenceToSequence(
        source_inputter=onmt.inputters.WordEmbedder(
            vocabulary_file_key="source_words_vocabulary",
            embedding_size=512),
        target_inputter=onmt.inputters.WordEmbedder(
            vocabulary_file_key="target_words_vocabulary",
            embedding_size=512),
        encoder=onmt.encoders.BidirectionalRNNEncoder(
            num_layers=2,
            num_units=512,
            reducer=onmt.layers.ConcatReducer(),
            cell_class=tf.nn.rnn_cell.LSTMCell,
            dropout=0.2,
            residual_connections=False),
        decoder=onmt.decoders.AttentionalRNNDecoder(
            num_layers=2,
            num_units=512,
            bridge=onmt.layers.CopyBridge(),
            attention_mechanism_class=tf.contrib.seq2seq.LuongAttention,
            cell_class=tf.nn.rnn_cell.LSTMCell,
            dropout=0.2,
            residual_connections=False))
```
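For completeness, this is roughly how I score a checkpoint's output against the references (a sketch with placeholder file names; I use sacrebleu here, which may score slightly differently from other BLEU implementations):

```python
# Sketch only: "pred-100k.vn" and "cv.test.vn" are placeholder names for the
# detokenized predictions of one checkpoint and the reference translations.
import sacrebleu

with open("pred-100k.vn") as pred, open("cv.test.vn") as ref:
    hypotheses = [line.strip() for line in pred]
    references = [[line.strip() for line in ref]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # e.g. 18.25 for the 100k checkpoint
```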