I trained a model with the clipnorm: 5 option, but the loss values look odd: the loss keeps going up and down all the way to the maximum training step instead of decreasing steadily. What do you think could be the cause? Here is the OpenNMT-tf training log:
INFO:tensorflow:Step = 100 ; source words/s = 1509, target words/s = 2218 ; Learning rate = 1.000000 ; Loss = 1262.955200
INFO:tensorflow:Step = 200 ; source words/s = 5957, target words/s = 8712 ; Learning rate = 1.000000 ; Loss = 1025.245972
INFO:tensorflow:Step = 300 ; source words/s = 5976, target words/s = 8747 ; Learning rate = 1.000000 ; Loss = 873.087341
INFO:tensorflow:Step = 400 ; source words/s = 5925, target words/s = 8705 ; Learning rate = 1.000000 ; Loss = 1276.541748
INFO:tensorflow:Step = 500 ; source words/s = 6143, target words/s = 8960 ; Learning rate = 1.000000 ; Loss = 1069.620239
INFO:tensorflow:Step = 600 ; source words/s = 6119, target words/s = 8905 ; Learning rate = 1.000000 ; Loss = 1310.696289
INFO:tensorflow:Step = 700 ; source words/s = 5978, target words/s = 8757 ; Learning rate = 1.000000 ; Loss = 760.614685
INFO:tensorflow:Step = 800 ; source words/s = 6069, target words/s = 8886 ; Learning rate = 1.000000 ; Loss = 1363.078369
INFO:tensorflow:Step = 900 ; source words/s = 6057, target words/s = 8867 ; Learning rate = 1.000000 ; Loss = 1804.104370
INFO:tensorflow:Step = 1000 ; source words/s = 5910, target words/s = 8678 ; Learning rate = 1.000000 ; Loss = 865.372314
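For context on the clipnorm option mentioned above: my understanding, based on the Keras docs rather than OpenNMT-tf's source, is that it rescales each gradient tensor so its L2 norm does not exceed the threshold, roughly like this sketch:

import tensorflow as tf

# Minimal sketch of per-tensor gradient clipping as I assume
# clipnorm: 5 behaves; each gradient keeps its direction but
# its L2 norm is capped at clip_norm.
def clip_gradients(gradients, clip_norm=5.0):
    return [tf.clip_by_norm(g, clip_norm) for g in gradients]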
Parameters I used:
params:
  optimizer: SGD
  learning_rate: 1.0
  clipnorm: 5
  maximum_features_length: 50
  maximum_labels_length: 50
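If I read the documentation correctly, these params should translate to a Keras optimizer roughly like this (my own sketch, not OpenNMT-tf code):

import tensorflow as tf

# learning_rate stays constant at 1.0 because no decay schedule
# is configured; clipnorm is passed straight to the optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1.0, clipnorm=5.0)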
Output from OpenNMT-tf (the part that dumps the run configuration):
INFO:tensorflow:Using parameters:
data:
  eval_features_file: eval.src
  eval_labels_file: eval.trg
  source_vocabulary: src-vocab.txt
  target_vocabulary: trg-vocab.txt
  train_features_file: train.src
  train_labels_file: train.trg
eval:
  batch_size: 32
  external_evaluators: BLEU
  steps: 10000
infer:
  batch_size: 32
  length_bucket_width: null
model_dir: model_lstm
params:
  average_loss_in_time: false
  clipnorm: 5
  learning_rate: 1.0
  maximum_features_length: 50
  maximum_labels_length: 50
  num_hypotheses: 1
  optimizer: SGD
score:
  batch_size: 64
train:
  batch_size: 64
  batch_type: examples
  keep_checkpoint_max: 10
  length_bucket_width: 1
  max_step: 100000
  sample_buffer_size: 500000
  save_checkpoints_steps: 10000
  save_summary_steps: 100
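One thing I notice in this dump is average_loss_in_time: false. If I understand that option correctly (an assumption on my part, not verified against the OpenNMT-tf source), the reported loss is the token cross-entropy summed over time steps and averaged only over examples, so batches of longer sentences report larger losses; with length_bucket_width: 1 grouping batches by length, that alone could make the curve jump around. A sketch of that reduction:

def reduce_loss(token_xent_sum, num_examples, num_tokens, average_in_time):
    # Assumption about what average_loss_in_time toggles:
    # per-token loss when true, per-example loss when false.
    return token_xent_sum / (num_tokens if average_in_time else num_examples)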
For comparison, OpenNMT-py works fine on the same data set; it gives the following output:
[2020-04-19 17:52:27,439 INFO] * src vocab size = 50002
[2020-04-19 17:52:27,444 INFO] * tgt vocab size = 6529
[2020-04-19 17:52:27,444 INFO] Building model...
[2020-04-19 17:52:34,454 INFO] NMTModel(
  (encoder): RNNEncoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(50002, 500, padding_idx=1)
        )
      )
    )
    (rnn): LSTM(500, 500, num_layers=2, dropout=0.3)
  )
  (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(6529, 500, padding_idx=1)
        )
      )
    )
    (dropout): Dropout(p=0.3, inplace=False)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.3, inplace=False)
      (layers): ModuleList(
        (0): LSTMCell(1000, 500)
        (1): LSTMCell(500, 500)
      )
    )
    (attn): GlobalAttention(
      (linear_in): Linear(in_features=500, out_features=500, bias=False)
      (linear_out): Linear(in_features=1000, out_features=500, bias=False)
    )
  )
  (generator): Sequential(
    (0): Linear(in_features=500, out_features=6529, bias=True)
    (1): Cast()
    (2): LogSoftmax()
  )
)
[2020-04-19 17:52:34,455 INFO] encoder: 29009000
[2020-04-19 17:52:34,456 INFO] decoder: 12293529
[2020-04-19 17:52:34,456 INFO] * number of parameters: 41302529
[2020-04-19 17:52:34,458 INFO] Starting training on GPU: [0]
[2020-04-19 17:52:34,458 INFO] Start training loop and validate every 10000 steps...
[2020-04-19 17:52:34,458 INFO] Loading dataset from data.train.0.pt
[2020-04-19 17:52:35,599 INFO] number of examples: 95777
[2020-04-19 17:52:38,413 INFO] Step 50/100000; acc: 6.53; ppl: 18941.74; xent: 9.85; lr: 1.00000; 5340/8017 tok/s; 4 sec
[2020-04-19 17:52:39,843 INFO] Step 100/100000; acc: 8.01; ppl: 2210.37; xent: 7.70; lr: 1.00000; 15184/22639 tok/s; 5 sec
[2020-04-19 17:52:41,138 INFO] Step 150/100000; acc: 12.49; ppl: 595.37; xent: 6.39; lr: 1.00000; 15672/23512 tok/s; 7 sec
[2020-04-19 17:52:42,485 INFO] Step 200/100000; acc: 17.17; ppl: 301.88; xent: 5.71; lr: 1.00000; 17055/23672 tok/s; 8 sec
[2020-04-19 17:52:43,858 INFO] Step 250/100000; acc: 18.95; ppl: 194.80; xent: 5.27; lr: 1.00000; 15793/23540 tok/s; 9 sec
[2020-04-19 17:52:45,281 INFO] Step 300/100000; acc: 20.89; ppl: 161.41; xent: 5.08; lr: 1.00000; 16053/23260 tok/s; 11 sec
[2020-04-19 17:52:46,658 INFO] Step 350/100000; acc: 25.23; ppl: 114.16; xent: 4.74; lr: 1.00000; 15797/23489 tok/s; 12 sec
[2020-04-19 17:52:47,988 INFO] Step 400/100000; acc: 27.51; ppl: 95.26; xent: 4.56; lr: 1.00000; 16054/23880 tok/s; 14 sec
[2020-04-19 17:52:49,409 INFO] Step 450/100000; acc: 28.73; ppl: 81.98; xent: 4.41; lr: 1.00000; 15982/24430 tok/s; 15 sec
[2020-04-19 17:52:50,879 INFO] Step 500/100000; acc: 29.15; ppl: 79.81; xent: 4.38; lr: 1.00000; 16630/24165 tok/s; 16 sec
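As a side note on reading the OpenNMT-py columns: ppl appears to be exp(xent), e.g. for the first logged step:

import math

# exp(9.85) ~= 18959, consistent with the reported ppl of
# 18941.74 once xent's rounding to two decimals is accounted for.
print(math.exp(9.85))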