OpenNMT-tf configuration for the default model described in OpenNMT-py's Quickstart page

I am trying to use OpenNMT-tf to define the default model described in OpenNMT-py’s Quickstart page. Is there a pre-existing configuration file for this purpose?

I have the following model definition. Does it match the one described on OpenNMT-py’s Quickstart page?

Python file that defines the model:

import tensorflow as tf
import tensorflow_addons as tfa
import opennmt as onmt

from opennmt import decoders
from opennmt import encoders
from opennmt import inputters
from opennmt import layers
from opennmt.utils import misc

def model():
  return onmt.models.SequenceToSequence(
    source_inputter=inputters.WordEmbedder(
        embedding_size=500),
    target_inputter=inputters.WordEmbedder(
        embedding_size=500),
    encoder=encoders.RNNEncoder(
        num_layers=2,
        num_units=500,
        dropout=0.3,
        residual_connections=False,
        cell_class=tf.keras.layers.LSTMCell),
    decoder=decoders.AttentionalRNNDecoder(
        num_layers=2,
        num_units=500,
        bridge_class=layers.CopyBridge,
        attention_mechanism_class=tfa.seq2seq.LuongAttention,
        cell_class=tf.keras.layers.LSTMCell,
        dropout=0.3,
        residual_connections=False))
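
As a quick sanity check (assuming the file above is saved as model.py, which is my own naming), the definition can be imported and instantiated to confirm it builds without errors:

import opennmt as onmt
from model import model  # the definition above, assumed saved as model.py

m = model()
assert isinstance(m, onmt.models.SequenceToSequence)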

Config file:

model_dir: model_lstm

data:
  train_features_file: train.src
  train_labels_file: train.trg
  eval_features_file: eval.src
  eval_labels_file: eval.trg
  source_vocabulary: src-vocab.txt
  target_vocabulary: trg-vocab.txt

params:
  optimizer: SGD
  learning_rate: 1.0
  beam_width: 5

train:
  save_checkpoints_steps: 1000
  keep_checkpoint_max: 10
  max_step: 10000
  batch_size: 64

eval:
  external_evaluators: BLEU
  steps: 1000

infer:
  batch_size: 32
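
For completeness, here is how I intend to launch training. I will probably use the onmt-main command line, but if I read the OpenNMT-tf docs correctly the equivalent Python call goes through opennmt.Runner; this is only a sketch under my assumptions about the file names (model.py and config.yml):

import yaml
import opennmt as onmt

from model import model  # model definition shown earlier, assumed saved as model.py

# Load the YAML configuration shown above (assumed saved as config.yml).
with open("config.yml") as f:
    config = yaml.safe_load(f)

# Build the runner and train, evaluating periodically as set in the eval block.
runner = onmt.Runner(model(), config)
runner.train(with_eval=True)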

Hello,

This looks about right. You may need to clip the gradients when using this learning rate with SGD:

params:
  optimizer_params:
    clipnorm: 5
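
If I understand it correctly, the entries under optimizer_params are forwarded to the Keras optimizer constructor, so the block above should be roughly equivalent to building the optimizer like this (a sketch of the effect, not the exact internals):

import tensorflow as tf

# SGD with learning rate 1.0; each gradient tensor is clipped to norm 5
# before the update is applied.
optimizer = tf.keras.optimizers.SGD(learning_rate=1.0, clipnorm=5)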

Thank you!

I trained a model with the clipnorm: 5 option, but the loss values seem odd: the loss keeps going up and down until the maximum training step. What do you think could be the cause?

INFO:tensorflow:Step = 100 ; source words/s = 1509, target words/s = 2218 ; Learning rate = 1.000000 ; Loss = 1262.955200
INFO:tensorflow:Step = 200 ; source words/s = 5957, target words/s = 8712 ; Learning rate = 1.000000 ; Loss = 1025.245972
INFO:tensorflow:Step = 300 ; source words/s = 5976, target words/s = 8747 ; Learning rate = 1.000000 ; Loss = 873.087341
INFO:tensorflow:Step = 400 ; source words/s = 5925, target words/s = 8705 ; Learning rate = 1.000000 ; Loss = 1276.541748
INFO:tensorflow:Step = 500 ; source words/s = 6143, target words/s = 8960 ; Learning rate = 1.000000 ; Loss = 1069.620239
INFO:tensorflow:Step = 600 ; source words/s = 6119, target words/s = 8905 ; Learning rate = 1.000000 ; Loss = 1310.696289
INFO:tensorflow:Step = 700 ; source words/s = 5978, target words/s = 8757 ; Learning rate = 1.000000 ; Loss = 760.614685
INFO:tensorflow:Step = 800 ; source words/s = 6069, target words/s = 8886 ; Learning rate = 1.000000 ; Loss = 1363.078369
INFO:tensorflow:Step = 900 ; source words/s = 6057, target words/s = 8867 ; Learning rate = 1.000000 ; Loss = 1804.104370
INFO:tensorflow:Step = 1000 ; source words/s = 5910, target words/s = 8678 ; Learning rate = 1.000000 ; Loss = 865.372314

Parameters I used:

params:
  optimizer: SGD
  learning_rate: 1.0
  clipnorm: 5
  maximum_features_length: 50
  maximum_labels_length: 50

Output from OpenNMT-tf (the effective parameters it printed at startup):

INFO:tensorflow:Using parameters:
data:
  eval_features_file: eval.src
  eval_labels_file: eval.trg
  source_vocabulary: src-vocab.txt
  target_vocabulary: trg-vocab.txt
  train_features_file: train.src
  train_labels_file: train.trg
eval:
  batch_size: 32
  external_evaluators: BLEU
  steps: 10000
infer:
  batch_size: 32
  length_bucket_width: null
model_dir: model_lstm
params:
  average_loss_in_time: false
  clipnorm: 5
  learning_rate: 1.0
  maximum_features_length: 50
  maximum_labels_length: 50
  num_hypotheses: 1
  optimizer: SGD
score:
  batch_size: 64
train:
  batch_size: 64
  batch_type: examples
  keep_checkpoint_max: 10
  length_bucket_width: 1
  max_step: 100000
  sample_buffer_size: 500000
  save_checkpoints_steps: 10000
  save_summary_steps: 100

OpenNMT-py gives the following output on the same dataset; the OpenNMT-py version works fine.

[2020-04-19 17:52:27,439 INFO]  * src vocab size = 50002
[2020-04-19 17:52:27,444 INFO]  * tgt vocab size = 6529
[2020-04-19 17:52:27,444 INFO] Building model...
[2020-04-19 17:52:34,454 INFO] NMTModel(
  (encoder): RNNEncoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(50002, 500, padding_idx=1)
        )
      )
    )
    (rnn): LSTM(500, 500, num_layers=2, dropout=0.3)
  )
  (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(6529, 500, padding_idx=1)
        )
      )
    )
    (dropout): Dropout(p=0.3, inplace=False)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.3, inplace=False)
      (layers): ModuleList(
        (0): LSTMCell(1000, 500)
        (1): LSTMCell(500, 500)
      )
    )
    (attn): GlobalAttention(
      (linear_in): Linear(in_features=500, out_features=500, bias=False)
      (linear_out): Linear(in_features=1000, out_features=500, bias=False)
    )
  )
  (generator): Sequential(
    (0): Linear(in_features=500, out_features=6529, bias=True)
    (1): Cast()
    (2): LogSoftmax()
  )
)
[2020-04-19 17:52:34,455 INFO] encoder: 29009000
[2020-04-19 17:52:34,456 INFO] decoder: 12293529
[2020-04-19 17:52:34,456 INFO] * number of parameters: 41302529
[2020-04-19 17:52:34,458 INFO] Starting training on GPU: [0]
[2020-04-19 17:52:34,458 INFO] Start training loop and validate every 10000 steps...
[2020-04-19 17:52:34,458 INFO] Loading dataset from data.train.0.pt
[2020-04-19 17:52:35,599 INFO] number of examples: 95777
[2020-04-19 17:52:38,413 INFO] Step 50/100000; acc:   6.53; ppl: 18941.74; xent: 9.85; lr: 1.00000; 5340/8017 tok/s;      4 sec
[2020-04-19 17:52:39,843 INFO] Step 100/100000; acc:   8.01; ppl: 2210.37; xent: 7.70; lr: 1.00000; 15184/22639 tok/s;      5 sec
[2020-04-19 17:52:41,138 INFO] Step 150/100000; acc:  12.49; ppl: 595.37; xent: 6.39; lr: 1.00000; 15672/23512 tok/s;      7 sec
[2020-04-19 17:52:42,485 INFO] Step 200/100000; acc:  17.17; ppl: 301.88; xent: 5.71; lr: 1.00000; 17055/23672 tok/s;      8 sec
[2020-04-19 17:52:43,858 INFO] Step 250/100000; acc:  18.95; ppl: 194.80; xent: 5.27; lr: 1.00000; 15793/23540 tok/s;      9 sec
[2020-04-19 17:52:45,281 INFO] Step 300/100000; acc:  20.89; ppl: 161.41; xent: 5.08; lr: 1.00000; 16053/23260 tok/s;     11 sec
[2020-04-19 17:52:46,658 INFO] Step 350/100000; acc:  25.23; ppl: 114.16; xent: 4.74; lr: 1.00000; 15797/23489 tok/s;     12 sec
[2020-04-19 17:52:47,988 INFO] Step 400/100000; acc:  27.51; ppl: 95.26; xent: 4.56; lr: 1.00000; 16054/23880 tok/s;     14 sec
[2020-04-19 17:52:49,409 INFO] Step 450/100000; acc:  28.73; ppl: 81.98; xent: 4.41; lr: 1.00000; 15982/24430 tok/s;     15 sec
[2020-04-19 17:52:50,879 INFO] Step 500/100000; acc:  29.15; ppl: 79.81; xent: 4.38; lr: 1.00000; 16630/24165 tok/s;     16 sec


The optimizer_params block around clipnorm is important. Can you try with it?

Additional comments:

  • You might also need to use a smaller clipnorm value, because OpenNMT-py clips by the global norm of all gradients while the TensorFlow optimizer clips the norm of each gradient independently (see the sketch after this list).
  • The maximum_*_length parameters should be in the train block.
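
To illustrate the first point, here is a small standalone sketch (plain TensorFlow, not OpenNMT internals) of the difference between clipping by the global norm and clipping each gradient tensor independently:

import tensorflow as tf

# Two hypothetical gradient tensors with norms 5 and 10.
grads = [tf.constant([3.0, 4.0]), tf.constant([6.0, 8.0])]

# OpenNMT-py style: clip using the global norm across all gradients.
clipped_global, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)
print(global_norm.numpy())  # sqrt(5**2 + 10**2) ~= 11.18, so both tensors are rescaled

# Keras clipnorm style: each tensor is clipped to norm 5 on its own.
clipped_per_tensor = [tf.clip_by_norm(g, clip_norm=5.0) for g in grads]

Since the global norm is always at least as large as any single tensor's norm, the same threshold clips less when applied per tensor, which is why a smaller value may be needed here.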

I had forgotten the optimizer_params part. Thank you for pointing it out. I will try again with the updated config below.

params:
  optimizer_params:
    clipnorm: 1
  optimizer: SGD
  learning_rate: 1.0

train:
  maximum_features_length: 50
  maximum_labels_length: 50
  save_checkpoints_steps: 10000
  keep_checkpoint_max: 10
  max_step: 100000
  batch_size: 64

eval:
  external_evaluators: BLEU
  steps: 10000

infer:
  batch_size: 32