From OpenNMT-py To OpenNMT-tf: OOM and other little questions

yaren · July 26, 2019, 3:53pm

Hi, since I found that OpenNMT-py does not support token alignment, I have started using OpenNMT-tf. But I have a few small problems:
1. Why are there so many warnings like:

WARNING: Logging before flag parsing goes to stderr.
W0726 23:13:02.695762 140077770876736 deprecation_wrapper.py:119] From /home/anaconda3/envs/tf/lib/python3.6/site-packages/opennmt/decoders/rnn_decoder.py:435: The name tf.nn.rnn_cell.RNNCell is deprecated. Please use tf.compat.v1.nn.rnn_cell.RNNCell instead.

W0726 23:13:03.188360 140077770876736 deprecation_wrapper.py:119] From /home/anaconda3/envs/tf/lib/python3.6/site-packages/opennmt/optimizers/adafactor.py:32: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0726 23:13:03.188865 140077770876736 deprecation_wrapper.py:119] From /home/anaconda3/envs/tf/lib/python3.6/site-packages/opennmt/optimizers/multistep_adam.py:36: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

Is it all right to just ignore these warnings?

2. Why is the process using so much RAM and CPU?

PID   USER PR NI VIRT  RES   SHR  S %CPU  %MEM TIME+    COMMAND
22513 root 20 0  73.5g 43.9g 2.4g S 235.5 70.0 33:57.50 /home/anaconda3/envs/tf/bin/python /home/anaconda3/envs/tf/bin/onmt-main train_and_eval --model_type Transformer --config con.yml --num_gpu+

When I used OpenNMT-py, it did not use this much RAM, and OpenNMT-tf is using all the CPU cores.

3. I have two RTX 2070 (8 GB) GPUs, but with auto_config, training runs out of memory (OOM) with batch_size: 3072 (auto). Should I use auto_config?
…
4. When I use OpenNMT-py, the speed is more than 12000 tokens/s and GPU-Util is always about 95%, but with OpenNMT-tf I get:

source_words/sec: 11642
target_words/sec: 11199
global_step/sec: 0.430799

and GPU-Util is below 85%. Is this normal?
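A quick back-of-the-envelope check on the log above: dividing words/sec by global_step/sec gives the number of words processed per optimizer step. This only rearranges the logged numbers; it assumes each global step is one full parameter update.

```python
# Throughput figures copied from the OpenNMT-tf training log above.
source_words_per_sec = 11642
target_words_per_sec = 11199
steps_per_sec = 0.430799

# Words per optimizer step = (words / second) / (steps / second).
source_words_per_step = source_words_per_sec / steps_per_sec
target_words_per_step = target_words_per_sec / steps_per_sec

print(f"source words per step: {source_words_per_step:.0f}")
print(f"target words per step: {target_words_per_step:.0f}")
```

Each step therefore covers roughly 26,000 to 27,000 words, in the same range as the effective_batch_size: 25000 in the configuration posted later in the thread. A low global_step/sec is expected with such large updates; the words/sec figures are the ones to compare with OpenNMT-py.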

Anyway, I think OpenNMT-py is perfect.

For token alignment, I even tried Marian NMT, but it is slower, not as fast as it is reputed to be:

[2019-07-24 18:51:25] Ep. 8 : Up. 109350 : Sen. 1,909,744 : Cost 3.70446014 : Time 14.23s : 9882.22 words/s : L.r. 1.1476e-04
[2019-07-24 18:51:40] Ep. 8 : Up. 109400 : Sen. 1,916,592 : Cost 3.75405455 : Time 14.52s : 10208.97 words/s : L.r. 1.1473e-04

Am I right to use OpenNMT-tf, or do I need help to use token alignment with an OpenNMT system? Thanks, guys.

guillaumekln · July 26, 2019, 4:10pm

You can ignore all deprecation warnings, they will be addressed in a future version.

By default, the complete dataset is loaded into memory. You can define a buffer size in the configuration:

train:
  sample_buffer_size: 1000000
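A bounded buffer trades some shuffle randomness for memory: only buffer_size examples are held at once instead of the whole corpus. The sketch below is a streaming shuffle buffer in the style of tf.data's `Dataset.shuffle`, assuming sample_buffer_size behaves like such a buffer; `buffered_shuffle` is a hypothetical name, not an OpenNMT-tf function.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(examples: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield examples in shuffled order while holding at most buffer_size of them.

    Hypothetical helper illustrating a shuffle buffer: fill the buffer, then
    repeatedly emit a random element and replace it with the next example.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for example in examples:
        if len(buffer) < buffer_size:
            buffer.append(example)
            continue
        # Buffer is full: emit a random element and put the new example in its slot.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = example
    # Input exhausted: drain the remaining examples in random order.
    rng.shuffle(buffer)
    yield from buffer


# Every example comes out exactly once, in a different order.
print(sorted(buffered_shuffle(range(10), buffer_size=3)) == list(range(10)))  # True
```

Memory use is bounded by buffer_size, but an example can only move a limited distance from its position in the file, so a larger buffer gives a more uniform shuffle.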

As for the CPU, TensorFlow can spawn many threads, but they are usually not very active compared to the GPU.

Do you have any additional configuration? The Transformer auto_config works on GPUs with 6GB of memory.
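With batch_type: tokens, batch_size caps the number of tokens per batch rather than the number of sentences. The sketch below shows length-sorted token batching, assuming a batch's cost is counted as number of examples × longest length (padding included). `token_batches` is a hypothetical helper, and OpenNMT-tf's own bucketing (bucket_width, source vs. target lengths) may count differently.

```python
from typing import List, Sequence


def token_batches(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Group example indices so each batch's padded size stays within max_tokens.

    Padded size = number of examples * length of the longest example, since
    every sequence in a batch is padded to the same length. A single example
    longer than max_tokens still gets its own batch.
    """
    # Sort by length so that similar lengths share a batch (less padding).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    longest = 0
    for i in order:
        new_longest = max(longest, lengths[i])
        if current and new_longest * (len(current) + 1) > max_tokens:
            # Adding this example would exceed the budget: close the batch.
            batches.append(current)
            current = []
            new_longest = lengths[i]
        current.append(i)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


print(token_batches([10, 20, 30, 40], max_tokens=60))  # [[0, 1], [2], [3]]
```

Because activation memory grows with this padded size, lowering batch_size (and, if needed, accumulating gradients to keep the same effective batch) is one way to avoid OOM.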

The Transformer model should be well optimized with GPU usage close to 100%. Can you post con.yml?

yaren · July 26, 2019, 4:19pm

I started with this command:

onmt-main train_and_eval --model_type Transformer --auto_config --config con.yml --num_gpus 2

But now I no longer use auto_config, and I manually modified con.yml:

data:
  eval_features_file: v.en
  eval_labels_file: v.zh
  source_words_vocabulary: en.vocab
  target_words_vocabulary: zh.vocab
  train_alignments: all.corpus.align
  train_features_file: all.en.punk.tok.case.bpe
  train_labels_file: all.zh.punk.tok.bpe
eval:
  batch_size: 16
  eval_delay: 7200
  exporters: last
  external_evaluators: BLEU
infer:
  batch_size: 32
  bucket_width: 5
model_dir: run1
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: noam_decay_v2
  gradients_accum: 1
  guided_alignment_type: ce
  guided_alignment_weight: 1
  label_smoothing: 0.1
  learning_rate: 2.0
  optimizer: LazyAdamOptimizer
  optimizer_params:
    beta1: 0.9
    beta2: 0.998
score:
  batch_size: 64
train:
  average_last_checkpoints: 8
  batch_size: 2000
  batch_type: tokens
  bucket_width: 1
  effective_batch_size: 25000
  keep_checkpoint_max: 50
  maximum_features_length: 100
  maximum_labels_length: 100
  num_threads: 8
  sample_buffer_size: -1
  save_checkpoints_steps: 7200
  save_summary_steps: 100
  train_steps: 500000
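To see how batch_size and effective_batch_size in this configuration interact, here is a small worked example. It assumes the number of accumulated batches is derived from effective_batch_size (rounded up) rather than from gradients_accum: 1, and that training uses 2 GPUs as in the `--num_gpus 2` command above; `gradient_accumulation_steps` is a hypothetical helper.

```python
import math


def gradient_accumulation_steps(batch_size: int, effective_batch_size: int, num_gpus: int = 1) -> int:
    """Batches to accumulate so that batch_size * num_gpus * steps >= effective_batch_size.

    Hypothetical helper; assumes the count is rounded up.
    """
    return math.ceil(effective_batch_size / (batch_size * num_gpus))


# Values from the configuration above, assuming 2 GPUs.
steps = gradient_accumulation_steps(batch_size=2000, effective_batch_size=25000, num_gpus=2)
print(steps)  # 7

# batch_size is an upper bound in tokens, so this is the maximum per update.
max_tokens_per_update = steps * 2000 * 2
print(max_tokens_per_update)  # 28000
```

Under these assumptions, each optimizer step aggregates 7 batches per GPU, which is why global_step/sec stays well below 1 while tokens/sec remains high.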