Issue with "num_replicas" in OpenNMT-tf 1.25.2

tel34 · November 12, 2019, 10:23am

Having migrated to OpenNMT-tf V2, I still need to train some models in V1. I installed the latest V1 release (OpenNMT-tf 1.25.2) in a virtual environment to avoid conflicts, with Python 2.7 as the environment's interpreter. I am now running into problems that I did not encounter in earlier versions of OpenNMT-tf, with which I trained numerous models.
My command is:
(tf1_env) miguel@joshua:~$ onmt-main train_and_eval --num_gpus 1 --model_type Transformer --config /home/miguel/tf_experiments/span2eng/data.yml --auto_config
The most recent error is stated below:
  File "/home/miguel/tf1_env/local/lib/python2.7/site-packages/opennmt/runner.py", line 220, in _finalize_training_parameters
    num_replicas=self._num_replicas)
TypeError: 'NoneType' object does not support item assignment

In runner.py, line 84 reads "self._num_replicas = hvd.size() if hvd is not None else num_devices".
However, the command line only accepts "--num_gpus", not "--num_devices". I also tried passing num_replicas as a parameter, but that did not work. Any suggestions would be appreciated.

My config file is:
data:
train_features_file: /home/miguel/tf_experiments/span2eng/span_corp_sp.en_train.txt
train_labels_file: /home/miguel/tf_experiments/span2eng/span_corp_sp.es_train.txt

eval_features_file: /home/miguel/tf_experiments/span2eng/span_corp_sp.en_validation.txt
eval_labels_file: /home/miguel/tf_experiments/span2eng/span_corp_sp.es_validation.txt

source_words_vocabulary: /home/miguel/tf_experiments/span2eng/src-vocab.txt
target_words_vocabulary: /home/miguel/tf_experiments/span2eng/tgt-vocab.txt

params:
optimizer: Adam
optimizer_params:
beta1: 0.8
beta2: 0.998
learning_rate: 1.0
beam_width: 5
length_penalty: 0.2
minimum_decoding_length: 0
maximum_iterations: 200
replace_unknown_target: true

train:
batch_size: 1024
effective_bath_size: 25000
batch_type: tokens
save_checkpoint_steps: 5000
train_steps: 500000
maximum_features_length: 70
maximum_labels_length: 70

eval:
batch_size: 30
save_eval_predictions: true
external_evaluators: BLEU
exporters: best

guillaumekln · November 12, 2019, 10:27am

Is this your actual configuration file? It is missing some indentation.

tel34 · November 12, 2019, 10:46am

No, I brought it all to the left after pasting in from the screen, a bad habit of mine :-). I can restore the indentation if that makes it all more readable.

tel34 · November 12, 2019, 1:40pm

That whole configuration file got messed up by cutting and pasting. I have now got this training further along the road. I apologize to everyone for wasting your time.
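
For anyone who lands on this thread with the same TypeError, here is a minimal sketch of how lost indentation can plausibly lead to it. When every line of a YAML file is flush left, section headers such as "params:" parse as keys with a null value, and their options become unrelated top-level keys. The dictionaries below are hand-written stand-ins for what a YAML parser would return, and find_empty_sections is a hypothetical helper, not part of OpenNMT-tf. The exact assignment that fails inside runner.py is not shown in this thread, so the last lines only reproduce the same kind of error with a made-up key.

```python
# Hypothetical helper for illustration; it is not part of OpenNMT-tf.
EXPECTED_SECTIONS = ("data", "params", "train", "eval")


def find_empty_sections(config, sections=EXPECTED_SECTIONS):
    """Return the expected sections that are present but do not hold a mapping."""
    return [
        name
        for name in sections
        if name in config and not isinstance(config[name], dict)
    ]


# Stand-in for a parsed config whose lines were all flush left:
# "params:" and "train:" have no nested content, so their value is None,
# and the options that belonged to them sit at the top level instead.
flattened = {
    "params": None,
    "optimizer": "Adam",
    "train": None,
    "batch_type": "tokens",
}

# Stand-in for the same options with indentation intact.
indented = {
    "params": {"optimizer": "Adam"},
    "train": {"batch_type": "tokens"},
}

print(find_empty_sections(flattened))  # ['params', 'train']
print(find_empty_sections(indented))   # []

# Writing into an empty section raises the same kind of error as in the
# traceback above ("example_key" is a made-up name for this demonstration).
try:
    flattened["params"]["example_key"] = 1
except TypeError as error:
    print(error)
```

Running a check like this on the parsed configuration before training would flag an empty section early, rather than deep inside the runner.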

guillaumekln · November 12, 2019, 1:41pm

No worries, thanks for the update.