Error installing OpenNMT-tf on Linux

tensorflow

(Claudia) #1

Hello
I installed OpenNMT-tf in a Python virtual environment, and when I try to test the installation with onmt-main -h it raises an error. I get the same error when I run onmt-build-vocab --size 50000 --save_vocab data/toy-ende/src-vocab.txt data/toy-ende/src-train.txt:

(venv) claudia@philarion:~/opennmt_tf/OpenNMT-tf$ onmt-build-vocab --size 50000 --save_vocab data/toy-ende/src-vocab.txt data/toy-ende/src-train.txt
Traceback (most recent call last):
  File "/home/claudia/opennmt_tf/venv/bin/onmt-build-vocab", line 7, in <module>
    from opennmt.bin.build_vocab import main
  File "/home/claudia/opennmt_tf/venv/lib/python3.5/site-packages/opennmt/__init__.py", line 5, in <module>
    from opennmt import decoders
  File "/home/claudia/opennmt_tf/venv/lib/python3.5/site-packages/opennmt/decoders/__init__.py", line 3, in <module>
    from opennmt.decoders.rnn_decoder import RNNDecoder
  File "/home/claudia/opennmt_tf/venv/lib/python3.5/site-packages/opennmt/decoders/rnn_decoder.py", line 5, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 48, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/home/claudia/opennmt_tf/venv/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/claudia/opennmt_tf/venv/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcudart.so.7.5: cannot open shared object file: No such file or directory

Any help?
Regards!


(Guillaume Klein) #2

Hello,

What TensorFlow and CUDA versions are you using? Recent releases of TensorFlow require CUDA 9.0.
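
One way to see which CUDA runtime libraries the dynamic loader can actually find is to try opening them directly. The sketch below is not part of OpenNMT-tf: `can_load` is a hypothetical helper, and the version numbers are simply the ones mentioned in this thread.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open the shared library lib_name."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the library is not on the loader's search path.
        return False


# Versions taken from this thread; adjust the list for your own machine.
for version in ("7.5", "8.0", "9.0", "9.1"):
    name = "libcudart.so." + version
    print(name, "found" if can_load(name) else "not found")
```

If the version TensorFlow was built against is reported as not found, the import will fail with the same ImportError as in the first post.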


(Claudia) #3

Hi! I am using TensorFlow 0.9.0, and there are two CUDA versions on the server I am using (8.0 and 9.1), but I’m not sure which one is the default.

Thanks!


(Guillaume Klein) #4

As stated in the README of the project, the minimum required TensorFlow version is 1.4. If you have issues managing the different CUDA versions, I suggest using a TensorFlow Docker image.
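
The traceback in #1 shows tensorflow being loaded from /usr/local/lib/python3.5/dist-packages rather than from the venv, which suggests the system-wide install is the one being picked up. A minimal sketch for checking where the package would be imported from, and whether a version string meets the 1.4 minimum, could look like this. The helper names `version_tuple` and `meets_minimum` are made up for illustration.

```python
import importlib.util
import re

MINIMUM = "1.4"  # minimum TensorFlow version, as stated in this thread


def version_tuple(version):
    """Turn a string such as '1.4.0' or '1.5.0rc1' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Locate the package without importing it, so a broken install does not crash here.
spec = importlib.util.find_spec("tensorflow")
print("tensorflow found at:", spec.origin if spec else "not installed")
print("0.9.0 is new enough:", meets_minimum("0.9.0"))
```

The installed version string can be read from `pip show tensorflow` run inside the virtual environment, or from `tf.__version__` once the import succeeds.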


(Claudia) #5

Hello! I just updated everything, and when I run the toy example from the Quick Start I get this error:

Caused by op u'seq2seq/encoder/w_embs/Initializer/random_uniform/RandomUniform', defined at:
  File "/usr/local/bin/onmt-main", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/opennmt/bin/main.py", line 150, in main
    runner.train_and_evaluate()
  File "/usr/local/lib/python2.7/dist-packages/opennmt/runner.py", line 167, in train_and_evaluate
    tf.estimator.train_and_evaluate(self._estimator, train_spec, eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 447, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 531, in run
    return self.run_local()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 669, in run_local
    hooks=train_hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 366, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1119, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1132, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1107, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/opennmt/models/model.py", line 103, in _model_fn
    _loss_op, features_shards, labels_shards, params, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/opennmt/utils/parallel.py", line 148, in __call__
    outputs.append(funs[i](*args[i], **kwargs[i]))
  File "/usr/local/lib/python2.7/dist-packages/opennmt/models/model.py", line 66, in _loss_op
    logits, _ = self._build(features, labels, params, mode, config=config)
  File "/usr/local/lib/python2.7/dist-packages/opennmt/models/sequence_to_sequence.py", line 144, in _build
    scope=source_input_scope)(features)
  File "/usr/local/lib/python2.7/dist-packages/opennmt/models/sequence_to_sequence.py", line 50, in _scoped_embedding_fn
    return embedding_fn(ids)
  File "/usr/local/lib/python2.7/dist-packages/opennmt/models/sequence_to_sequence.py", line 143, in <lambda>
    lambda ids: self.source_inputter.transform_data(ids, mode=mode, log_dir=log_dir),
  File "/usr/local/lib/python2.7/dist-packages/opennmt/inputters/inputter.py", line 194, in transform_data
    inputs = self._transform_data(data, mode)
  File "/usr/local/lib/python2.7/dist-packages/opennmt/inputters/text_inputter.py", line 361, in _transform_data
    return self.transform(data["ids"], mode)
  File "/usr/local/lib/python2.7/dist-packages/opennmt/inputters/text_inputter.py", line 388, in transform
    trainable=self.trainable)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1328, in get_variable
    constraint=constraint)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1090, in get_variable
    constraint=constraint)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 435, in get_variable
    constraint=constraint)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 404, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 796, in _get_single_variable
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 2234, in variable
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 2224, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 2207, in default_variable_creator
    constraint=constraint)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 259, in __init__
    constraint=constraint)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 368, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 780, in <lambda>
    shape.as_list(), dtype=dtype, partition_info=partition_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py", line 476, in __call__
    shape, -limit, limit, dtype, seed=self.seed)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 242, in random_uniform
    rnd = gen_random_ops.random_uniform(shape, dtype, seed=seed1, seed2=seed2)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 674, in random_uniform
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24999,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: seq2seq/encoder/w_embs/Initializer/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, _class=["loc:@seq2seq/encoder/w_embs/Assign"], dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](seq2seq/encoder/w_embs/Initializer/random_uniform/shape)]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Any ideas??
Thanks in advance!!


(Guillaume Klein) #6

OOM means “Out Of Memory”. How much GPU memory do you have?
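
A quick way to answer that question from Python is to query `nvidia-smi`. The following is a minimal sketch, assuming the NVIDIA driver tools are installed; `parse_gpu_memory` and `gpu_memory` are hypothetical helper names, and the script returns an empty list when `nvidia-smi` is missing or fails.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output):
    """Parse lines like '0, 11000, 11178' into (index, free_mib, total_mib)."""
    rows = []
    for line in output.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append((index, total - used, total))
    return rows


def gpu_memory():
    """Return per-GPU memory figures, or [] if nvidia-smi cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        output = subprocess.check_output(QUERY).decode()
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_memory(output)


for index, free, total in gpu_memory():
    print("GPU %d: %d MiB free of %d MiB" % (index, free, total))
```

Running plain `nvidia-smi` in a terminal additionally lists the processes currently holding GPU memory.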


(Claudia) #7

Oh thanks!! I just checked and someone else is running jobs on the GPU, so only a few MB are left :smile:
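
For scale, the allocation that failed in #5 was a float tensor of shape [24999, 512]. A short worked calculation, assuming 4 bytes per element (TensorFlow's DT_FLOAT is 32-bit), shows why a few free MB are not enough. `tensor_bytes` is a hypothetical helper written for this illustration.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Return the memory needed for a dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


size = tensor_bytes((24999, 512))
print("%d bytes = %.1f MiB" % (size, size / 2.0**20))
# prints: 51197952 bytes = 48.8 MiB
```

This covers only the source embedding matrix; the rest of the model needs additional memory on top of it.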