Onmt with multiple features

ishaansharma · December 13, 2019, 6:00am

Hello fellow researchers,
I am trying to run OpenNMT-tf (onmt) with multiple input features.
My YAML config file is as follows:

model_dir: folder_name/run/
data:
  train_features_file:
    - folder_name/src__10000
    - folder_name/fe1_10000
    - folder_name/fe2_10000
  train_labels_file: folder_name/tgt__10000
  source_1_vocabulary: folder_name/src__vocab_10000
  source_2_vocabulary: folder_name/fe1_vocab_10000
  source_3_vocabulary: folder_name/fe2_vocab_10000
  target_vocabulary: folder_name/tgt__vocab_10000

train:
  save_checkpoints_steps: 100

  eval:
    eval_delay: 3600  # Every 1 hour
    external_evaluators: BLEU
infer:
  batch_size: 32
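
A quick way to sanity-check a configuration like the one above is to confirm that every entry in train_features_file has a matching source_N_vocabulary key. The following is only a minimal sketch, assuming the data section has already been loaded into a Python dict; the helper name check_feature_config is hypothetical and is not part of OpenNMT-tf.

```python
import re


def check_feature_config(data):
    """Return a list of problems found in the `data` section of a config dict.

    Hypothetical helper for illustration only; not an OpenNMT-tf function.
    """
    problems = []
    features = data.get("train_features_file", [])
    if isinstance(features, str):
        # A single features file may be given as a plain string.
        features = [features]

    # Each features file is expected to have its own vocabulary key.
    for index, path in enumerate(features, start=1):
        key = "source_%d_vocabulary" % index
        if key not in data:
            problems.append("missing %s for %s" % (key, path))

    # Flag vocabulary keys that point past the end of the features list.
    for key in data:
        match = re.fullmatch(r"source_(\d+)_vocabulary", key)
        if match and int(match.group(1)) > len(features):
            problems.append("%s has no matching features file" % key)
    return problems


# The values below are copied from the config in this post.
example_data = {
    "train_features_file": [
        "folder_name/src__10000",
        "folder_name/fe1_10000",
        "folder_name/fe2_10000",
    ],
    "train_labels_file": "folder_name/tgt__10000",
    "source_1_vocabulary": "folder_name/src__vocab_10000",
    "source_2_vocabulary": "folder_name/fe1_vocab_10000",
    "source_3_vocabulary": "folder_name/fe2_vocab_10000",
    "target_vocabulary": "folder_name/tgt__vocab_10000",
}

print(check_feature_config(example_data))  # prints []
```

An empty list means the three feature files and the three source vocabularies line up, which is the case for this config.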

I am getting the output below, which contains errors. Can anyone help me understand what I am doing wrong and how I can train a model with multiple features?

onmt-main --config data2.yml --auto_config --model config/models/features.py train

WARNING:tensorflow:You provided a model configuration but a checkpoint already exists. The model configuration must define the same model as the one used for the initial training. However, you can change non structural values like dropout.
INFO:tensorflow:Using parameters:
data:
  source_1_vocabulary: folder_name/src__vocab_10000
  source_2_vocabulary: folder_name/fe1_vocab_10000
  source_3_vocabulary: folder_name/fe2_vocab_10000
  target_vocabulary: folder_name/tgt__vocab_10000
  train_features_file:
  - folder_name/src__10000
  - folder_name/fe1_10000
  - folder_name/fe2_10000
  train_labels_file: folder_name/tgt__10000
eval:
  batch_size: 32
infer:
  batch_size: 32
  length_bucket_width: 5
model_dir: folder_name/run/
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 1
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
score:
  batch_size: 64
train:
  average_last_checkpoints: 8
  batch_size: 3072
  batch_type: tokens
  effective_batch_size: 25000
  eval:
    eval_delay: 3600
    external_evaluators: BLEU
  keep_checkpoint_max: 8
  length_bucket_width: 1
  max_step: 500000
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 100
  save_summary_steps: 100

2019-12-13 11:20:01.898411: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-13 11:20:02.005703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:08:00.0
2019-12-13 11:20:02.006389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:42:00.0
2019-12-13 11:20:02.006474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:02.006525: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:02.006561: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:02.006593: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:02.006624: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:02.006655: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:02.006689: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2019-12-13 11:20:02.006696: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-12-13 11:20:02.007040: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-13 11:20:02.036353: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2994075000 Hz
2019-12-13 11:20:02.041283: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47f3550 executing computations on platform Host. Devices:
2019-12-13 11:20:02.041361: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-12-13 11:20:02.302031: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4759b00 executing computations on platform CUDA. Devices:
2019-12-13 11:20:02.302091: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-12-13 11:20:02.302101: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-12-13 11:20:02.302419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-13 11:20:02.302446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]
INFO:tensorflow:Restored checkpoint folder_name/run/ckpt-0
INFO:tensorflow:Training on 10000 examples
INFO:tensorflow:Accumulate gradients of 9 iterations to reach effective batch size of 25000
2019-12-13 11:20:05.204931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:08:00.0
2019-12-13 11:20:05.205636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:42:00.0
2019-12-13 11:20:05.205765: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:05.205814: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:05.205860: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:05.205910: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:05.205954: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:05.205998: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-12-13 11:20:05.206041: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2019-12-13 11:20:05.206051: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-12-13 11:20:05.206142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-13 11:20:05.206154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1
2019-12-13 11:20:05.206162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N
2019-12-13 11:20:05.206168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N
WARNING:tensorflow:There is non-GPU devices in tf.distribute.Strategy, not using nccl allreduce.
WARNING:tensorflow:From /home/folder_name/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py:253: _EagerTensorBase.cpu (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.identity instead.
INFO:tensorflow:Saved checkpoint folder_name/run/ckpt-0
WARNING:tensorflow:From /home/folder_name/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

guillaumekln · December 13, 2019, 9:53am

Hi,

It seems TensorFlow is unable to find your CUDA installation. Can you check if this page helps:

https://www.tensorflow.org/install/gpu
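
Before rerunning training, it may help to check directly whether the libraries named in the warnings can be loaded at all. The sketch below uses only the Python standard library and tries to open each shared library with the system's dynamic loader. The library names are copied from your log; the function name find_missing_libraries is just an illustration, and the check assumes a Linux system.

```python
import ctypes

# Library names copied from the "Could not load dynamic library" warnings
# in the log above. Other TensorFlow builds may expect other versions.
CUDA_LIBRARIES = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def find_missing_libraries(names, loader=ctypes.CDLL):
    """Return the names that the dynamic loader fails to open.

    `loader` is a parameter only so the function can be exercised
    without real CUDA libraries; by default it is ctypes.CDLL.
    """
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            # ctypes raises OSError when the shared object cannot be opened.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_libraries(CUDA_LIBRARIES):
        print("cannot load:", name)
```

If this prints nothing, the loader can find every library in the list. Any name it does print is usually either not installed or not on the loader's search path (for example LD_LIBRARY_PATH).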

ishaansharma · December 14, 2019, 5:28am

Thank you.
I solved the problem. It was occurring because there was no CUDA driver.