Baseline training script not using GPU

I’m trying to run the example training script to train a English-German model. I’m running Ubuntu 20.04 with a GeForce GTX 780. On a fresh Ubuntu 20.04 instance I checked the box during installation for Ubuntu to install proprietary drivers and the nvidia-driver-440 package was installed. Then I installed Docker (package from apt) and installed NVIDIA Docker support. I ran the TensorFlow Docker image with this command:

sudo docker run -it tensorflow/tensorflow bash

Inside the Docker container I Installed OpenNMT:

pip install --upgrade pip
pip install OpenNMT-tf

I compiled and installed SentencePiece, then I ran the baseline training script. prepare_data.sh ran with no problems (but didn’t use the GPU), but when I ran the 1 GPU training script the GPU wasn’t being used:

CUDA_VISIBLE_DEVICES=0 ./run_wmt_ende_1gpu.sh

I can’t tell if this is some sort of drivers issue, an issue with my TensorFlow installation, or an issue with OpenNMT. Any help greatly appreciated!

I had previously tried installing the GPU drivers (from the NVIDIA website), CUDA, cuDNN, and TensorFlow outside of Docker and running the training script but had the same problem of the GPU not being used. My next step was to try running the OpenNMT Docker image and see if I can get that to work but I’d like to have the control of running OpenNMT outside of Docker.

Thanks for any help!

You should enable the GPU when starting the Docker container. See NVIDIA’s documentation:

https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(Native-GPU-Support)#usage

1 Like

@guillaumekln good catch thanks. That was definitely an issue but after fixing it the GPU still isn’t being used.

What I’m trying to run is:

sudo docker run --gpus all -it tensorflow/tensorflow bash

on Ubuntu 20.04 with NVIDIA drivers, Docker, and NVIDIA Docker support installed. Then inside of the Docker container running:

cd

# Install OpenNMT
pip install --upgrade pip
pip install OpenNMT-tf

# Install SentencePiece
apt-get update
apt-get install cmake build-essential pkg-config libgoogle-perftools-dev vim git -y
git clone https://github.com/google/sentencepiece.git
cd sentencepiece
mkdir build
cd build
cmake ..
make -j $(nproc)
make install
ldconfig -v
cd

# Run baseline training script
apt install wget -y
git clone https://github.com/OpenNMT/OpenNMT-tf.git
cd OpenNMT-tf/scripts/wmt
./prepare_data.sh raw_data
CUDA_VISIBLE_DEVICES=0 ./run_wmt_ende_1gpu.sh

When running this all of my CPUs get used up but the “GPU Utilization” is normally ~3% and never goes over 15%.

Can you post the full training logs?

1 Like
root@47ffc35e8afb:~/OpenNMT-tf/scripts/wmt# CUDA_VISIBLE_DEVICES=0 ./run_wmt_ende_1gpu.sh
2020-08-07 11:51:16.682044: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-07 11:51:16.682071: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-07 11:51:16.682092: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 47ffc35e8afb
2020-08-07 11:51:16.682118: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 47ffc35e8afb
2020-08-07 11:51:16.682181: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
2020-08-07 11:51:16.682220: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.100.0
WARNING:tensorflow:You provided a model configuration but a checkpoint already exists. The model configuration must define the same model as the one used for the initial training. However, you can change non structural values like dropout.
INFO:tensorflow:Using model:
(model): TransformerBase(
  (examples_inputter): SequenceToSequenceInputter(
    (features_inputter): WordEmbedder()
    (labels_inputter): WordEmbedder()
    (inputters): ListWrapper(
      (0): WordEmbedder()
      (1): WordEmbedder()
    )
  )
  (encoder): SelfAttentionEncoder(
    (position_encoder): SinusoidalPositionEncoder(
      (reducer): SumReducer()
    )
    (layer_norm): LayerNorm()
    (layers): ListWrapper(
      (0): SelfAttentionEncoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (1): SelfAttentionEncoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (2): SelfAttentionEncoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (3): SelfAttentionEncoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (4): SelfAttentionEncoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (5): SelfAttentionEncoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
    )
  )
  (decoder): SelfAttentionDecoder(
    (position_encoder): SinusoidalPositionEncoder(
      (reducer): SumReducer()
    )
    (layer_norm): LayerNorm()
    (layers): ListWrapper(
      (0): SelfAttentionDecoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (attention): ListWrapper(
          (0): TransformerLayerWrapper(
            (layer): MultiHeadAttention(
              (linear_queries): Dense(512)
              (linear_keys): Dense(512)
              (linear_values): Dense(512)
              (linear_output): Dense(512)
            )
            (input_layer_norm): LayerNorm()
          )
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (1): SelfAttentionDecoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (attention): ListWrapper(
          (0): TransformerLayerWrapper(
            (layer): MultiHeadAttention(
              (linear_queries): Dense(512)
              (linear_keys): Dense(512)
              (linear_values): Dense(512)
              (linear_output): Dense(512)
            )
            (input_layer_norm): LayerNorm()
          )
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (2): SelfAttentionDecoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (attention): ListWrapper(
          (0): TransformerLayerWrapper(
            (layer): MultiHeadAttention(
              (linear_queries): Dense(512)
              (linear_keys): Dense(512)
              (linear_values): Dense(512)
              (linear_output): Dense(512)
            )
            (input_layer_norm): LayerNorm()
          )
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (3): SelfAttentionDecoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (attention): ListWrapper(
          (0): TransformerLayerWrapper(
            (layer): MultiHeadAttention(
              (linear_queries): Dense(512)
              (linear_keys): Dense(512)
              (linear_values): Dense(512)
              (linear_output): Dense(512)
            )
            (input_layer_norm): LayerNorm()
          )
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (4): SelfAttentionDecoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (attention): ListWrapper(
          (0): TransformerLayerWrapper(
            (layer): MultiHeadAttention(
              (linear_queries): Dense(512)
              (linear_keys): Dense(512)
              (linear_values): Dense(512)
              (linear_output): Dense(512)
            )
            (input_layer_norm): LayerNorm()
          )
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
      (5): SelfAttentionDecoderLayer(
        (self_attention): TransformerLayerWrapper(
          (layer): MultiHeadAttention(
            (linear_queries): Dense(512)
            (linear_keys): Dense(512)
            (linear_values): Dense(512)
            (linear_output): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
        (attention): ListWrapper(
          (0): TransformerLayerWrapper(
            (layer): MultiHeadAttention(
              (linear_queries): Dense(512)
              (linear_keys): Dense(512)
              (linear_values): Dense(512)
              (linear_output): Dense(512)
            )
            (input_layer_norm): LayerNorm()
          )
        )
        (ffn): TransformerLayerWrapper(
          (layer): FeedForwardNetwork(
            (inner): Dense(2048)
            (outer): Dense(512)
          )
          (input_layer_norm): LayerNorm()
        )
      )
    )
  )
)

INFO:tensorflow:Using parameters:
data:
  eval_features_file: data/valid.en
  eval_labels_file: data/valid.de
  source_vocabulary: data/wmtende.vocab
  target_vocabulary: data/wmtende.vocab
  train_features_file: data/train.en
  train_labels_file: data/train.de
eval:
  batch_size: 32
  batch_type: examples
  external_evaluators: BLEU
  length_bucket_width: 5
infer:
  batch_size: 32
  batch_type: examples
  length_bucket_width: 5
model_dir: wmt_ende_transformer
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 1
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
score:
  batch_size: 64
train:
  average_last_checkpoints: 8
  batch_size: 3072
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 8
  length_bucket_width: 1
  max_step: 500000
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_checkpoints_steps: 1000
  save_summary_steps: 100

2020-08-07 11:51:17.298986: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2020-08-07 11:51:17.326176: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3515335000 Hz
2020-08-07 11:51:17.326996: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa844000e30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-07 11:51:17.327053: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
INFO:tensorflow:Restored checkpoint wmt_ende_transformer/ckpt-1
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/summary/summary_iterator.py:68: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
INFO:tensorflow:Accumulate gradients of 9 iterations to reach effective batch size of 25000
INFO:tensorflow:Training on 4562102 examples
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Number of model parameters: 93326081
INFO:tensorflow:Number of model weights: 260 (trainable = 260, non trainable = 0)
2020-08-07 11:52:42.606207: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 1028016 of 4562102
2020-08-07 11:52:52.606210: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 2062755 of 4562102
2020-08-07 11:53:02.606205: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 3066941 of 4562102
2020-08-07 11:53:12.606209: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 4042947 of 4562102
2020-08-07 11:53:17.881402: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.

It seems like its having an issue connecting to CUDA even though it should be preinstalled in the Docker image.

Does the command nvidia-smi work inside the container?

1 Like
root@47ffc35e8afb:/# nvidia-smi
Fri Aug  7 13:36:09 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 780     Off  | 00000000:01:00.0 N/A |                  N/A |
| 35%   29C    P8    N/A /  N/A |     15MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

It does, it looks like there’s definitely something wrong with CUDA. Looking at the TensorFlow Docker documentation again I think I need to include the latest-gpu tag for the Docker image changing the command to start the Docker container to:

sudo docker run -it tensorflow/tensorflow:latest-gpu bash

I’ll give this a try and post if it works.

Using the wrong Docker container ended up being the issue, since my GPU is to old to support CUDA 11.x I ended up using nvidia/cuda instead of the tensorflow container:

sudo docker run --gpus all -it --name cuda nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 bash

Thanks for the help @guillaumekln!