NCCL warnings when training with multiple GPUs

[root@localhost OpenNMT]# th train.lua -data data/demo-train.t7 -save_model model -gpuid 1,2
[05/15/17 20:56:02 INFO] Using GPU(s): 1, 2
[05/15/17 20:56:02 WARNING] The caching CUDA memory allocator is enabled. This allocator improves performance at the cost of a higher GPU memory usage. To optimize for memory, consider disabling it by setting the environment variable: THC_CACHING_ALLOCATOR=0
[05/15/17 20:56:03 WARNING] The caching CUDA memory allocator is enabled. This allocator improves performance at the cost of a higher GPU memory usage. To optimize for memory, consider disabling it by setting the environment variable: THC_CACHING_ALLOCATOR=0
[05/15/17 20:56:03 WARNING] The caching CUDA memory allocator is enabled. This allocator improves performance at the cost of a higher GPU memory usage. To optimize for memory, consider disabling it by setting the environment variable: THC_CACHING_ALLOCATOR=0
Tried loading libnccl.so.1 but got error /root/torch/install/share/lua/5.2/nccl/ffi.lua:192: could not load library libnccl.so.1
Tried loading libnccl.1.dylib but got error /root/torch/install/share/lua/5.2/nccl/ffi.lua:192: could not load library libnccl.1.dylib
[05/15/17 20:56:03 WARNING] For improved efficiency with multiple GPUs, consider installing nccl
[05/15/17 20:56:03 INFO] Training Sequence to Sequence with Attention model
[05/15/17 20:56:03 INFO] Loading data from 'data/demo-train.t7'...
[05/15/17 20:56:04 INFO] * vocabulary size: source = 24999; target = 35820
[05/15/17 20:56:04 INFO] * additional features: source = 0; target = 0
[05/15/17 20:56:04 INFO] * maximum sequence length: source = 50; target = 51
[05/15/17 20:56:04 INFO] * number of training sentences: 10000
[05/15/17 20:56:04 INFO] * maximum batch size: 64
[05/15/17 20:56:04 INFO] Building model…
[05/15/17 20:56:06 INFO] * Encoder:
[05/15/17 20:56:06 INFO] - with word embeddings size: 500
[05/15/17 20:56:06 INFO] * Decoder:
[05/15/17 20:56:07 INFO] - with word embeddings size: 500

My questions:

(1) I have installed NCCL with `luarocks install nccl`, but I still get these errors:

Tried loading libnccl.so.1 but got error /root/torch/install/share/lua/5.2/nccl/ffi.lua:192: could not load library libnccl.so.1
Tried loading libnccl.1.dylib but got error /root/torch/install/share/lua/5.2/nccl/ffi.lua:192: could not load library libnccl.1.dylib

How can I resolve this NCCL warning and these errors?

(2) How should I deal with the warning that the caching CUDA memory allocator is enabled?

  1. It seems you missed this part of the installation instructions:

Have libnccl.so in your library path:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/nccl/lib"

Note that entries in `LD_LIBRARY_PATH` are separated by `:` (not `;`), and each entry should be the directory that contains `libnccl.so.1`, not the `.so` file itself.

  2. How to deal with the CUDA caching allocator is up to you. If you have enough GPU memory, just ignore the warning. If you experience out-of-memory issues on the GPU, do what the warning advises and set `THC_CACHING_ALLOCATOR=0`.
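As a concrete sketch of both steps (the NCCL install location below is an assumption; point it at wherever `libnccl.so.1` actually lives on your system):

```shell
# Locate the NCCL shared library first (the path found will vary per system)
find / -name 'libnccl.so*' 2>/dev/null

# Append the DIRECTORY containing libnccl.so.1 to the library search path.
# LD_LIBRARY_PATH entries are separated by ':' on Linux, not ';'.
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"

# If you later hit GPU out-of-memory errors, disable the caching allocator
# for a single run by prefixing the command with the environment variable:
THC_CACHING_ALLOCATOR=0 th train.lua -data data/demo-train.t7 -save_model model -gpuid 1,2
```

To make the library path change permanent, you can add the `export` line to your shell profile (e.g. `~/.bashrc`).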

Thank you very much, this is very useful for me. :slight_smile: