Training fails on multi-GPU with Torch

I am trying to run the training as:

th train.lua -data data/demo-train.t7 -save_model model \
        -layers 6 -rnn_size 512 -word_vec_size 512 \
        -dropout 0.1 \
        -max_batch_size 28672  \
        -optim adam  -learning_rate 0.0002 \
        -max_grad_norm 0  \
        -attention global \
        -async_parallel \
        -end_epoch 50 -gpuid 3 5

And I keep getting this error:

[11/14/18 16:59:05 INFO] Using GPU(s): 3, 5
[11/14/18 16:59:05 WARNING] The caching CUDA memory allocator is enabled. This allocator improves performance at the cost of a higher GPU memory usage. To optimize for memory, consider disabling it by setting the environment variable: THC_CACHING_ALLOCATOR=0
FATAL THREAD PANIC: (read) /opt/torch/share/lua/5.1/torch/File.lua:343: unknown Torch class <Logger>

Training on one GPU works.
Do you know what the problem might be?

Thanks in advance!

We found the solution: we had to install tds and bit32 at the system scope. OpenNMT seems to have problems with Lua modules that are only installed locally (per-user).
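
In case it helps others, this is roughly what we ran, assuming LuaRocks is used to manage the modules. The key point is installing into the system tree (i.e. without the --local flag) so the worker threads started by -async_parallel can also load them:

# install system-wide (not --local) so every Lua state / thread can find the modules
sudo luarocks install tds
sudo luarocks install bit32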

Whenever possible, we recommend using the Docker image, as it contains everything and is well optimized for both size and speed.
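
For reference, pulling and running the image looks roughly like this (the exact image name and tag are assumptions here; check Docker Hub for the currently published OpenNMT image):

# image name/tag assumed; replace with the published OpenNMT image
docker pull opennmt/opennmt
# nvidia-docker (or docker with the NVIDIA runtime) is required for GPU access
nvidia-docker run -it opennmt/opennmt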