I am using OpenNMT on a cluster managed by Slurm. Most of the GPUs are NVIDIA Pascal cards; some are Maxwell. Slurm assigns jobs to whichever devices are available. If I use a model trained on one GPU architecture to test on another, it throws an "invalid device function" error:
cuda runtime error (8) : invalid device function at /tmp/luarocks_cutorch-scm-1-9435/cutorch/lib/THC/generic/THCTensorMath.cu:35
I have not faced this issue with other Torch applications or with Theano. Does OpenNMT restrict training and testing to the same GPU architecture? Is there a workaround for this?
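One workaround I could try is pinning jobs to a single architecture, or rebuilding cutorch as a fat binary that includes kernels for both. A rough sketch (the feature names `pascal`/`maxwell` are hypothetical and depend on how the admins tagged the nodes, and I am assuming cutorch's CMake honors the `TORCH_CUDA_ARCH_LIST` environment variable):

```shell
# Option 1: pin the job to one architecture via a Slurm node feature
# (check what features exist first with: sinfo -o "%N %f")
#SBATCH --gres=gpu:1
#SBATCH --constraint=pascal   # hypothetical feature name set by the admins

# Option 2: rebuild cutorch with device code for both architectures
# (Maxwell = compute 5.x, Pascal = compute 6.x)
TORCH_CUDA_ARCH_LIST="5.0 5.2 6.0 6.1" luarocks install cutorch
```

I don't know whether OpenNMT itself needs anything beyond a cutorch rebuilt this way.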
I am using the latest Torch (pulled on January 15th) and OpenNMT (pulled on January 16th).