Install Torch 7 with CUDA 9.1


(Terence Lewis) #1

We have installed a GTX 1080Ti and successfully installed Cuda-9.1 (passing the provided tests). However the installation of cutorch is failing with the fatal error: nvcc fatal : Unsupported gpu architecture ‘compute_61’.
Has anyone managed to install cutorch with Cuda-9.1? Or should I go back to Cuda-8?
Any suggestions welcome :slight_smile:
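
One thing worth ruling out with this error is that the build is picking up an older nvcc than the one just installed: compute_61 (the architecture of the GTX 1080Ti) is only known to nvcc from CUDA 8.0 onwards. A minimal sketch of that check follows. The helper name `cuda_release` is made up for illustration, and it assumes the usual "release X.Y" line in the output of `nvcc --version`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "X.Y" from the "release X.Y" line that
# `nvcc --version` prints. Reads the nvcc output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only query nvcc if one is actually on the PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at: $(command -v nvcc)"
    echo "CUDA release:  $(nvcc --version | cuda_release)"
else
    echo "no nvcc on PATH"
fi
```

If the release printed here is lower than the toolkit you believe you installed, the problem is probably which nvcc gets found rather than the card itself.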


(Vincent Nguyen) #2

Check Gitter and this list: The Ultimate wish-list for OpenNMT-Lua.

I would suggest you go back to 8 until there is a solution.


(Guillaume Klein) #3

Also read and track this issue:

It’s unlikely Torch will officially support Cuda 9+.


(Terence Lewis) #4

I’ll be uninstalling 9 in the next five minutes :slight_smile:


(Panos Kanavos) #5

Hi @tel34,

In my latest install I used these exports with the install script and torch installed fine with latest cuda:

CC=gcc-6 CXX=g++-6 TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ./install.sh

No need to change gcc if 6 is your default version.
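
To decide whether the CC/CXX override is needed, the default compiler version can be checked first. This is only a sketch: the helper names are hypothetical, and the limit of GCC 6 for CUDA 9.0/9.1 is stated here as an assumption to verify against the CUDA installation guide for your release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major version from a string such as
# "6.3.0" or "7" (both are forms that `gcc -dumpversion` can print).
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Assumption: CUDA 9.0/9.1 accept GCC up to major version 6.
gcc_ok_for_cuda91() {
    [ "$(gcc_major "$1")" -le 6 ]
}

if command -v gcc >/dev/null 2>&1; then
    v=$(gcc -dumpversion)
    if gcc_ok_for_cuda91 "$v"; then
        echo "default gcc $v: no CC/CXX override needed"
    else
        echo "default gcc $v: point CC/CXX at an older compiler such as gcc-6"
    fi
fi
```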


(Jean-Pierra RAMATCHANDIRIN) #6

hello,

I could install torch 7 and cuda 9.1 on ubuntu 16 with the following installation procedure as root user :

apt-get update
apt-get install build-essential
wget https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux
bash cuda_9.1.85_387.26_linux 
wget https://developer.nvidia.com/compute/cuda/9.1/Prod/patches/1/cuda_9.1.85.1_linux
bash cuda_9.1.85.1_linux
export PATH=$PATH:/usr/local/cuda-9.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.1/lib64
git clone https://github.com/torch/distro.git ~/torch --recursive
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
cd ~/torch
bash install-deps
bash install.sh 
. ../.bashrc 
luarocks list
Installed rocks:
----------------

argcheck
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

cudnn
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

cunn
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

cutorch
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

cwrap
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

dok
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

env
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

gnuplot
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

graph
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

image
   1.1.alpha-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

lua-cjson
   2.1devel-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

luaffi
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

luafilesystem
   1.6.3-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

moses
   1.6.1-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

nn
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

nngraph
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

nnx
   0.1-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

optim
   1.0.5-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

paths
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

penlight
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

qtlua
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

qttorch
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

sundown
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

sys
   1.1-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

threads
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

torch
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

trepl
   scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks

xlua
   1.0-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
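
Seeing cutorch in the rock list shows that it built; a quick way to also confirm it loads and sees the GPU is sketched below. The function name `check_cutorch` is made up for illustration, and the snippet assumes the Torch interpreter `th` is on the PATH after sourcing `~/.bashrc`.

```shell
#!/usr/bin/env bash
# Sketch: ask cutorch how many GPUs it can see. Prints a hint instead of
# failing hard when the Torch interpreter `th` is not on the PATH.
check_cutorch() {
    if ! command -v th >/dev/null 2>&1; then
        echo "th not found; source ~/.bashrc first"
        return 1
    fi
    th -e "require 'cutorch'; print('GPUs visible: ' .. cutorch.getDeviceCount())"
}

check_cutorch || echo "cutorch check did not pass"
```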

(Terence Lewis) #7

Thanks for that. I have temporarily gone back to Cuda 8 but am experiencing the same nvcc fatal error: Unsupported gpu architecture ‘compute_61’. As I have experienced no issues installing Cuda 8 on a machine with a GTX 1070, I must put the failure down to an issue with the new GTX 1080Ti.
I’m assuming that after your installation you are successfully running OpenNMT?
I will run your procedure in the morning.
Thanks again.
Terence


(jean.senellart) #8

Hi Terence, yes - OpenNMT runs smoothly with the procedure described above - just missing half precision, but we had no success with half precision so far anyway.

Best!


(Vincent Nguyen) #9

Did you try it on Ubuntu 14.04 too?


(Jean-Pierra RAMATCHANDIRIN) #10

Cuda 9 is not supported on ubuntu 14. It is only available on ubuntu 16.04 and ubuntu 17.04 ( https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu )


(Terence Lewis) #11

Seem to have narrowed this down to an issue with THCBlas which is the point where the NVCC build fails (at 4%). Will report my progress.
Terence


(Vincent Nguyen) #12

I just installed cuda 9.0 + patch 1 on ubuntu 14.04 and it runs fine with TF 1.5.0 (whose precompiled binary is built for cuda 9.0).
I’ll try later to install torch.
Just to report that the issue is not really cuda 9 being incompatible with ubuntu 14; there could still be a library issue later on when building.


(Terence Lewis) #13

Everything goes fine with the installation according to the procedure outlined by @jprama until we get to the CUDA section. I am puzzled that Cuda 7.5 is reported because I have installed Cuda-9.1 and it has passed the tests in the Samples. The first “failure” is at lib/THC/CMakeFiles/THC.dir/build.make:70: recipe for target ‘lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o’ failed
and I honestly do not know what to look for here. Any suggestions would be welcome. The relevant part of my install log is given below:

Found CUDA on your machine. Installing CUDA packages
Building on 4 cores
-- Found Torch7 in /home/miguel/torch/install
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- Found gcc >=5 and CUDA <= 7.5, adding workaround C++ flags
-- MAGMA not found. Compiling without MAGMA support
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.0;3.5;5.0;5.2;5.2+PTX
-- got cuda version 7.5
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/miguel/torch/extra/cutorch/build
[  2%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o
[  2%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o
[  3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o
[  4%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCSleep.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:70: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:77: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCSleep.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:63: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:560: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o' failed
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
Makefile:127: recipe for target 'all' failed

jopts=$(getconf _NPROCESSORS_CONF)

echo "Building on $jopts cores"
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DLUA_INCDIR=/home/miguel/torch/install/include -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/miguel/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/miguel/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$jopts install
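
The log reports "got cuda version 7.5" although 9.1 is installed, which suggests the build found a different toolkit. A small sketch for listing every nvcc reachable through the PATH, in search order, is given here. The helper name `find_in_path` is hypothetical, and how CMake selects the toolkit may differ from a plain PATH lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every executable called $1 found in the
# directories of $PATH, in search order (the first hit is the one a
# plain command lookup would use).
find_in_path() {
    local name=$1 dir
    local IFS=:
    for dir in $PATH; do
        [ -x "$dir/$name" ] && printf '%s\n' "$dir/$name"
    done
    return 0
}

find_in_path nvcc
```

Note that a PATH built as `$PATH:/usr/local/cuda-9.1/bin` puts the CUDA 9.1 directory last, so an nvcc in a system directory such as /usr/bin would be listed before it.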

(Jean-Pierra RAMATCHANDIRIN) #14

You have a conflict with a cuda 7.5 installation on your system.
Can you check whether you have a symlink for /usr/local/cuda (ls -l /usr/local/cuda) and whether any cuda 7.5 packages are still installed on your system (dpkg -l | grep nvidia)?
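
The two checks can be combined in a short script. This is a sketch only: the helper name `pkgs_with_version` is made up, and it assumes the usual `dpkg -l` column layout (status, name, version).

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `dpkg -l` output on stdin, print the names of
# installed ("ii") packages whose version starts with the given prefix.
pkgs_with_version() {
    awk -v prefix="$1" '$1 == "ii" && index($3, prefix) == 1 { print $2 }'
}

# Show where /usr/local/cuda points, if it exists.
[ -e /usr/local/cuda ] && ls -l /usr/local/cuda

# List packages left over from a CUDA 7.5 toolkit, if dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | pkgs_with_version 7.5
fi
```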


(Terence Lewis) #15

The sym link /usr/local/cuda points to /usr/local/cuda-9.1 which is where I successfully ran the Cuda tests. However I DO seem to still have cuda 7.5 packages on the system as shown below. They must have got installed when I installed the driver after fitting the GTX 1080Ti but before running your installation procedure. Do you recommend getting rid of them selectively or doing an apt-get remove --purge nvidia*?
Thanks,
Output of dpkg -l | grep nvidia:

ii  nvidia-387                            387.34-0ubuntu0~gpu16.04.2                 amd64        NVIDIA binary driver - version 387.34
ii  nvidia-cuda-dev                       7.5.18-0ubuntu1                            amd64        NVIDIA CUDA development files
ii  nvidia-cuda-doc                       7.5.18-0ubuntu1                            all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                       7.5.18-0ubuntu1                            amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                   7.5.18-0ubuntu1                            amd64        NVIDIA CUDA development toolkit
ii  nvidia-opencl-dev:amd64               7.5.18-0ubuntu1                            amd64        NVIDIA OpenCL development files
ii  nvidia-opencl-icd-387                 387.34-0ubuntu0~gpu16.04.2                 amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                          0.8.2                                      amd64        Tools to enable NVIDIA's Prime
ii  nvidia-profiler                       7.5.18-0ubuntu1                            amd64        NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                       390.12-0ubuntu0~gpu16.04.1                 amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-visual-profiler                7.5.18-0ubuntu1                            amd64        NVIDIA Visual Profiler for CUDA and OpenCL

(Jean-Pierra RAMATCHANDIRIN) #16

hello,

You just need to remove the following packages
nvidia-cuda-dev
nvidia-cuda-doc
nvidia-cuda-gdb
nvidia-cuda-toolkit
nvidia-opencl-dev:amd64
nvidia-profiler
nvidia-visual-profiler
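
As a sketch, the list above can be turned into a single removal command. The helper `removal_command` is hypothetical and only prints the command so it can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# The seven packages listed above.
old_cuda_pkgs="nvidia-cuda-dev nvidia-cuda-doc nvidia-cuda-gdb \
nvidia-cuda-toolkit nvidia-opencl-dev:amd64 nvidia-profiler \
nvidia-visual-profiler"

# Hypothetical helper: print the removal command instead of running it.
# The printed command needs root when actually executed.
removal_command() {
    echo "apt-get remove $old_cuda_pkgs"
}

removal_command
```

The driver packages (nvidia-387 and friends) are deliberately left out of the list.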


(Terence Lewis) #17

Thanks - will report progress in a couple of days - have to turn my mind to something different today.


(Jean-Pierra RAMATCHANDIRIN) #18

For information, I could install torch 7 and cuda9 (using ubuntu 16 packages : https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux and https://developer.nvidia.com/compute/cuda/9.1/Prod/patches/1/cuda_9.1.85.1_linux ) on ubuntu 14.04 LTS.
I could launch a training and a translation with OpenNMT.


(Terence Lewis) #19

Hi, I am following your procedure to the letter and still failing to complete the build at the “Cuda” stage. The relevant part of the log is pasted below. Any suggestions would be welcome. Thanks.

cd build && make install
Updating manifest for /home/miguel/torch/install/lib/luarocks/rocks
optim 1.0.5-0 is now built and installed in /home/miguel/torch/install/ (license: BSD)

Found CUDA on your machine. Installing CUDA packages
Building on 4 cores
-- Found Torch7 in /home/miguel/torch/install
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 6.1
-- got cuda version 9.1
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_61,code=sm_61;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/miguel/torch/extra/cutorch/build
[  1%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o
[  3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathMagma.cu.o
[  3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathBlas.cu.o
[  4%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:3793: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:2901: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o' failed
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
Makefile:127: recipe for target 'all' failed

jopts=$(getconf _NPROCESSORS_CONF)

echo "Building on $jopts cores"
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DLUA_INCDIR=/home/miguel/torch/install/include -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/miguel/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/miguel/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$jopts install

(Terence Lewis) #20

I have followed the prescribed procedure precisely and entered
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" immediately before running install.sh. Each time the installation fails at the point shown below. As I have had cuda 8 running with a GTX 1070 on another machine for a year without problems, I am inclined to remove cuda 9 and go back to cuda 8 unless any kind suggestions help.
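
One thing that may be worth checking is whether the flag is really visible to the process that runs install.sh. A minimal sketch, with a hypothetical helper `is_exported`, is:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if variable $1 is exported, i.e.
# visible to child processes such as install.sh.
is_exported() {
    env | grep -q "^$1="
}

if is_exported TORCH_NVCC_FLAGS; then
    echo "TORCH_NVCC_FLAGS=$TORCH_NVCC_FLAGS"
else
    echo "TORCH_NVCC_FLAGS is not exported in this shell"
fi
```

If install.sh is started through sudo, the environment is often reset, so the variable may need to be set inside the root shell itself. Running `./clean.sh` from the torch directory before retrying should also avoid reusing build files from a failed attempt.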