We have installed a GTX 1080Ti and successfully installed Cuda-9.1 (passing the provided tests). However, the installation of cutorch is failing with the fatal error: nvcc fatal : Unsupported gpu architecture ‘compute_61’.
Has anyone managed to install cutorch with Cuda-9.1? Or should I go back to Cuda-8?
Any suggestions welcome
If you check on Gitter and on this list (The Ultimate wish-list for OpenNMT-Lua), I would suggest you go back to 8 until there is a solution.
Also read and track this issue:
It’s unlikely Torch will officially support Cuda 9+.
I’ll be uninstalling 9 in the next five minutes
Hi @tel34,
In my latest install I used these exports with the install script, and Torch installed fine with the latest CUDA:
CC=gcc-6 CXX=gcc-6 TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ./install.sh
No need to change gcc if 6 is your default version.
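To see whether the compiler override is needed on a given machine, a small check along these lines may help. It is only a sketch: the helper name gcc_major is made up for this example, and the threshold of 6 assumes that the CUDA 9.x nvcc rejects host compilers newer than gcc 6.

```shell
#!/bin/sh
# Print the major version from a compiler version string such as "6.3.0" or "7".
gcc_major() {
    echo "${1%%.*}"
}

if command -v gcc >/dev/null 2>&1; then
    major=$(gcc_major "$(gcc -dumpversion)")
    # Assumption: CUDA 9.x refuses gcc versions later than 6.
    if [ "$major" -gt 6 ] 2>/dev/null; then
        echo "default gcc is $major: use the CC/CXX overrides from the command above"
    else
        echo "default gcc is $major: no compiler override needed"
    fi
fi
```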
hello,
I was able to install Torch7 and CUDA 9.1 on Ubuntu 16 with the following installation procedure, run as the root user:
apt-get update
apt-get install build-essential
wget https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux
bash cuda_9.1.85_387.26_linux
wget https://developer.nvidia.com/compute/cuda/9.1/Prod/patches/1/cuda_9.1.85.1_linux
bash cuda_9.1.85.1_linux
export PATH=$PATH:/usr/local/cuda-9.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.1/lib64
git clone https://github.com/torch/distro.git ~/torch --recursive
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
cd ~/torch
bash install-deps
bash install.sh
. ../.bashrc
luarocks list
Installed rocks:
----------------
argcheck
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
cudnn
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
cunn
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
cutorch
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
cwrap
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
dok
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
env
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
gnuplot
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
graph
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
image
1.1.alpha-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
lua-cjson
2.1devel-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
luaffi
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
luafilesystem
1.6.3-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
moses
1.6.1-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
nn
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
nngraph
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
nnx
0.1-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
optim
1.0.5-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
paths
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
penlight
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
qtlua
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
qttorch
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
sundown
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
sys
1.1-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
threads
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
torch
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
trepl
scm-1 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
xlua
1.0-0 (installed) - /home/ubuntu/torch/install/lib/luarocks/rocks
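Rather than scanning the whole listing by eye, the three CUDA rocks that appear in it (cutorch, cunn, cudnn) can be checked with a short script. This is a sketch only: missing_rocks is a hypothetical helper, and it assumes each rock name sits on a line of its own in the luarocks list output, as in the listing above.

```shell
#!/bin/sh
# Read `luarocks list` output on stdin; print each rock named as an
# argument that does not appear as a line of its own in the listing.
missing_rocks() {
    listing=$(cat)
    for rock in "$@"; do
        printf '%s\n' "$listing" | grep -qxF -- "$rock" || echo "$rock"
    done
}

if command -v luarocks >/dev/null 2>&1; then
    missing=$(luarocks list | missing_rocks cutorch cunn cudnn)
    if [ -z "$missing" ]; then
        echo "all CUDA rocks are installed"
    else
        echo "missing rocks: $missing"
    fi
fi
```

If cutorch is reported missing, the CUDA part of install.sh did not complete, even if the rest of the install looked fine.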
Thanks for that. I have temporarily gone back to Cuda 8 but am experiencing the same nvcc fatal error: Unsupported gpu architecture ‘compute_61’. As I have experienced no issues installing Cuda 8 on a machine with a GTX 1070, I must put the failure down to an issue with the new GTX 1080Ti.
I’m assuming that after your installation you are successfully running OpenNMT?
I will run your procedure in the morning.
Thanks again.
Terence
Hi Terence, yes - OpenNMT runs smoothly with the procedure described above - just missing half precision, but we had no success with half precision so far anyway.
Best!
Did you try it on Ubuntu 14.04 too?
Cuda 9 is not supported on Ubuntu 14. It is only available on Ubuntu 16.04 and Ubuntu 17.04 ( https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu )
Seem to have narrowed this down to an issue with THCBlas which is the point where the NVCC build fails (at 4%). Will report my progress.
Terence
I just installed Cuda 9.0 + patch 1 on Ubuntu 14.04, and it runs fine with TF 1.5.0 (whose precompiled binary is built for Cuda 9.0).
I’ll try to install Torch later.
Just to report that Cuda 9 is not really incompatible with Ubuntu 14, but there could be a library issue later on when building.
Everything goes fine with the installation according to the procedure outlined by @jprama until we get to the CUDA section. I am puzzled that Cuda 7.5 is reported, because I have installed Cuda-9.1 and it has passed the tests in the Samples. The first “failure” is at lib/THC/CMakeFiles/THC.dir/build.make:70: recipe for target ‘lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o’ failed
and I honestly do not know what to look for here. Any suggestions would be welcome. The relevant part of my install log is given below:
Found CUDA on your machine. Installing CUDA packages
Building on 4 cores
-- Found Torch7 in /home/miguel/torch/install
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- Found gcc >=5 and CUDA <= 7.5, adding workaround C++ flags
-- MAGMA not found. Compiling without MAGMA support
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.0;3.5;5.0;5.2;5.2+PTX
-- got cuda version 7.5
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/miguel/torch/extra/cutorch/build
[ 2%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o
[ 2%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o
[ 3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o
[ 4%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCSleep.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:70: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:77: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCSleep.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:63: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:560: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o' failed
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
Makefile:127: recipe for target 'all' failed
jopts=$(getconf _NPROCESSORS_CONF)
echo "Building on $jopts cores"
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DLUA_INCDIR=/home/miguel/torch/install/include -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/miguel/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/miguel/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$jopts install
You have a conflict with a Cuda 7.5 installation on your system.
Can you check whether /usr/local/cuda is a symlink (ls -l /usr/local/cuda) and whether any Cuda 7.5 packages are still installed on your system (dpkg -l | grep nvidia)?
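Both checks, plus a look at which nvcc the build will actually pick up, can be run in one go with something like the sketch below. The helper name nvcc_release is made up for this example. The idea behind it is an assumption rather than something confirmed in this thread: a distribution-packaged nvcc may sit earlier on the PATH than /usr/local/cuda-9.1/bin, since the procedure above appends the CUDA directory to PATH instead of prepending it.

```shell
#!/bin/sh
# Extract the release number (e.g. "9.1") from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Where does the default CUDA symlink point?
ls -l /usr/local/cuda 2>/dev/null || echo "/usr/local/cuda does not exist"

# Which nvcc comes first on the PATH, and which release is it?
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at: $(command -v nvcc)"
    echo "nvcc release:  $(nvcc --version | nvcc_release)"
else
    echo "no nvcc on PATH"
fi
```

If this reports release 7.5 while /usr/local/cuda points at cuda-9.1, an older nvcc is shadowing the 9.1 one, which would match the "got cuda version 7.5" line in the log.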
The symlink /usr/local/cuda points to /usr/local/cuda-9.1, which is where I successfully ran the Cuda tests. However, I DO still seem to have Cuda 7.5 packages on the system, as shown below. They must have been installed when I installed the driver after fitting the GTX 1080Ti but before running your installation procedure. Do you recommend getting rid of them selectively or doing an apt-get remove --purge nvidia*?
Thanks,
Output of dpkg -l | grep nvidia:
ii nvidia-387 387.34-0ubuntu0~gpu16.04.2 amd64 NVIDIA binary driver - version 387.34
ii nvidia-cuda-dev 7.5.18-0ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-doc 7.5.18-0ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-cuda-gdb 7.5.18-0ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 7.5.18-0ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-opencl-dev:amd64 7.5.18-0ubuntu1 amd64 NVIDIA OpenCL development files
ii nvidia-opencl-icd-387 387.34-0ubuntu0~gpu16.04.2 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-profiler 7.5.18-0ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 390.12-0ubuntu0~gpu16.04.1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-visual-profiler 7.5.18-0ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
hello,
You just need to remove the following packages:
nvidia-cuda-dev
nvidia-cuda-doc
nvidia-cuda-gdb
nvidia-cuda-toolkit
nvidia-opencl-dev:amd64
nvidia-profiler
nvidia-visual-profiler
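The same list can be derived from the dpkg output instead of being typed by hand. The sketch below assumes that every package to remove has a version starting with 7.5, as in the listing above; cuda75_packages is a hypothetical helper name. It prints the removal command rather than running it, so the list can be reviewed first.

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print installed packages whose
# version starts with 7.5 (assumed to be the CUDA 7.5 toolkit packages).
cuda75_packages() {
    awk '$1 == "ii" && $3 ~ /^7\.5\./ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    pkgs=$(dpkg -l | grep nvidia | cuda75_packages | tr '\n' ' ')
    if [ -n "$pkgs" ]; then
        # Print the command instead of running it, so it can be checked first.
        echo "apt-get remove --purge $pkgs"
    else
        echo "no CUDA 7.5 packages found"
    fi
fi
```

The driver packages (nvidia-387, nvidia-settings and so on) carry other version numbers and are left alone by this filter.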
Thanks - will report progress in a couple of days - have to turn my mind to something different today.
For information, I was able to install Torch7 and Cuda 9 on Ubuntu 14.04 LTS using the Ubuntu 16 installers ( https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux and https://developer.nvidia.com/compute/cuda/9.1/Prod/patches/1/cuda_9.1.85.1_linux ).
I was able to launch a training and a translation with OpenNMT.
Hi, I am following your procedure to the letter and still failing to complete the build at the “Cuda” stage. The relevant part of the log is pasted below. Any suggestions would be welcome. Thanks.
cd build && make install
Updating manifest for /home/miguel/torch/install/lib/luarocks/rocks
optim 1.0.5-0 is now built and installed in /home/miguel/torch/install/ (license: BSD)
Found CUDA on your machine. Installing CUDA packages
Building on 4 cores
-- Found Torch7 in /home/miguel/torch/install
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 6.1
-- got cuda version 9.1
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_61,code=sm_61;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/miguel/torch/extra/cutorch/build
[ 1%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o
[ 3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathMagma.cu.o
[ 3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathBlas.cu.o
[ 4%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:3793: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:2901: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o' failed
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
Makefile:127: recipe for target 'all' failed
jopts=$(getconf _NPROCESSORS_CONF)
echo "Building on $jopts cores"
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DLUA_INCDIR=/home/miguel/torch/install/include -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/miguel/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/miguel/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$jopts install
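This log now reports Cuda 9.1 and architecture 6.1, so the 7.5 conflict no longer shows up in it. The "recipe for target ... failed" lines only name the objects that failed, though, not the reason. A sketch for pulling the relevant lines out of a saved log follows; the file name install.log and the helper cuda_summary are made up for this example, and it assumes the underlying compiler messages contain the word "error".

```shell
#!/bin/sh
# Read a Torch install log on stdin and print only the CUDA detection lines.
cuda_summary() {
    grep -E 'got cuda version|Autodetected CUDA architecture|CUDA_NVCC_FLAGS' || true
}

# Hypothetical log file; capture one with: bash install.sh 2>&1 | tee install.log
if [ -f install.log ]; then
    cuda_summary < install.log
    # Show the first few lines mentioning an error, with their line numbers.
    grep -n -i -m 5 'error' install.log || true
fi
```

Capturing stderr together with stdout (the 2>&1 in the comment) matters here, since the compiler messages may not be in a log that only recorded standard output.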
I have followed the prescribed procedure precisely and entered export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" immediately before running install.sh. Each time the installation fails at the point shown below. As I have had Cuda 8 running with a GTX 1070 on another machine for a year without problems, I am inclined to remove Cuda 9 and go back to Cuda 8 unless anyone has further suggestions.