Thanks for that. I have temporarily gone back to Cuda 8 but am experiencing the same nvcc fatal error: Unsupported gpu architecture ‘compute_61’. As I have had no issues installing Cuda 8 on a machine with a GTX 1070, I must put the failure down to an issue with the new GTX 1080Ti.
I’m assuming that after your installation you are successfully running OpenNMT?
I will run your procedure in the morning.
Thanks again.
Terence
Hi Terence, yes - OpenNMT runs smoothly with the procedure described above - just missing half precision, but we had no success with half precision so far anyway.
Best!
Did you try it on Ubuntu 14.04 too?
Cuda 9 is not supported on ubuntu 14. It is only available on ubuntu 16.04 and ubuntu 17.04 ( https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu )
Seem to have narrowed this down to an issue with THCBlas which is the point where the NVCC build fails (at 4%). Will report my progress.
Terence
I just installed cuda 9.0 + patch 1 on ubuntu 14.04 and it runs fine with TF 1.5.0 (whose precompiled binary is built for cuda 9.0).
I’ll try to install torch later.
Just to report that the problem is not really cuda 9 being incompatible with ubuntu 14; it could be a library issue later on when building.
Everything goes fine with the installation according to the procedure outlined by @jprama until we get to the CUDA section. I am puzzled that Cuda 7.5 is reported, because I have installed Cuda-9.1 and it has passed the tests in the Samples. The first “failure” is at lib/THC/CMakeFiles/THC.dir/build.make:70: recipe for target ‘lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o’ failed
and I honestly do not know what to look for here. Any suggestions would be welcome. The relevant part of my install log is given below:
Found CUDA on your machine. Installing CUDA packages
Building on 4 cores
-- Found Torch7 in /home/miguel/torch/install
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- Found gcc >=5 and CUDA <= 7.5, adding workaround C++ flags
-- MAGMA not found. Compiling without MAGMA support
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.0;3.5;5.0;5.2;5.2+PTX
-- got cuda version 7.5
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/miguel/torch/extra/cutorch/build
[ 2%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o
[ 2%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o
[ 3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o
[ 4%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCSleep.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:70: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:77: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCSleep.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:63: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:560: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCHalf.cu.o' failed
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
Makefile:127: recipe for target 'all' failed
jopts=$(getconf _NPROCESSORS_CONF)
echo "Building on $jopts cores"
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DLUA_INCDIR=/home/miguel/torch/install/include -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/miguel/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/miguel/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$jopts install
You have a conflict with a cuda 7.5 installation on your system.
Can you check whether you have a sym link for /usr/local/cuda (ls -l /usr/local/cuda) and whether any cuda 7.5 packages are still installed on your system (dpkg -l | grep nvidia)?
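A small sketch of a related check, in case it helps: it lists every nvcc on PATH in lookup order, so a stray 7.5 compiler shows up next to the one under /usr/local/cuda. The function name find_nvcc is made up for this example. That the Ubuntu nvidia-cuda-toolkit package puts its own nvcc in a system directory such as /usr/bin is an assumption to verify on the machine. It would be consistent with the "got cuda version 7.5" line in the log, and a 7.5 nvcc does not know the compute_61 architecture.

```shell
# Hypothetical helper: print every executable named nvcc found on PATH,
# in lookup order. The first hit is the one a plain `nvcc` call resolves to.
find_nvcc() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -x "$dir/nvcc" ] && echo "$dir/nvcc"
    done
    IFS=$old_ifs
}

find_nvcc

# Where the /usr/local/cuda sym link points, if it exists at all.
ls -l /usr/local/cuda 2>/dev/null || echo "/usr/local/cuda does not exist"

# Version reported by the nvcc that wins the PATH lookup.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```

If more than one path is printed and the first one is not under /usr/local/cuda, the build may be picking up the older toolkit.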
The sym link /usr/local/cuda points to /usr/local/cuda-9.1, which is where I successfully ran the Cuda tests. However, I DO seem to still have cuda 7.5 packages on the system, as shown below. They must have been installed when I installed the driver after fitting the GTX 1080Ti but before running your installation procedure. Do you recommend getting rid of them selectively or doing an apt-get remove --purge nvidia*?
Thanks,
Output of dpkg -l | grep nvidia:
ii nvidia-387 387.34-0ubuntu0~gpu16.04.2 amd64 NVIDIA binary driver - version 387.34
ii nvidia-cuda-dev 7.5.18-0ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-doc 7.5.18-0ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-cuda-gdb 7.5.18-0ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 7.5.18-0ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-opencl-dev:amd64 7.5.18-0ubuntu1 amd64 NVIDIA OpenCL development files
ii nvidia-opencl-icd-387 387.34-0ubuntu0~gpu16.04.2 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA's Prime
ii nvidia-profiler 7.5.18-0ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 390.12-0ubuntu0~gpu16.04.1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-visual-profiler 7.5.18-0ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
hello,
You just need to remove the following packages:
nvidia-cuda-dev
nvidia-cuda-doc
nvidia-cuda-gdb
nvidia-cuda-toolkit
nvidia-opencl-dev:amd64
nvidia-profiler
nvidia-visual-profiler
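As a sketch, the list above can be turned into a single removal command. The snippet only prints the command so it can be reviewed before being run by hand. Using apt-get remove --purge (rather than a plain remove) is an assumption carried over from the question, not something the reply specifies. The driver packages (nvidia-387, nvidia-settings and so on) are deliberately left out.

```shell
# The seven cuda 7.5 packages named in the list above.
pkgs="nvidia-cuda-dev nvidia-cuda-doc nvidia-cuda-gdb nvidia-cuda-toolkit nvidia-opencl-dev:amd64 nvidia-profiler nvidia-visual-profiler"

# Print the command for review instead of executing it.
echo "sudo apt-get remove --purge $pkgs"
```

Running dpkg -l | grep nvidia again afterwards should show only the driver-related entries.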
Thanks - will report progress in a couple of days - have to turn my mind to something different today.
For information, I was able to install torch 7 and cuda 9 (using the ubuntu 16 packages: https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux and https://developer.nvidia.com/compute/cuda/9.1/Prod/patches/1/cuda_9.1.85.1_linux ) on ubuntu 14.04 LTS.
I could launch a training and a translation with OpenNMT.
Hi, I am following your procedure to the letter and still failing to complete the build at the “Cuda” stage. The relevant part of the log is pasted below. Any suggestions would be welcome. Thanks.
cd build && make install
Updating manifest for /home/miguel/torch/install/lib/luarocks/rocks
optim 1.0.5-0 is now built and installed in /home/miguel/torch/install/ (license: BSD)
Found CUDA on your machine. Installing CUDA packages
Building on 4 cores
-- Found Torch7 in /home/miguel/torch/install
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 6.1
-- got cuda version 9.1
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_61,code=sm_61;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/miguel/torch/extra/cutorch/build
[ 1%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o
[ 3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathMagma.cu.o
[ 3%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathBlas.cu.o
[ 4%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:3793: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o' failed
lib/THC/CMakeFiles/THC.dir/build.make:2901: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o' failed
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
Makefile:127: recipe for target 'all' failed
jopts=$(getconf _NPROCESSORS_CONF)
echo "Building on $jopts cores"
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DLUA_INCDIR=/home/miguel/torch/install/include -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/miguel/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/miguel/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$jopts install
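For readers comparing this log with the earlier one: the "Autodetected CUDA architecture(s): 6.1" line and the compute_61/sm_61 pair in CUDA_NVCC_FLAGS are the same number with the dot removed. A minimal sketch of that mapping follows; the function name gencode_flag is made up for illustration, and the log prints the flag with a semicolon because it is a CMake list, whereas the sketch prints the space-separated form used on an nvcc command line.

```shell
# Hypothetical helper: turn a compute capability such as "6.1"
# into the matching nvcc -gencode flag.
gencode_flag() {
    cc=$(printf %s "$1" | tr -d .)   # "6.1" -> "61"
    printf '%s\n' "-gencode arch=compute_${cc},code=sm_${cc}"
}

gencode_flag 6.1
```

So the architecture is now detected correctly in this run, and the remaining failure is a different problem from the earlier "Unsupported gpu architecture" error.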
I have followed the prescribed procedure precisely and entered
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" immediately before running install.sh. Each time the installation fails at the point shown below. As I have had cuda 8 running with a GTX 1070 on another machine for a year without problems, I am inclined to remove cuda 9 and go back to cuda 8 unless someone has a suggestion that helps.
Hi @tel34,
The error shows that half operators are still in play. Try to clean first and then rerun it:
./clean.sh
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
./install.sh
Hi @panosk,
Yes, I’ve done that. The NVCC building now got to 11% and then failed as shown below. I’ve researched this but have not put my finger on what’s going wrong. It seems not to be retaining the export command. I’ve even tried putting it in bashrc.
Hm, weird… Are you using bash? Try to run this and see:
echo $0
Yes, and I get -bash. I’ve tried it again since I sent the last message and it gets to 11% in the NVCC build and fails.
Well, the last things I can think of are to run source ~/.bashrc if you haven’t logged out/rebooted after you changed it, or to put the export in front of install.sh on one line:
TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ./install.sh
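A minimal sketch of what the one-line form does, with sh -c standing in for ./install.sh: an assignment written in front of a command is placed in the environment of that one command, whatever the state of the calling shell. The thread does not establish why the separately exported variable was not picked up, so this only illustrates the mechanism.

```shell
# Start from a clean slate so the demonstration is unambiguous.
unset TORCH_NVCC_FLAGS

# The assignment prefix is visible to this one child process only.
# `sh -c ...` is a stand-in for ./install.sh.
child=$(TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" sh -c 'printf %s "$TORCH_NVCC_FLAGS"')
echo "child saw:  $child"

# The calling shell itself is left untouched.
echo "parent has: ${TORCH_NVCC_FLAGS:-<unset>}"
```

The same kind of check (printing the variable from a child process) can be used after a plain export to confirm the flag really reaches the build.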
Well, @panosk, I owe you a beer :-).
Putting the export in front of install.sh did the trick.
Hopefully tomorrow I can start some new trainings with my GTX 1080Ti.
And thanks to everyone else who offered suggestions!