Runtime error when translating using CTranslate2

owenljn · November 26, 2020, 7:24pm

Hello,

I recently switched to CTranslate2 and converted my OpenNMT-tf TransformerBig model with it, but I get a runtime error related to cuBLAS:

This error happens whether I translate a batch of sentences or just a single sentence. The model itself loads without any problem.
Below is my environment:
tensorflow-gpu version: 2.3.1
CUDA version: 10.1
Nvidia Driver version: 430.64
NVCC version: 7.5.17
CTranslate2 version: 1.16.1
GPU model: RTX2080Ti with 11GB VRAM
OS version: Linux 16.04

I have 7 GPUs available and no tasks are running on them. I tried every one of them and they all produce the same error. Running the old OpenNMT-tf version on them works without any problem.

Does anyone else experience the same problem?

The code I used to run it is as follows:

guillaumekln · November 26, 2020, 8:12pm

Hi,

Thanks for reporting and providing additional information.

This was also reported in the issue below, but I have not been able to reproduce it yet:

In both reports, the system has multiple GPUs. Could you try setting device_index to 0 and running your script with CUDA_VISIBLE_DEVICES=0 python ...? Do you still get the same error?
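A minimal sketch of that check in Python, assuming the standard `ctranslate2.Translator` constructor with its `device` and `device_index` arguments. The function name is illustrative, and the model directory and tokens in the usage note are hypothetical placeholders, not taken from the report. The environment variable has to be set before any CUDA library initializes, so it is set before `ctranslate2` is imported:

```python
import os

# Expose only the first physical GPU to this process. This must happen
# before any CUDA library is initialized, so it precedes the ctranslate2 import.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def translate_on_first_gpu(model_dir, batch):
    """Load the converted model on GPU 0 and translate a batch of tokenized sentences."""
    import ctranslate2  # imported here so the variable above is already in effect

    translator = ctranslate2.Translator(model_dir, device="cuda", device_index=0)
    return translator.translate_batch(batch)
```

It could be called as `translate_on_first_gpu("ende_ctranslate2/", [["▁Hello", "▁world", "!"]])`. Setting the variable inside the script is equivalent to prefixing the command with `CUDA_VISIBLE_DEVICES=0`.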

owenljn · November 26, 2020, 8:24pm

Hi, thanks for the quick reply!

Unfortunately, I still get the same error. Also, unlike the other reported issue, I’m not running this through Docker, just simple Python code slightly modified from the Python instructions. It’s in the “Translation API” section of the following link:
[https://github.com/OpenNMT/CTranslate2/blob/master/docs/python.md#model-conversion-api]

owenljn · November 27, 2020, 1:33am

Right now I’m trying the Docker solution. I ran the provided sample code, but only got random text output like this:

My Docker version is 19.03.8
I followed every step in the following link:

https://github.com/OpenNMT/OpenNMT-tf/blob/master/examples/serving/tensorflow_serving/README.md

# Inference with TensorFlow Serving

This example shows how to start a TensorFlow Serving GPU instance and sends translation requests via a simple Python client.

## Requirements

* Docker 19.03 or above

## Usage

**1\. Go into this directory, as assumed by the rest of the commands:**

```bash
cd examples/serving/tensorflow_serving
```

**2\. Download the English-German pretrained model:**

```bash
wget https://s3.amazonaws.com/opennmt-models/averaged-ende-export500k-v2.tar.gz
```

This file has been truncated.

Nothing happened after step 3. I only got a string of numbers and characters as output, as shown in the screenshot.

guillaumekln · November 27, 2020, 10:31am

I finally managed to reproduce the issue by testing with an older GPU driver. The version of the cuBLAS library included in the Python package was incorrect. It should be fixed in the latest version 1.16.2. You can update the package and try again.

pip install --upgrade ctranslate2
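To confirm that the upgrade took effect, the installed version can be checked from Python. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names are illustrative, and the parser assumes plain numeric `X.Y.Z` release strings:

```python
from importlib import metadata

# Release reported in this thread to ship the corrected cuBLAS library.
FIXED_VERSION = (1, 16, 2)


def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_cublas_fix(package="ctranslate2"):
    """Return True if the installed package is at least FIXED_VERSION."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False  # the package is not installed in this environment
    return parse_version(installed) >= FIXED_VERSION


print("cuBLAS fix present:", has_cublas_fix())
```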

The Docker issue is unrelated to the current topic, so you may want to open a separate thread for it. (The random text is the ID of the Docker container that is running in the background.)

owenljn · November 27, 2020, 2:12pm

Thanks a lot!!! It’s now working like a charm!!