Improve performance of ctranslate2 in WSL?

Hi everyone,

Currently, I am using the decoding feature (alternatives at position) of a CTranslate2 model. It works perfectly, but the response time of CTranslate2 is very slow: when I run the script, it responds (gives the output alternatives) after 26 seconds, and sometimes after 35-40 seconds.
Is this because of poor CPU performance, or something else? I would be thankful if somebody could help me resolve this problem.

Thank you.

Dear Atika,

I run CTranslate2 from Python, and here is the code that I use. It takes less than half a second. I do have a lot of RAM on a server machine, though. Still, you can run it on PythonAnywhere, for example, and see what you get.

Obviously, make sure you change the variables, and if it is for real work, update the tokenization and detokenization functions as needed.

import ctranslate2

def detokenize(result):
    # Join the output tokens back into a sentence.
    return " ".join(result)


def tokenize(input_sentence):
    # Split the input sentence on whitespace.
    return input_sentence.split(" ")


# Change these variables
model_path = "fren_ctranslate2/"
my_sent = "ce qui a creusé les inégalités préexistantes"
prefix = "this has deepened"


translator = ctranslate2.Translator(model_path, "cpu") # "cpu" or "cuda"

original_result = translator.translate_batch([tokenize(my_sent)], beam_size=5)
translation = detokenize(original_result[0][0]["tokens"])
print(translation)

results = translator.translate_batch(
    [tokenize(my_sent)],
    target_prefix=[tokenize(prefix)],
    num_hypotheses=10,
    return_alternatives=True,
    beam_size=5)
for hypothesis in results[0]:
    print(detokenize(hypothesis["tokens"]))

Kind regards,
Yasmin


Hi @ymoslem

Thanks for reply,

I also wrote almost the same code, except for ‘cpu’. But this solution is not working even after adding ‘cpu’; I am still getting the response after 25-40 seconds.

Do you have another solution? Or am I missing something?

Thank you

Hi,

Can you post the code that you used? Can you also describe the system it is running on?


Hi @guillaumekln,

Thanks for your reply.

Please have a look at the code and system specifications below.

Code:

import ctranslate2
import sentencepiece as spm

 
input_sentence = "This project is geared towards efficient serving of standard translation models but is also a place for experimentation around model compression and inference acceleration."

def tokenize(data):
    # Encode text into SentencePiece subword tokens.
    return sp.encode(data, out_type=str)

def detokenize(data):
    # Decode SentencePiece tokens back into text.
    return sp.decode(data)

sp = spm.SentencePieceProcessor(model_file="wmtende.model")
translator = ctranslate2.Translator("ende_ctranslate2/", "cpu")

results = translator.translate_batch(
    [tokenize(input_sentence)],
    target_prefix=[tokenize("Dieses Projekt ist auf die")],
    num_hypotheses=5,
    return_alternatives=True)

for hypothesis in results[0]:
    print(detokenize(hypothesis["tokens"]))

System specifications:

Currently, I have installed virtual Ubuntu on my Windows 10 machine, and CTranslate2 runs from Ubuntu. I am sharing the specifications of both operating systems.

1. Windows:

Edition: Windows 10 Pro
Processor: Intel® Core™ i7-5600U CPU @ 2.60GHz, 2601 MHz, 2 Core(s), 4 Logical Processor(s)
Installed RAM: 12.0 GB (11.9 GB usable)
System Type: 64-bit OS

2. Ubuntu app running on Windows 10 System info:
System: Host: ORB173990 Kernel: 4.4.0-18362-Microsoft x86_64 bits: 64 Console: tty 3 Distro: Ubuntu 20.04.1 LTS (Focal Fossa)
Machine: Message: No machine data: try newer kernel. Is dmidecode installed? Try -M dmidecode.
CPU: Topology: Dual Core model: Intel Core i7-5600U bits: 64 type: MT MCP L2 cache: 256 KiB
Speed: 2601 MHz min/max: N/A Core speeds (MHz): 1: 2601 2: 2601 3: 2601 4: 2601
Graphics: Message: Device data requires root.
Display: server: X.org 1.20.8 driver: tty: 120x30

Network: Message: Device data requires root.
IF-ID-1: eth0 state: N/A speed: N/A duplex: N/A mac: 88:78:73:48:20:b9
IF-ID-2: eth1 state: N/A speed: N/A duplex: N/A mac: 38:af:d7:a0:c7:a3
IF-ID-3: eth2 state: N/A speed: N/A duplex: N/A mac: 00:ff:03:b7:7c:4d
IF-ID-4: eth3 state: N/A speed: N/A duplex: N/A mac: 00:15:5d:54:02:d0
IF-ID-5: eth4 state: N/A speed: N/A duplex: N/A mac: b4:bb:9c:56:57:5c
IF-ID-6: wifi0 state: N/A speed: N/A duplex: N/A mac: 88:78:73:48:20:b5
IF-ID-7: wifi1 state: N/A speed: N/A duplex: N/A mac: 88:78:73:48:20:b6
IF-ID-8: wifi2 state: N/A speed: N/A duplex: N/A mac: 8a:78:73:48:20:b5
Drives: Local Storage: total: N/A used: 218.19 GiB
Info: Processes: 9 Uptime: 12d 55m Memory: 11.88 GiB used: 9.74 GiB (82.0%) Init: N/A Shell:

Are you using WSL to run Ubuntu? If yes, is it WSL version 1 or 2?

Hi,

Yes, I am using WSL (version 1). I followed the link below to install virtual Ubuntu.

Do you know what is taking the most time in your script?

I would assume it is when creating the translator and loading the model. WSL (especially version 1) has very poor I/O performance, especially when reading files from the Windows filesystem.

Can you check for that?
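One simple way to check (a sketch; the `timed` helper below is not part of CTranslate2, and the model path in the usage comments is hypothetical) is to time the model loading and the translation separately:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    # Print how long the enclosed block takes.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f} s")


# Usage (hypothetical model path and tokens -- adjust to your setup):
# import ctranslate2
# with timed("model loading"):
#     translator = ctranslate2.Translator("ende_ctranslate2/", "cpu")
# with timed("translation"):
#     results = translator.translate_batch([["▁Hello", "▁world"]])
```

If the "model loading" line dominates, the slowdown is most likely WSL file I/O rather than CTranslate2 itself.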

Hi @guillaumekln,

thanks for your reply.

I don't think any part of my script is taking too much time.
If I run CTranslate2 on a native Linux OS rather than virtual Ubuntu, is there any possibility of getting a quicker response?

Thanks

Sure, as @ymoslem mentioned, this script should run very fast under normal circumstances. I'm not sure what is going on in WSL.
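One thing that may be worth trying, assuming the model directory currently lives under the Windows filesystem (the source path below is hypothetical): copy it into the Linux filesystem, where WSL file I/O is much faster, and time the script again.

```shell
# Hypothetical paths -- adjust to your setup.
SRC="/mnt/c/Users/yourname/ende_ctranslate2"  # model under the Windows filesystem
DST="$HOME/ende_ctranslate2"                  # destination inside the Linux filesystem

if [ -d "$SRC" ]; then
    cp -r "$SRC" "$DST"
fi

# Point the script at "$DST" instead, then compare timings:
# time python3 translate.py
```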

Hi @guillaumekln.

Thanks for your support and valuable feedback.

I will try it on a native Linux OS and see what happens.

Thanks

Hi @atikaakmal, having spotted this thread, I thought I'd mention that I am running CTranslate2 on WSL2 (with WSLg) so I can deploy a GUI. Even on my Asus Zenbook (8 GB RAM), speed is not an issue (real 0m1.652s, user 0m1.122s, sys 0m1.336s for the translation of one sentence), but I am finding the translation output a lot poorer than when running the same underlying model (Transformer) using the commonly used Python script. Edit: After installing the latest OpenNMT-tf and exporting with "--export_format ctranslate2", I am seeing a significant improvement in translation output.