How to reduce inference time of ctranslate2 model

Shivamp9 · April 3, 2023, 9:56am

when I was using a simple.pt file of the model then my model inference time was approx 10 sec but after converting the model in ctransalte2 model inference time is approx 20 sec how did I decrease this time? I am using ubuntu cpu server.
my model config file is

guillaumekln · April 3, 2023, 4:24pm

What are you measuring? 10 or 20 seconds seem a long time for a single request.

Also, how many CPU cores has this Ubuntu server?

Shivamp9 · April 4, 2023, 3:43am

Cpu information with 15 gb ram
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 7
BogoMIPS: 5187.81