Trying to run converted Marian models using CPU only

I built ct2-translator with MKL and CUDA support, then successfully
converted a Marian model with ct2-marian-converter into a CTranslate2
model, once for each of the available quantization types.
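For reference, the build and conversion steps looked roughly like this (reconstructed from memory, so treat the exact flag set and file names as placeholders; `-DWITH_MKL=ON` and `-DWITH_CUDA=ON` are the backends I enabled):

```shell
# Build sketch -- these are real CTranslate2 CMake options, but the exact
# invocation here is reconstructed, not copied from my build logs.
cmake -DWITH_MKL=ON -DWITH_CUDA=ON ..
make -j"$(nproc)"

# Conversion sketch: one run per quantization type (int8 shown).
# Model and vocab file names are placeholders for my actual files.
ct2-marian-converter --model_path model.npz \
    --vocab_paths vocab.src.yml vocab.tgt.yml \
    --output_dir ct2_model_int8 \
    --quantization int8
```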

Each of these models runs fine on GPU. However, I get the following
errors when trying to run CPU-only:


int16
terminate called after throwing an instance of 'std::invalid_argument'
what(): Requested int16 compute type, but the target device or backend do not support efficient int16 computation.
Aborted (core dumped)

float16
terminate called after throwing an instance of 'std::invalid_argument'
what(): Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
Aborted (core dumped)

float32
terminate called after throwing an instance of 'std::runtime_error'
what(): No SGEMM backend on CPU
Aborted (core dumped)

int8
terminate called after throwing an instance of 'std::invalid_argument'
what(): Requested int8 compute type, but the target device or backend do not support efficient int8 computation.
Aborted (core dumped)

int8_float32
terminate called after throwing an instance of 'std::invalid_argument'
what(): Requested int8_float32 compute type, but the target device or backend do not support efficient int8_float32 computation.
Aborted (core dumped)

int8_float16
terminate called after throwing an instance of 'std::invalid_argument'
what(): Requested int8_float16 compute type, but the target device or backend do not support efficient int8_float16 computation.
Aborted (core dumped)

int8_bfloat16
terminate called after throwing an instance of 'std::invalid_argument'
what(): Requested int8_bfloat16 compute type, but the target device or backend do not support efficient int8_bfloat16 computation.
Aborted (core dumped)

bfloat16
terminate called after throwing an instance of 'std::invalid_argument'
what(): Requested bfloat16 compute type, but the target device or backend do not support efficient bfloat16 computation.
Aborted (core dumped)


Is it possible to run converted Marian models on CPU? If so, are there
specific parameters that must be set, either when building CTranslate2
or when converting the model?
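For completeness, this is roughly how I am invoking the converted model on CPU (flag names are from the translator client's --help as I remember them, so treat them as approximate; the model directory and input file are placeholders):

```shell
# Run sketch (CPU-only). "translator" is the CTranslate2 client built above.
./translator --model ct2_model_int8 \
    --device cpu \
    --compute_type int8 < input.tok.txt
```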

Thanks!