Due to hardware limits we can no longer run CTranslate2 int8 models on our platform, and since the original checkpoints are gone, we can't easily redo the conversion from scratch (we are using OpenNMT-tf, if that matters). Is it possible to convert a CTranslate2 int8 model to a floating-point format? I'm aware there may be some precision loss; the goal is just to get it converted. If this conversion is not supported yet, which part of the code should I look at to make it happen?
Conversions between all types are supported. For example, you can convert a model with quantization="int8" and then execute it in full precision with compute_type="float32".
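For illustration, here is a minimal sketch of that load-time approach in Python. The model directory name and the toy input tokens are placeholders, not from your setup:

```python
import ctranslate2

# Load an int8-quantized CTranslate2 model but run all computation in
# full precision; the weights are converted when the model is loaded.
# "ende_ct2_int8" is a hypothetical directory produced by the converter
# with quantization="int8".
translator = ctranslate2.Translator(
    "ende_ct2_int8",
    device="cpu",
    compute_type="float32",
)

# Inputs are expected pre-tokenized; this toy batch is illustration only.
result = translator.translate_batch([["▁Hello", "▁world"]])
print(result[0].hypotheses[0])
```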
Thanks! However, we are more interested in an offline solution that does not require the conversion at load time, which would let us avoid runtime quantization/dequantization entirely; our hardware cannot quantize/dequantize quickly.
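For reference, int8 dequantization itself is just a per-row rescale, so an offline tool would essentially apply the stored scales to the stored int8 weights and re-save the result as float32. Below is a minimal NumPy sketch of that arithmetic only. It assumes the common convention of one scale per output row computed as 127 / max(abs(row)); the actual scale convention and weight layout in the CTranslate2 model file are assumptions here, so verify them against the model loading code before building on this:

```python
import numpy as np

def dequantize_int8(weights_i8: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float32 weights from int8 weights and per-row scales.

    Assumes the quantizer computed scale[r] = 127 / max(abs(row r)) and
    stored round(w * scale), so dequantization divides by the scale.
    This convention is an assumption; check the actual loader to confirm.
    """
    return weights_i8.astype(np.float32) / scales[:, np.newaxis]

# Toy example: quantize one row, then round-trip it back to float32.
w = np.array([[0.5, -1.0, 0.25]], dtype=np.float32)
scales = 127.0 / np.max(np.abs(w), axis=1)
w_i8 = np.round(w * scales[:, np.newaxis]).astype(np.int8)
print(dequantize_int8(w_i8, scales))  # ~[[0.5, -1.0, 0.25]]
```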