Due to hardware limits we can no longer run CTranslate2 int8 models on our platform, and since the original checkpoints are gone, we can't easily redo the conversion from scratch (we are using OpenNMT-tf, if that matters). Is it possible to convert a CTranslate2 int8 model to a floating-point format? I'm aware there may be some precision loss; the goal is just to get it converted. If this conversion is not supported yet, which part of the code should I look at to make it happen?
Conversions between all types are supported. For example, you can convert a model with quantization="int8" and then execute it in full precision with compute_type="float32".
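For illustration, here is a minimal sketch of that load-time approach in Python. The model directory name and the toy input tokens are placeholders, not from your setup:

```python
import ctranslate2

# Load an int8-quantized CTranslate2 model but run all computation in
# full precision; the weights are converted when the model is loaded.
# "ende_ct2_int8" is a hypothetical directory produced by the converter
# with quantization="int8".
translator = ctranslate2.Translator(
    "ende_ct2_int8",
    device="cpu",
    compute_type="float32",
)

# Inputs are expected pre-tokenized; this toy batch is illustration only.
result = translator.translate_batch([["▁Hello", "▁world"]])
print(result[0].hypotheses[0])
```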
Thanks! However, we are more interested in an offline solution that does not require the conversion at load time, which would let us avoid runtime quantization/dequantization entirely; our hardware cannot quantize/dequantize quickly.
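For reference, int8 dequantization itself is just a per-row rescale, so an offline tool would essentially apply the stored scales to the stored int8 weights and re-save the result as float32. Below is a minimal NumPy sketch of that arithmetic only. It assumes the common convention of one scale per output row computed as 127 / max(abs(row)); the actual scale convention and weight layout in the CTranslate2 model file are assumptions here, so verify them against the model loading code before building on this:

```python
import numpy as np

def dequantize_int8(weights_i8: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float32 weights from int8 weights and per-row scales.

    Assumes the quantizer computed scale[r] = 127 / max(abs(row r)) and
    stored round(w * scale), so dequantization divides by the scale.
    This convention is an assumption; check the actual loader to confirm.
    """
    return weights_i8.astype(np.float32) / scales[:, np.newaxis]

# Toy example: quantize one row, then round-trip it back to float32.
w = np.array([[0.5, -1.0, 0.25]], dtype=np.float32)
scales = 127.0 / np.max(np.abs(w), axis=1)
w_i8 = np.round(w * scales[:, np.newaxis]).astype(np.int8)
print(dequantize_int8(w_i8, scales))  # ~[[0.5, -1.0, 0.25]]
```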