Due to hardware limits, we can no longer use CTranslate2 int8 models on our platform. Since the original checkpoints are gone, we also can't easily produce other conversions (we are using OpenNMT-tf, in case it matters). Is it possible to convert a CTranslate2 int8 model to another floating-point format? I'm aware there may be some precision loss, but the goal is just to have it converted. If this conversion is not supported yet, which part of the code should I look at to implement it?
Thank you for your help.
Have you tried letting CT2 convert the type on load?
The doc also mentions this:
Conversions between all types are supported. For example, you can convert a model with quantization="int8" and then execute in full precision with
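For reference, the load-time conversion looks roughly like this with the Python API. This is a minimal sketch: `ctranslate2.Translator` and its `device` and `compute_type` arguments are the library's real API, while the helper name `load_translator_float32` and any model directory you pass in are hypothetical.

```python
def load_translator_float32(model_dir: str, device: str = "cpu"):
    """Load an int8-converted CTranslate2 model and ask the runtime to
    convert its weights to float32 while loading (hypothetical helper)."""
    # Imported lazily so this module can be imported without CTranslate2 installed.
    import ctranslate2

    # compute_type overrides the type saved in the model; the weight
    # conversion happens once, when the model is loaded.
    return ctranslate2.Translator(model_dir, device=device, compute_type="float32")
```

Usage would be something like `translator = load_translator_float32("path/to/int8_model")` followed by `translator.translate_batch(...)` on tokenized input. The conversion cost is paid on every load, which is what an offline conversion would avoid.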
Thanks. Yes, we've tried that, and it worked.
However, we are more interested in an offline solution that does not require conversion on load. That would let us avoid runtime quantization/dequantization entirely, since our hardware cannot perform quantization/dequantization quickly.
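An offline conversion would come down to reading every variable from the converted model, dequantizing the int8 weights once, and writing the result back in the target floating-point type. Below is a minimal NumPy sketch of the middle step only, under stated assumptions: the variables are already loaded into a `{name: array}` dict (reading and writing `model.bin` is not shown), each int8 weight `<prefix>/weight` has a companion `<prefix>/weight_scale`, and quantization followed `q = round(w * scale)` with one scale per output row. The naming and scale conventions are assumptions to verify against the quantization code in the CTranslate2 converter before relying on this.

```python
import numpy as np


def dequantize_int8_variables(variables, dtype=np.float32):
    """Return a new {name: array} dict with int8 weights replaced by float ones.

    ASSUMED layout (verify against CTranslate2's model spec code):
    "<prefix>/weight" is int8, "<prefix>/weight_scale" holds per-row scales,
    and quantized = round(weight * scale), so weight ~= quantized / scale.
    """
    out = {}
    for name, value in variables.items():
        value = np.asarray(value)
        base = name[: -len("_scale")] if name.endswith("_scale") else None
        # Drop scale tensors whose int8 weight is folded back into floats below.
        if base is not None and base in variables and np.asarray(variables[base]).dtype == np.int8:
            continue
        scale = variables.get(name + "_scale")
        if value.dtype == np.int8 and scale is not None:
            scale = np.asarray(scale, dtype=np.float32)
            # Reshape so each scale broadcasts over its output row
            # (a single per-tensor scale also broadcasts correctly).
            scale = scale.reshape(-1, *([1] * (value.ndim - 1)))
            out[name] = (value.astype(np.float32) / scale).astype(dtype)
        else:
            out[name] = value  # non-quantized variables are kept unchanged
    return out
```

Dequantizing cannot recover the precision lost during int8 quantization; it only changes the storage type so the model can run without int8 kernels.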