Distillation Code


I would be very interested in following along with the “800 words / second on the CPU” paper (https://www.aclweb.org/anthology/W18-2715/). I found the CTranslate source, which I understand the paper uses to run the distilled models, but I haven’t been able to find the distillation code itself. Is it publicly available?

(The context is, of course, https://github.com/lernapparat/lotranslate/ for which a somewhat speedy CPU inference would be awesome to have).

Best regards & thank you



Distillation in this context is just about training a smaller model on the output of a bigger model. It does not require any specific code.
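
To make that concrete, here is a minimal sketch of the data side in Python: translate the source half of the training data with the big model, then use the resulting (source, translation) pairs as the training corpus for the small model. The names `build_distilled_corpus`, `teacher_translate` and `toy_teacher` are hypothetical and only for illustration; in practice `teacher_translate` would wrap whatever you use to run the big model, and the pairs would be written out as the usual parallel source/target files.

```python
from typing import Callable, Iterable, List, Tuple


def build_distilled_corpus(
    sources: Iterable[str],
    teacher_translate: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the teacher model's translation.

    `teacher_translate` is a hypothetical callable: it takes a batch of
    source sentences and returns one translation per sentence, in order.
    """
    pairs: List[Tuple[str, str]] = []
    batch: List[str] = []
    for line in sources:
        line = line.strip()
        if not line:
            continue  # skip empty lines so both sides stay aligned
        batch.append(line)
        if len(batch) == batch_size:
            pairs.extend(zip(batch, teacher_translate(batch)))
            batch = []
    if batch:
        # translate the last, possibly smaller, batch
        pairs.extend(zip(batch, teacher_translate(batch)))
    return pairs


def toy_teacher(batch: List[str]) -> List[str]:
    """Stand-in for the big model (an assumption): just upper-cases the input."""
    return [sentence.upper() for sentence in batch]


if __name__ == "__main__":
    corpus = build_distilled_corpus(["a b", "", "c d", "e"], toy_teacher, batch_size=2)
    for src, tgt in corpus:
        print(src, "|||", tgt)
```

The small model is then trained on these pairs with the normal training command, in place of (or in addition to) the original references.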

Also, as mentioned here:

We plan to release a new CTranslate version that will be compatible with both OpenNMT-py and OpenNMT-tf (and faster than both).

Thank you for the update!
So I’ll try training the small model with global_attention set to mlp and the hyperparameters of distill-small.

Best regards