I would be very interested in following along with the “800 words / second on the CPU” paper (https://www.aclweb.org/anthology/W18-2715/). I found the CTranslate source, which I understand the paper uses for executing the distilled models, but I haven’t been able to find the distillation bits themselves. Are they publicly available?
(The context is, of course, https://github.com/lernapparat/lotranslate/, for which somewhat speedy CPU inference would be awesome to have.)
Best regards & thank you