We’d like to contribute our code for a sentence-level MoE (mixture-of-experts) structure to CTranslate2.
Compared with the original Transformer, the structure adds a gating network and several experts to the decoder layers: the gating network routes each sentence to the most suitable expert, and that expert fine-tunes the decoder features (e.g. from general-domain representations to domain-specialized ones).
The experts can also be seen as the adapters mentioned in this issue.
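Concretely, the idea looks like the following PyTorch sketch (module, dimension, and method names are illustrative, not our actual implementation):

```python
import torch
import torch.nn as nn

class SentenceLevelMoE(nn.Module):
    """Illustrative sketch of the structure described above: a gating
    network picks one expert per sentence, and that expert refines the
    decoder features with a residual connection."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.ReLU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model). Sentence-level routing: pool over
        # time and pick the highest-scoring expert for each sentence.
        best = self.gate(x.mean(dim=1)).argmax(dim=-1)  # (batch,)
        refined = torch.stack(
            [self.experts[e](xi) for e, xi in zip(best.tolist(), x)]
        )
        return x + refined  # residual: experts fine-tune the features
```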
After testing our code following the process in CONTRIBUTING.md, we have some questions:
- Is there a required coding style (e.g. naming conventions for variables)?
- Should the new code be kept separate from the original code as much as possible?
- Since our model is trained with Fairseq, do we also need to contribute the training code to Fairseq?
Sorry, I saw your post but forgot to come back to it after the weekend.
So you trained a custom Fairseq model, updated the CTranslate2 code to support it, and now you want to contribute the changes to the official repository.
In general I don’t accept this type of contribution because I can’t spend time maintaining code that can only be used by a single organization or individual. There could be exceptions for small changes, but here the code change seems quite large.
Contributing the code to Fairseq would indeed be a first step towards integrating the changes in CTranslate2.
Alternatively, I recently worked on making the core library more extensible. You can now define your own model specification and then register the related C++ model instance at runtime:
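For example, on the Python side a custom specification could look like the sketch below, assuming the structure described earlier in this thread. `LayerSpec`, `LinearSpec`, and `FeedForwardSpec` are existing building blocks in `ctranslate2.specs`; the MoE class and attribute names are hypothetical:

```python
from ctranslate2.specs import common_spec, model_spec, transformer_spec

class MoEFeedForwardSpec(model_spec.LayerSpec):
    """Hypothetical spec for a sentence-level MoE feed-forward block."""

    def __init__(self, num_experts: int):
        # Gating network that scores each expert.
        self.gate = common_spec.LinearSpec()
        # One feed-forward expert per domain.
        self.experts = [
            transformer_spec.FeedForwardSpec() for _ in range(num_experts)
        ]
```

A matching C++ model implementation can then be registered under the spec name at runtime so the library knows how to execute models converted with this specification.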