Does CTranslate2 support DeltaLM models?
DeltaLM has an encoder-decoder architecture, but its decoder layers have an FFN between the self-attention and cross-attention sublayers.
Is there a plan to add DeltaLM support to CTranslate2?
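To make the structural difference concrete, here is a minimal sketch of the sublayer ordering described above. It assumes the DeltaLM decoder layer keeps the usual final FFN after cross-attention, so the order becomes self-attention, FFN, cross-attention, FFN. The sublayers are hypothetical toy stand-ins in pure Python (not DeltaLM's or CTranslate2's code); only the ordering and the residual wiring are meant to be illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

# Sublayer order of a standard Transformer decoder layer.
STANDARD_ORDER = ("self_attention", "cross_attention", "ffn")
# Assumed DeltaLM order: an extra FFN between self-attention and
# cross-attention, keeping the usual final FFN.
DELTALM_ORDER = ("self_attention", "ffn", "cross_attention", "ffn")


def decoder_layer(x: Vector, memory: Vector, order: Sequence[str]) -> Tuple[Vector, List[str]]:
    """Apply toy sublayers in `order`, each wrapped in a residual connection.

    Layer normalization and learned attention weights are omitted for brevity.
    Returns the layer output and the sequence of sublayers that ran.
    """

    def self_attention(h: Vector) -> Vector:
        # Toy stand-in: each position attends uniformly to all decoder positions.
        mean = sum(h) / len(h)
        return [mean] * len(h)

    def cross_attention(h: Vector) -> Vector:
        # Toy stand-in: each position attends uniformly to the encoder output.
        mean = sum(memory) / len(memory)
        return [mean] * len(h)

    def ffn(h: Vector) -> Vector:
        # Toy position-wise feed-forward: ReLU(2h - 1) per element.
        # Both FFN slots reuse this one function; a real model has separate weights.
        return [max(0.0, 2.0 * v - 1.0) for v in h]

    sublayers: Dict[str, Callable[[Vector], Vector]] = {
        "self_attention": self_attention,
        "cross_attention": cross_attention,
        "ffn": ffn,
    }
    trace: List[str] = []
    for name in order:
        trace.append(name)
        x = [a + b for a, b in zip(x, sublayers[name](x))]  # residual connection
    return x, trace


if __name__ == "__main__":
    x, memory = [1.0, 0.0], [0.5, 0.5]
    print(decoder_layer(x, memory, STANDARD_ORDER))
    print(decoder_layer(x, memory, DELTALM_ORDER))
```

Because the extra FFN changes both the number of sublayers and their order, a converter that expects the standard three-sublayer decoder layer could not simply map DeltaLM weights onto it; the layer definition itself would need to change.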
This model architecture has existed since 2021, and this is the first time I've heard of it. Is it widely used?