Hello,
I am trying to do some transfer learning for low-resource language pairs. Let’s say we have lots of data for FR-EN and EN-DE but none for FR-DE, and we do not want to use a pivot language. Is there a way to initialize the decoder with pretrained weights (for example, from a transformer model trained on EN-DE) and also initialize the encoder with pretrained weights from another transformer model (trained on FR-EN)? Or is it better to just leave the weights randomly initialized?
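
To make the idea concrete, here is a rough sketch of what I have in mind. It assumes each checkpoint is a flat mapping from parameter names to weights, and that the names start with `encoder.` or `decoder.`. The helper `merge_checkpoints`, the prefixes, and the toy weights are all made up for illustration and do not come from any particular toolkit.

```python
def merge_checkpoints(fr_en_state, en_de_state,
                      encoder_prefix="encoder.", decoder_prefix="decoder."):
    """Build an FR-DE initialization from two pretrained checkpoints.

    Encoder parameters are taken from the FR-EN model and decoder
    parameters from the EN-DE model. The prefixes are assumptions
    about how the parameters are named.
    """
    merged = {}
    # Keep only the encoder side of the FR-EN model (it reads French).
    for name, weight in fr_en_state.items():
        if name.startswith(encoder_prefix):
            merged[name] = weight
    # Keep only the decoder side of the EN-DE model (it writes German).
    for name, weight in en_de_state.items():
        if name.startswith(decoder_prefix):
            merged[name] = weight
    return merged


# Toy stand-ins for real checkpoints (hypothetical names and values).
fr_en_state = {
    "encoder.layer0.weight": [0.1, 0.2],
    "decoder.layer0.weight": [0.3, 0.4],
}
en_de_state = {
    "encoder.layer0.weight": [0.5, 0.6],
    "decoder.layer0.weight": [0.7, 0.8],
}

fr_de_init = merge_checkpoints(fr_en_state, en_de_state)
print(fr_de_init)
```

With PyTorch-style checkpoints, I would presumably load the merged dictionary with `load_state_dict(..., strict=False)`, so that any parameter missing from it keeps its random initialization. One thing I am unsure about is that the decoder's cross-attention was trained against a different encoder, so I assume some fine-tuning would still be needed.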