Hi Vince,
Is this performed by adding src and tgt language tokens, similar to how it's done here? Can you share additional details about the pre-processing and training procedure?
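For context, this is roughly the kind of token tagging I have in mind; the `<xx>`/`<2xx>` token format, the function name, and the file paths below are just my own illustration, not necessarily how your pipeline works:

```python
# Minimal sketch of language-token pre-processing for multilingual NMT.
# The <xx>/<2xx> token format, function name, and file paths are assumptions
# made for illustration, not the actual pipeline discussed in this thread.

def tag_corpus(src_path, out_path, src_lang, tgt_lang):
    """Prepend source- and target-language tokens to every source segment."""
    prefix = f"<{src_lang}> <2{tgt_lang}>"  # e.g. "<en> <2de>" to request German output
    with open(src_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as out:
        for line in src:
            out.write(f"{prefix} {line.strip()}\n")

# Example: tag the EN->DE training source so it can be mixed with other pairs.
tag_corpus("train.en-de.en", "train.en-de.tagged.en", "en", "de")
```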
More info:
On EN-DE:
At the end of the multilingual training, NT14 EN-DE: 28.6
After 30K fine-tuning steps on EN-DE data only => 31.65
State-of-the-art (SOTA) single-pair EN-DE model: 32.64
Same trend on NT18:
43.5 multilingual => 45.9 after fine-tuning (reference SOTA single-pair: 47.8)
EN-DE was the pair for which the gap between the multilingual and single-pair models was the largest.
Yes and no.
It is based on Multiparacrawl and Europarl.
They do not contain the same number of segments, so we applied corpus weights: 2 for Multiparacrawl vs. 1 for Europarl.
We also included back-translated data for some pairs but not for others, to measure its impact. The weight for BT was 1 (same as Europarl).
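In case it helps to be concrete, here is a rough sketch of how I understand the 2/1/1 weighting at batching time; the sampling function and corpus names below are my own illustration of the idea, not necessarily how it was actually implemented:

```python
import random

# Rough illustration of weight-based corpus mixing: each example in a batch is
# drawn from a corpus with probability proportional to its weight
# (2 for Multiparacrawl, 1 for Europarl, 1 for back-translated data).
# The real training pipeline may do this differently; this only shows the idea.

def sample_batch(corpora, weights, batch_size):
    """corpora: dict name -> list of segments; weights: dict name -> relative weight."""
    names = list(corpora)
    relative = [weights[n] for n in names]
    batch = []
    for _ in range(batch_size):
        name = random.choices(names, weights=relative, k=1)[0]
        batch.append(random.choice(corpora[name]))
    return batch

# Hypothetical usage with the weights mentioned above:
weights = {"multiparacrawl": 2, "europarl": 1, "bt": 1}
corpora = {name: [f"{name} segment {i}" for i in range(10)] for name in weights}
batch = sample_batch(corpora, weights, batch_size=32)
```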
Did you try a deeper model to increase the capacity? That should be very beneficial for multilingual NMT. I am testing product keys with a multilingual and multi-domain model, but at the current stage a deeper model is the way to go.