OpenNMT Forum

Multilingual training experiments

We performed some interesting multilingual training with OpenNMT-py.

This first experiment covers 5 languages (English, French, Italian, German, Spanish), i.e. 20 directed pairs.
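For reference, 5 languages give n × (n − 1) = 20 directed translation pairs; a quick sketch (language codes here are just illustrative):

```python
from itertools import permutations

# All directed (src, tgt) pairs among the 5 languages
langs = ["en", "fr", "it", "de", "es"]
pairs = list(permutations(langs, 2))

print(len(pairs))  # 5 * 4 = 20 directed pairs
```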

We used a “medium” Transformer (6 layers, 768 hidden size); here are some example results:

Newstest13 (DE to FR):
Single pair model: 30.55
Google Translate: 28.25
Multilingual model: 30.40

Newstest19 (DE to FR):
Single pair model: 35.21
Google Translate: 32.18
Multilingual model: 34.60

Newstest14 (FR to EN):
Single pair model: 41.3
Google Translate: 38.79
Multilingual model: 39.0

The multilingual model performs quite well: it always beats pivoting through EN when EN is not in the pair.

Next step is to try with more languages.


Hi Vince,
Is this done by adding src and tgt language tokens, similar to how it’s done here? Can you share additional details about the pre-processing and training procedure?



Language token
We prepend a special token to each source, flagging the language of the target only (⦅_tgt_is_XX_⦆).
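As a minimal sketch of that step (the token format mirrors the post; the helper name is ours, not part of OpenNMT-py):

```python
def tag_source(line: str, tgt_lang: str) -> str:
    """Prepend the target-language token, e.g. ⦅_tgt_is_FR_⦆, to a source line.

    Only the target language is flagged; the source language is not marked.
    (tag_source is a hypothetical helper for illustration.)
    """
    return f"⦅_tgt_is_{tgt_lang.upper()}_⦆ {line}"

print(tag_source("Guten Morgen .", "fr"))
# ⦅_tgt_is_FR_⦆ Guten Morgen .
```

At training time every source sentence of a DE→FR pair would be tagged with `⦅_tgt_is_FR_⦆`, so a single model can be steered to any target language at inference.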

We use BPE (48k merge operations) learned on an aggregation of samples of the different languages and corpora.
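A sketch of the aggregation step only (sample sizes and structure are assumptions; the actual BPE learning would then be run on the aggregated file with a tool such as subword-nmt with 48k merge operations):

```python
import random

def sample_lines(lines, k, seed=0):
    """Randomly sample up to k lines from one corpus (illustrative, not the actual script)."""
    rng = random.Random(seed)
    if len(lines) <= k:
        return list(lines)
    return rng.sample(lines, k)

def aggregate(corpora, k_per_corpus=1_000_000):
    """Concatenate equal-size samples of each language/corpus before learning a joint BPE."""
    out = []
    for lines in corpora:
        out.extend(sample_lines(lines, k_per_corpus))
    return out
```

Sampling each corpus to a comparable size before learning the merges keeps the joint vocabulary from being dominated by the largest language.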

Nothing special here: Transformer “medium” configuration (6 layers, 768 hidden size, 3072 feed-forward), with shared encoder/decoder/generator parameters. Trained on 6 GPUs in FP16 mode with batches of approx. 50k tokens; results kept improving beyond 500k steps.
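For a sense of scale, a back-of-the-envelope on the training budget, assuming the ~50k tokens per batch and 500k steps stated above:

```python
tokens_per_step = 50_000   # approx. effective batch size in tokens (from the post)
steps = 500_000            # results kept improving beyond this point
total_tokens = tokens_per_step * steps

print(f"{total_tokens:,}")  # 25,000,000,000 -> roughly 25B tokens seen
```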
