Adaptive Machine Translation with Large Language Models

New paper… Your feedback is appreciated! Thanks!


Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, real-time adaptation remains challenging. Large-scale language models (LLMs) have recently shown interesting capabilities of in-context learning, where they learn to replicate certain input-output text generation patterns, without further fine-tuning. By feeding an LLM with a prompt that consists of a list of translation pairs, it can then simulate the domain and style characteristics at inference time. This work aims to investigate how we can utilize in-context learning to improve real-time adaptive MT. Our extensive experiments show promising results at translation time. For example, GPT-3.5 can adapt to a set of in-domain sentence pairs and/or terminology while translating a new sentence. We observe that the translation quality with few-shot in-context learning can surpass that of strong encoder-decoder MT systems, especially for high-resource languages. Moreover, we investigate whether we can combine MT from strong encoder-decoder models with fuzzy matches, which can further improve the translation, especially for less supported languages. We conduct our experiments across five diverse languages, namely English-to-Arabic (EN-AR), English-to-Chinese (EN-ZH), English-to-French (EN-FR), English-to-Kinyarwanda (EN-RW), and English-to-Spanish (EN-ES) language pairs.
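The in-context learning setup described in the abstract amounts to prepending a list of translation pairs (e.g. fuzzy matches from a translation memory) to the new source sentence and letting the LLM continue the pattern. A minimal sketch of such prompt construction, where the label format and the example pair are my own illustrative assumptions, not necessarily the paper's exact template:

```python
# Build a few-shot prompt for adaptive MT: in-domain translation pairs
# followed by the new source sentence, with the target left for the LLM
# to complete. The "Language:" label format here is an assumption.

def build_prompt(pairs, new_source, src_lang="English", tgt_lang="French"):
    """Format translation pairs as in-context examples for an LLM."""
    lines = []
    for src, tgt in pairs:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The final target line is left empty for the model to complete.
    lines.append(f"{src_lang}: {new_source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)

pairs = [
    ("The patient shows mild symptoms.",
     "Le patient présente des symptômes légers."),
]
prompt = build_prompt(pairs, "The patient shows severe symptoms.")
```

The resulting string would then be sent to the LLM (e.g. GPT-3.5) as the prompt, with a newline as a stop sequence.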


looks nice!

I'm not sure everyone will be willing to run GPT-3 queries while translating, but it looks good.

FYI we have this pending PR here: Add fuzzymatching transform for Neural Fuzzy Repair by panosk · Pull Request #2302 · OpenNMT/OpenNMT-py · GitHub

the idea is similar, plus we will also post a terminology transform in the next few weeks.

Altogether, this should give similarly positive results.


Many thanks, Vincent! This is exciting news.

The approach offered by Bulte and Tezcan (2019) and later improved by Xu et al. (2020) should work well. The main limitation I see is the length of some long sentences when concatenating a fuzzy match, let alone a couple of fuzzy matches. This can quickly hit the max token length of encoder-decoder models. Nevertheless, it is definitely a good addition to the current architecture.

Wow, that is wonderful news. Many thanks!

Kind regards,

Hi Yasmin,

The length is taken care of in the transform already. There are min and max limits for the sentences that will be augmented with fuzzies. These limits are there to improve fuzzy-matching performance, but as a side effect they also help with the max token length, so there won't be huge examples that end up being filtered out. Also, the transform uses only a single match, not multiple matches as in the paper. I used 2 matches in the past, but I found it a pain compared to the improvement it offered. I may add an option for augmenting with multiple matches some time in the future, though.
In any case, this augmentation definitely requires increasing the max token length a bit (250-300 for the source side should be fine).
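For readers unfamiliar with the transform being discussed: the core idea of Neural Fuzzy Repair is to append a retrieved fuzzy-match translation to the source sentence behind a separator token, subject to length limits. A hedged sketch of that augmentation step, where the separator token and the default limits are illustrative assumptions (the actual OpenNMT-py transform may differ):

```python
# Augment a source sentence with a retrieved fuzzy-match translation,
# in the spirit of Neural Fuzzy Repair (Bulte & Tezcan, 2019).
# SEP, min_len, and max_len are illustrative assumptions.

SEP = "｟fuzzy｠"  # placeholder separator token

def augment_with_fuzzy(source, fuzzy_target, min_len=3, max_len=250):
    src_tokens = source.split()
    fuzzy_tokens = fuzzy_target.split()
    # Skip very short sentences, which rarely match reliably.
    if len(src_tokens) < min_len:
        return source
    # Skip augmentation when the concatenation would exceed the
    # maximum source length, so no oversized examples are created.
    if len(src_tokens) + len(fuzzy_tokens) + 1 > max_len:
        return source
    return " ".join(src_tokens + [SEP] + fuzzy_tokens)
```

The target side of the training example stays unchanged; the model learns to "repair" the appended fuzzy translation toward the correct target.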



Wonderful effort. Many thanks, Panos!

Hi there,

I happened to come across this discussion, and I think I know enough about the topic to add something.

Concatenating multiple fuzzy matches for a sentence is indeed a major disadvantage of this approach. Specifically, not only can a concatenated sentence easily exceed the Transformer's predefined max length, but it also makes it harder for the Transformer to model all the interactions between the sentence and its fuzzy matches. After all, you concatenate all the sentences together and ask the Transformer to jointly learn all of their interactions, which becomes very complex once several fuzzy matches are involved.

Instead, we found it works significantly better if we modify the Transformer so that it controls source and fuzzy-match interactions. We have not (yet) obtained much faster latency with our architecture, but in terms of translation accuracy we found significantly better BLEU scores across many zero-shot MT settings we tried. For handling several fuzzy matches, I think this is the right way to go. The details are all in our EACL 2023 (Findings) paper - Improving Retrieval Augmented Neural Machine Translation by Controlling Source and Fuzzy-Match Interactions.
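As a rough illustration of what "controlling interactions" can mean (this is my own simplified sketch, not necessarily the paper's exact design), one can restrict encoder self-attention so that source tokens attend only to source tokens, while fuzzy-match tokens may optionally attend to the source as well:

```python
# Build a boolean self-attention mask over [source tokens | fuzzy tokens].
# True = attention allowed. Source tokens only see the source; fuzzy
# tokens see their own segment and, optionally, the source. This is a
# simplified illustration of segment-restricted attention.

def build_interaction_mask(n_src, n_fuzzy, fuzzy_sees_src=True):
    n = n_src + n_fuzzy
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Positions i, j are in the same segment if both are source
            # (index < n_src) or both are fuzzy (index >= n_src).
            same_segment = (i < n_src) == (j < n_src)
            # Optionally let fuzzy tokens attend to source tokens.
            cross_ok = fuzzy_sees_src and i >= n_src and j < n_src
            mask[i][j] = same_segment or cross_ok
    return mask
```

In a real model this mask would be added (as large negative values on disallowed positions) to the attention logits before the softmax.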


Hi @hoangcuong2011 ,
Thanks for your input and the interesting paper. Since we are into it, and we have hijacked @ymoslem 's topic a bit (we are not completely off topic, but still…), I'd like to bring up an issue I have noticed after using Neural Fuzzy Repair for a long time (and @guillaumekln confirmed it in a discussion we had a few months ago): in some cases, the model simply copies the fuzzy match to its output unmodified, or it fails to substitute a word or two. So, while the BLEU score "explodes" when evaluating a model adapted with fuzzy matches, these slight misses are a serious and misleading flaw that requires the end user to be extremely cautious. Are you aware of this issue and, if so, have you noticed better behavior with your method?

Hi @panosk,
I think I have observed something similar: augmented MT may sometimes copy fuzzy words unreasonably. We therefore developed a simple way to shuffle fuzzy matches during training, which helps mitigate this phenomenon. Maybe you can check out our paper:

Improving Robustness of Retrieval Augmented Translation via Shuffling of Suggestions (arXiv:2210.05059)
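The core idea of the shuffling approach is simple enough to sketch: during training, randomly permute the retrieved suggestions before concatenating them, so the model cannot learn to blindly copy the suggestion in a fixed position. The separator token below is an illustrative assumption:

```python
# Shuffle retrieved suggestions before concatenating them to the source,
# so the model cannot rely on a fixed suggestion order during training.

import random

SEP = "｟fuzzy｠"  # placeholder separator token

def concat_with_shuffled_suggestions(source, suggestions, rng=random):
    suggestions = list(suggestions)
    rng.shuffle(suggestions)  # in-place random permutation
    parts = [source]
    for s in suggestions:
        parts.append(SEP)
        parts.append(s)
    return " ".join(parts)
```

At inference time the suggestions can of course be kept in their retrieval order; the shuffling only acts as a training-time regularizer.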

Interesting, thanks!