What is the use of monolingual corpora in SMT?

I am having trouble understanding monolingual corpora. In particular, I don't understand how they are used in SMT. Could someone please help me clarify this?


In general, we use monolingual corpora to create synthetic parallel data.
You can do that by:

  • Taking an existing translation model

  • Translating your monolingual corpus with it

  • Using the translated corpus, paired with the original sentences, to train a new translation model
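
The steps above can be sketched roughly as follows. This is only an illustration, assuming the monolingual corpus is in the target language and the existing model translates in the reverse direction (the setup usually called back-translation). The word-for-word dictionary `TOY_EN_FR` and the helpers `toy_translate` and `make_synthetic_pairs` are hypothetical stand-ins for a real system, not part of any toolkit.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "reverse" model: a word-for-word English -> French lookup.
# It stands in for a real, already trained translation system.
TOY_EN_FR = {"the": "le", "cat": "chat", "dog": "chien", "sleeps": "dort"}


def toy_translate(sentence: str) -> str:
    """Translate word by word, leaving unknown words unchanged."""
    return " ".join(TOY_EN_FR.get(word, word) for word in sentence.lower().split())


def make_synthetic_pairs(
    monolingual: List[str], translate: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Pair each machine translation (synthetic side) with its original sentence (real side)."""
    return [(translate(sentence), sentence) for sentence in monolingual]


# Step 1: an existing translation model (here, the toy lookup above).
# Step 2: translate the monolingual corpus with it.
monolingual_en = ["The cat sleeps", "The dog sleeps"]
synthetic_pairs = make_synthetic_pairs(monolingual_en, toy_translate)

# Step 3: add the synthetic pairs to the real bilingual data for the new model.
real_pairs = [("le chat", "the cat")]
training_data = real_pairs + synthetic_pairs

for source, target in training_data:
    print(f"{source}\t{target}")
```

The synthetic side is machine output and can be noisy, while the other side is genuine text, which is why the new model is normally trained to produce the genuine side.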


An SMT model is composed of two main parts: a language model and a translation model.

The monolingual corpus is used to train the (target) language model, and the bilingual corpus is used to train the translation model.

The language model is used to judge whether a translation is fluent, while the translation model keeps the translation faithful to the source. For more detail, see http://www.statmt.org/.

The Statistical MT Handbook on that site is strongly recommended reading.
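
To make the split between the two models concrete, here is a minimal sketch, assuming the classic noisy-channel setup in which each candidate translation is scored by adding a translation-model log-probability to a language-model log-probability. The bigram model with add-one smoothing, the tiny corpus, and the hand-set translation scores in `tm_logprob` are all assumptions for illustration; real systems use far larger models and more features.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Count bigrams and their contexts in a monolingual (target-language) corpus."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def lm_logprob(sentence, lm):
    """Log-probability of a sentence under the bigram model (add-one smoothing)."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


# The language model is trained on monolingual data only.
monolingual_corpus = ["the cat sleeps", "the dog sleeps", "the cat eats"]
lm = train_bigram_lm(monolingual_corpus)

# Hypothetical translation-model scores for two candidates of one source sentence.
# They are set equal on purpose: both candidates use the same word translations.
tm_logprob = {
    "the cat sleeps": math.log(0.5),
    "cat the sleeps": math.log(0.5),
}


def total_score(candidate):
    """Noisy-channel style score: translation adequacy plus target fluency."""
    return tm_logprob[candidate] + lm_logprob(candidate, lm)


best = max(tm_logprob, key=total_score)
print(best)
```

Because the translation scores are tied here, the language model alone decides the ranking, and it favors the word order it has seen in the monolingual corpus.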
