What is the use of monolingual corpora in SMT?

nilu · November 30, 2017, 7:50am

I am having trouble understanding monolingual corpora. In particular, I don't understand how they are used in SMT. Could someone please clarify?

bsbor · November 30, 2017, 11:17am

Hello,

In general, we use monolingual corpora to create synthetic parallel data.
You can do that by:

1. Taking an existing translation model
2. Translating your monolingual corpora with it
3. Using the translated output, paired with the original sentences, as additional training data for a new translation model
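
The steps above can be sketched roughly as follows. This is only a toy illustration: it assumes the monolingual text is in the target language and is translated back into the source language (the usual "back-translation" setup, which the post does not spell out), and `toy_translate` with its `TOY_LEXICON` is a hypothetical stand-in for a real trained model.

```python
from typing import Callable, List, Tuple


def make_synthetic_pairs(
    monolingual_target: List[str],
    translate_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each real target sentence with a machine-translated source sentence."""
    return [(translate_to_source(t), t) for t in monolingual_target]


# Hypothetical stand-in for a trained target->source translation model:
# a word-for-word lookup that leaves unknown words unchanged.
TOY_LEXICON = {"das": "the", "haus": "house", "ist": "is", "klein": "small"}


def toy_translate(sentence: str) -> str:
    return " ".join(TOY_LEXICON.get(w, w) for w in sentence.lower().split())


# Step 1: take an existing model (here, toy_translate).
# Step 2: translate the monolingual corpus with it.
monolingual_de = ["Das Haus ist klein"]
synthetic = make_synthetic_pairs(monolingual_de, toy_translate)

# Step 3: these (synthetic source, real target) pairs would be added to the
# genuine parallel data when training the new translation model.
for src, tgt in synthetic:
    print(f"{src}\t{tgt}")
```

The point of pairing this way is that the target side stays human-written, so the new model still learns to produce natural output even though its input side is machine-generated.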

Bsbor

huache · December 6, 2017, 3:12am

An SMT system is composed of two main parts: a language model and a translation model.

So the monolingual corpus is used to train the (target-side) language model, and the bilingual corpus is used to train the translation model.
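
One way to see why the data splits this way is the classic noisy-channel formulation of SMT. The symbols here are my own notation, not something from the posts above: f is the source sentence and e is a candidate target sentence.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(e)\, P(f \mid e)
```

P(e) is the language model; it only involves target-language text, so monolingual corpora are enough to estimate it. P(f | e) is the translation model, which needs aligned sentence pairs. Phrase-based systems generalize this into a weighted log-linear combination of features, but the division of labour is the same.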

The language model is used to judge whether a translation is fluent, while the translation model is used to keep the translation faithful to the source. You can follow this link for more detail: http://www.statmt.org/.
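
To make the "fluent or not" part concrete, here is a minimal sketch of a bigram language model with add-one smoothing, trained only on monolingual text. It is a toy under stated assumptions: the three-sentence corpus and the function names `train_bigram_lm` and `log_prob` are made up for illustration, and real systems use much larger corpora, higher-order n-grams, and better smoothing.

```python
import math
from collections import Counter
from typing import List, Tuple

Model = Tuple[Counter, Counter, int]


def train_bigram_lm(sentences: List[str]) -> Model:
    """Count context words and bigrams over a monolingual corpus."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])              # words seen as a left context
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence: str, model: Model) -> float:
    """Add-one smoothed log-probability of a sentence under the bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens[:-1], tokens[1:])
    )


# Hypothetical monolingual (target-language) training text.
corpus = ["the house is small", "the house is old", "the garden is small"]
lm = train_bigram_lm(corpus)

fluent = log_prob("the house is small", lm)
scrambled = log_prob("small is house the", lm)
print(fluent > scrambled)  # the natural word order gets the higher score
```

Both candidates contain the same words, so a translation model alone could rate them similarly; it is the language model score that prefers the natural word order.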

The Statistical MT Handbook on that site is strongly recommended reading.