Training LM with monolingual corpora


(Claudia) #1

Hello everyone,
I want to train a LM but I get an error and I don't understand what it means.

First, I preprocessed the corpora with this command:
sudo nvidia-docker run -v $PWD/opennmt_data/:/home/data -d claudia_opennmt th preprocess.lua -data_type monotext -train /home/data/ -valid /home/data/tgt-val-twe-nl.txt -save_data /home/data/datalm
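For readers unfamiliar with the `monotext` preprocessing step: conceptually, it tokenizes the monolingual training text and builds a frequency-sorted vocabulary with a few reserved special tokens. A rough Python sketch of that idea (illustrative only, not OpenNMT's actual code; the special token names and the 50000 cap are assumptions):

```python
from collections import Counter

def build_vocab(lines, max_size=50000):
    """Count whitespace tokens and keep the most frequent ones,
    roughly what a monotext preprocessing pass does."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Reserve special tokens first, then add the most frequent words.
    vocab = ["<blank>", "<unk>", "<s>", "</s>"]
    vocab += [tok for tok, _ in counts.most_common(max_size)]
    return {tok: i for i, tok in enumerate(vocab)}

# Tiny monolingual "corpus" standing in for tgt-val-twe-nl.txt.
corpus = ["het is een mooie dag", "het regent vandaag"]
vocab = build_vocab(corpus)
```

The real `preprocess.lua` additionally serializes the numericized data to a `.t7` file, which is what `train.lua` consumes via `-data`.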

This is the command I am running for training:
sudo nvidia-docker run -v $PWD/opennmt_data/:/home/data -d claudia_opennmt th train.lua -model_type lm -data /home/data/datalm-train.t7 -save_model /home/data/demo-lm

And this is the error I get:

Can anyone help me?

(Guillaume Klein) #2


This is an open issue, see:

(Claudia) #3

I see. Thanks!

(Claudia) #4


Is there a way of doing the same (training a neural language model on monolingual data) with the TensorFlow implementation of OpenNMT? I've been reading the documentation but couldn't find anything similar.


(Guillaume Klein) #5


No, there is no such feature.
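For readers who only need the underlying idea rather than the OpenNMT feature: a language model trained on monolingual data estimates the probability of the next token given its history. A toy bigram version in plain Python (purely illustrative, unrelated to OpenNMT's or OpenNMT-tf's implementation) might look like:

```python
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    """Toy bigram language model trained on monolingual text:
    estimates P(next word | current word) from raw counts."""
    counts = defaultdict(Counter)
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for cur, nxt in zip(toks, toks[1:]):
            counts[cur][nxt] += 1
    # Normalize the raw counts into conditional probabilities.
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

model = train_bigram_lm(["het is een dag", "het is mooi"])
# P("is" | "het") is 1.0 here, since "het" is always followed by "is".
```

A neural LM replaces these count-based conditionals with a learned network, but it is trained on the same kind of monolingual input.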

(Claudia) #6

Oh I see, thanks!