Preprocess.lua Empty Dataset Error

(Miguel Domingo) #1

While preprocessing, I’ve got the following error:

empty dataset
  ./onmt/data/Preprocessor.lua:953: in function 'makeGenericData'
  ./onmt/data/Preprocessor.lua:1018: in function 'makeBilingualData'
  ./onmt/data/Preprocessor.lua:1198: in function 'makeData'
  preprocess.lua:84: in function 'main'
  preprocess.lua:124: in main chunk

The command I used was: th preprocess.lua -train_src source -train_tgt target -save_data data

The files’ contents are as follows (I’m trying to preprocess a single sentence):

Source: on│L the│L humanitarian│L aid│L front│L ,│N the│L commission│C has│L been│L present│L in│L all│L the│L recent│L crises│L -│N in│L the│L african│C great│C lakes│C ,│N former│L yugoslavia│C ,│N cambodia│C ,│N north│C korea│C ,│N the│L cis│U countries│L and│L the│L middle│C east│C .│N

Target: en│L materia│L de│L ayuda│L humanitaria│L ,│N la│L comisión│C ha│L estado│L presente│L en│L todas│L las│L crisis│L :│N en│L la│L región│L de│L los│L grandes│C lagos│C en│L áfrica│C ,│N en│L la│L ex│L yugoslavia│C ,│N en│L camboya│C ,│N en│L corea│C del│L norte│C ,│N en│L los│L países│L de│L la│L cei│U ,│N en│L el│L cercano│C y│L medio│C oriente│C .│N

I’ve preprocessed other single sentences without any problems, but this particular one fails and I don’t understand why (there are no empty lines in the files, just the corresponding source/target sentence). Could you help me identify the error?

Thanks in advance,


(Wiktor Stribiżew) #2

If you defined source, target and data as shell variables, you should refer to them with a $ in front of the variable name:

th preprocess.lua -train_src "$source" -train_tgt "$target" -save_data "$data"

(Miguel Domingo) #3

Those were the file names (with the paths simplified for the example), not variables, but thanks for the advice.
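
For anyone hitting the same error, one thing worth checking is sentence length. This is a guess, not a confirmed diagnosis: as far as I know, preprocess.lua drops sentence pairs that exceed a maximum length (I believe the options are -src_seq_length and -tgt_seq_length with a default of 50, but please verify that with th preprocess.lua -h). By a manual count, the target sentence above has 53 space-separated tokens, so a single-sentence corpus could end up empty after filtering. A minimal sketch for finding such lines follows; the function name long_lines and the default limit of 50 are my own assumptions, not part of OpenNMT:

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNMT): print "line_number token_count"
# for every line of a file that has more than `limit` whitespace-separated tokens.
# The default limit of 50 is an assumption; check it against your preprocess.lua options.
long_lines() {
    file=$1
    limit=${2:-50}
    # awk splits each line on whitespace; NF is the token count, NR the line number.
    awk -v limit="$limit" 'NF > limit { print NR, NF }' "$file"
}
```

Running long_lines source and long_lines target prints nothing when every line is within the limit. If a line is reported, either shorten the sentence or raise the corresponding length option when calling preprocess.lua.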