Pre-processing corpora

What is the exact translate.lua command you ran?

    th translate.lua -src  ~/datasets/EN-ES/corpus/${experiment_}/tatoeba.en-es.tok.short.en \
                     -tgt  ~/datasets/EN-ES/corpus/${experiment_}/tatoeba.en-es.tok.short.es \
                     -detokenize_output true \
                     -tok_tgt_joiner_annotate true \
                     -output ${path}/pred.tok.${filename}.txt \
                     -model ${path}/${filename} \
                     -tok_tgt_case_feature true \
                     -gpuid 1

If you pass -tgt, the target tokenization options are also applied to this file.

The simplest fix is to simply not pass this file at inference time. Otherwise, you should pass the non-tokenized version and set all of the required target tokenization options.
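
For example, taking the first suggestion, a minimal sketch would be the same command with the -tgt line removed, so no reference file is passed at inference; all paths, variables, and remaining options are kept from the command above and are purely illustrative:

    # Option 1: drop -tgt so target tokenization options are not applied to a reference file
    th translate.lua -src ~/datasets/EN-ES/corpus/${experiment_}/tatoeba.en-es.tok.short.en \
                     -detokenize_output true \
                     -tok_tgt_joiner_annotate true \
                     -output ${path}/pred.tok.${filename}.txt \
                     -model ${path}/${filename} \
                     -tok_tgt_case_feature true \
                     -gpuid 1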

OK, it works. Thanks a lot!