Hi! I trained a translation model for grammatical error correction in English. The source is a sentence that may contain errors (or may already be correct), and the target is the corrected sentence. I trained the model without BPE. The problem is that when I translate with unknown-word replacement, the unknown word is replaced by the wrong word: specifically, by the word or punctuation mark that follows the unknown word.
For example:
hyp: These days, my idea of a safe place to keep money is a Full full o’Nuts coffee can.
trg: These days, my idea of a safe place to keep money is a Chock full o’Nuts coffee can.
hyp: Carved, stepping, – – you name it, he did it.
trg: Carving, stepping, cuttys – you name it, he did it.
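As far as I understand it, -replace_unk substitutes each generated <unk> with the source token that received the highest attention weight at that decoding step. The following is only a minimal Python sketch of that idea, not OpenNMT's actual code: the function name `replace_unk`, the lowercased toy tokens, and the attention values are all made up for illustration. It shows how an attention peak that lands one position too late would copy the following token, which looks like what I am getting:

```python
def replace_unk(hyp_tokens, src_tokens, attention, unk="<unk>"):
    """Replace each unk in the hypothesis with the source token that has
    the highest attention weight at that decoding step.

    attention[i][j] is the (hypothetical) weight that target position i
    puts on source position j.
    """
    out = []
    for i, tok in enumerate(hyp_tokens):
        if tok == unk:
            weights = attention[i]
            # Index of the source token with the largest attention weight.
            best = max(range(len(src_tokens)), key=lambda j: weights[j])
            out.append(src_tokens[best])
        else:
            out.append(tok)
    return out


src = ["a", "chock", "full", "coffee", "can"]
hyp = ["a", "<unk>", "full", "coffee", "can"]

# Toy attention rows, one per target token. Row 1 (the <unk> step) peaks on
# source position 2 ("full") instead of position 1 ("chock").
attn = [
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.05, 0.30, 0.60, 0.03, 0.02],
    [0.02, 0.08, 0.85, 0.03, 0.02],
    [0.01, 0.02, 0.05, 0.90, 0.02],
    [0.01, 0.01, 0.03, 0.05, 0.90],
]

print(replace_unk(hyp, src, attn))  # ['a', 'full', 'full', 'coffee', 'can']
```

If the weight in row 1 peaked on position 1 instead, the same function would return "chock" for the unknown word, so the symptom would be consistent with attention that is shifted by one token rather than with the replacement step itself.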
The script I use to train and translate is the following:
th tools/tokenize.lua \
  -case_feature \
  -nparallel 8 \
  -joiner_annotate true \
  -segment_numbers true < $set > ${set}.tok
th preprocess.lua \
  -train_src training.tok.${src_} \
  -train_tgt training.tok.${trg_} \
  -valid_src dev.tok.${src_} \
  -valid_tgt dev.tok.${trg_} \
  -save_data preprocessed \
  -sort true \
  -report_progress_every 100000
th train.lua -data preprocessed-train.t7 \
  -rnn_size 128 \
  -encoder_type rnn \
  -rnn_type LSTM \
  -end_epoch 60 \
  -max_batch_size 50 \
  -save_model models/ \
  -layers 2 \
  -dropout 0.3 \
  -optim adam \
  -learning_rate 0.0002 \
  -learning_rate_decay 1.0 \
  -src_word_vec_size 128 \
  -tgt_word_vec_size 128 \
  -gpuid 1
th translate.lua \
  -src ${test_} \
  -detokenize_output true \
  -tok_tgt_joiner_annotate true \
  -output models/pred.tok.${f}.txt \
  -model /models/${f} \
  -tok_tgt_case_feature true \
  -replace_unk \
  -gpuid 1
Do you have any idea what I’m doing wrong?
Thanks in advance.