Hello everyone,
I am trying to use OpenNMT to translate English into Chinese, but I am running into many problems.
My data: a Chinese corpus of 2,000 sentences, word-segmented (for example: 今年 是 维多利亚 的 秘密 时尚 秀 10 周年纪念 而 如今 它 却 有着 20 亿 的 观众 收看 还有 模特 的 数量), and the corresponding English corpus of 2,000 sentences (for example: It is the 10th anniversary of the Victoria ’ s Secret Fashion Show . but now it ’ s been watched by more than 2 billion people the numbers of models we got .). The two validation files contain 200 sentences each, sampled from the Chinese corpus and the English corpus respectively.
I then followed the OpenNMT tutorial: preprocessing > training the model > translation. Finally, I extracted 30 sentences from the Chinese corpus to use as test data.
The results are very bad: only a few single words are recognized, and the output repeats words. Thank you for any guidance and ideas.
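The description above does not say whether the 200 validation and 30 test sentences were removed from the training data. If they overlap, scores on them say little about how the model generalizes. The following is a minimal sketch of a disjoint split that keeps source and target lines aligned; the function name `split_parallel` and the toy data are hypothetical and not part of OpenNMT.

```python
import random


def split_parallel(src_lines, tgt_lines, n_valid, n_test, seed=0):
    """Shuffle aligned sentence pairs and split them into disjoint
    train / valid / test lists of (source, target) tuples."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    if n_valid + n_test >= len(src_lines):
        raise ValueError("not enough sentence pairs left for training")
    pairs = list(zip(src_lines, tgt_lines))
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(pairs)
    test = pairs[:n_test]
    valid = pairs[n_test:n_test + n_valid]
    train = pairs[n_test + n_valid:]
    return train, valid, test


# Toy stand-in for a 2,000-line parallel corpus.
src = ["src sentence %d" % i for i in range(2000)]
tgt = ["tgt sentence %d" % i for i in range(2000)]
train, valid, test = split_parallel(src, tgt, n_valid=200, n_test=30)
print(len(train), len(valid), len(test))
```

Each of the three lists can then be written out as a pair of source and target files, one sentence per line, before preprocessing.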
Torch >>>>> translation log:
[04/03/18 10:05:33 INFO] SENT 232: 浣??搴 浣姣 ?浜瑙?濂?
[04/03/18 10:05:33 INFO] PRED 232: I watched watched watched .
[04/03/18 10:05:33 INFO] PRED SCORE: -10.66
[04/03/18 10:05:33 INFO]
[04/03/18 10:05:33 INFO] Translated 1759 words, src unk count: 232, coverage: 13.1%, tgt words: 1373 words, tgt unk count: 0, coverage: 0%,
[04/03/18 10:05:33 INFO] PRED AVG SCORE: -2.30, PRED PPL: 9.94
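The garbled SENT text in this log resembles what appears when UTF-8 bytes are displayed or read under a legacy Chinese codec such as GBK. That is an assumption; the post does not state the encoding of the corpus files or of the terminal. A minimal sketch for checking this, with hypothetical helper names:

```python
def is_valid_utf8(data):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def show_as_wrong_codec(text, wrong_encoding="gbk"):
    """Illustrate how UTF-8 text looks when its bytes are decoded
    with a different codec (undecodable bytes are replaced)."""
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


# Segmented Chinese text, as in the corpus example.
sample = "今年 是 维多利亚 的 秘密"
print(is_valid_utf8(sample.encode("utf-8")))  # bytes written as UTF-8
print(is_valid_utf8(sample.encode("gbk")))    # same text written as GBK
print(show_as_wrong_codec("你好"))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `is_valid_utf8`. If the file is valid UTF-8, the garbling may come only from how the terminal displays the log.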
PyTorch >>>>> translation log:
root@iZj6c3p7zpoi634e71k0cuZ:/workspace/OpenNMT-py# python translate.py -model …/zh-cn-py/myfile-model_acc_14.46_ppl_256.20_e6.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose
You are using pip version 9.0.1, however version 9.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Loading model parameters.
average src size 8.666666666666666 9
/workspace/OpenNMT-py/onmt/modules/GlobalAttention.py:176: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  align_vectors = self.sm(align.view(batch*targetL, sourceL))
/root/python/lib/python3.6/site-packages/torch/nn/modules/container.py:67: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  input = module(input)
SENT 5: ('浣?, ‘寤’, '彖?, ‘浠’, ‘锛’, '浣?, ‘珑’, ‘浠’, '姝?, '?
')
PRED 5: I watched watched watched . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
PRED SCORE: -3.2823
PRED AVG SCORE: -0.5602, PRED PPL: 1.7510
Traceback (most recent call last):
  File "translate.py", line 152, in <module>
    main()
  File "translate.py", line 137, in main
    _report_score('PRED', pred_score_total, pred_words_total)
  File "translate.py", line 30, in _report_score
    name, score_total / words_total,
ZeroDivisionError: float division by zero
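
The traceback shows `_report_score` dividing `score_total` by `words_total`, which fails when the word count is zero. The rest of that function is not shown here, so the following is only a sketch of a guarded version under the assumption that it reports the average score and a perplexity of `exp(-average)`; the PyTorch log figures above (-0.5602 and 1.7510) are consistent with that relation. The name `report_score_safe` is hypothetical.

```python
import math


def report_score_safe(name, score_total, words_total):
    """Format average score and perplexity, or a notice when no
    words were scored (avoids dividing by zero)."""
    if words_total == 0:
        return "%s: no words were scored" % name
    avg = score_total / words_total
    # Perplexity is taken as exp of the negated average log-probability.
    return "%s AVG SCORE: %.4f, %s PPL: %.4f" % (
        name, avg, name, math.exp(-avg))


print(report_score_safe("PRED", -5.602, 10))
print(report_score_safe("PRED", 0.0, 0))
```

This only hides the crash. A zero word count would still need to be traced back to its cause, for example an empty line in the input file.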