Nart (Nart Tlisha) · July 8, 2020, 10:56am · #1
Hello,
I get different translation results when using a checkpoint and its converted CTranslate2 model.
Inference from the checkpoint:
onmt-main --config data.yml --auto_config --checkpoint_path run/ckpt-5000 infer --features_file src-val.txt --predictions_file pred.txt
Converting the checkpoint to a CTranslate2 model:
ct2-opennmt-tf-converter --model_path model/ --model_spec TransformerBaseRelative --output_dir model_ctranslate2 --src_vocab src-vocab.txt --tgt_vocab tgt-vocab.txt --quantization int16
Inference from the CTranslate2 model:
translator = ctranslate2.Translator(app.root_path + "/ctranslate_model")
text_list = translator.translate_batch(source_list)
Translation result:
source sentence:
▁афинал ▁мшаԥымза ▁анҵәамҭазы ▁имҩаԥысраны ▁иҟоуп ▁.
Target result with checkpoint inference:
▁в ▁конце ▁апреля ▁пройдет ▁финал ▁.
Target result with ctranslate2 inference:
▁несов мести ла ▁друг ▁мел ью ▁мел кий ▁и ▁пропове емо ▁, ▁заня ла ▁благой ▁вестью ▁, ▁заня ла ▁также ▁самим ▁, ▁заня ющим ▁благой ▁вестью ▁и ▁деньги ▁, ▁заня ла ▁выпуски ▁деньги ▁и ▁деньги ▁, ▁заня ла ▁для ▁деньги
What am I missing?!
Can you try exporting from OpenNMT-tf?
onmt-main --config data.yml --auto_config --checkpoint_path run/ckpt-5000 export --export_dir model_ctranslate2 --export_format ctranslate2
First, make sure to use the latest version of CTranslate2.
Are you sure you are comparing the same model? Are all translation results different?
Nart (Nart Tlisha) · July 8, 2020, 12:30pm · #5
I will double check and let you know.
Nart (Nart Tlisha) · July 9, 2020, 9:57am · #6
Please disregard the previous result.
This time I used a 15k-step checkpoint and the corresponding 15k-step CTranslate2 model and translated 50 sentences. Even though the tokenized BLEU scores are close:
BLEU+tok+checkpoint = 22.92, 53.4/31.8/22.9/17.4 (BP=0.800, ratio=0.817, hyp_len=1426, ref_len=1745)
BLEU+tok+ctranslate2 = 22.76, 48.8/27.4/20.1/16.2 (BP=0.886, ratio=0.892, hyp_len=1556, ref_len=1745)
There are differences in the translations that concern me.
Should I be expecting such differences?
Here is a link to reproduce the results: 15k model and test data
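As a sanity check, those BLEU lines are internally consistent: BLEU = BP · exp(mean log n-gram precision), with BP = exp(1 − ref_len/hyp_len) when the hypothesis is shorter than the reference. A sketch recomputing the scores above (the printed precisions are rounded to one decimal, so the result matches only up to rounding):

```python
import math

def bleu(precisions, hyp_len, ref_len):
    """Recompute corpus BLEU from n-gram precisions (in %) and lengths."""
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    log_avg = sum(math.log(p / 100.0) for p in precisions) / len(precisions)
    return 100.0 * bp * math.exp(log_avg), bp

score, bp = bleu([53.4, 31.8, 22.9, 17.4], hyp_len=1426, ref_len=1745)
print(round(score, 2), round(bp, 3))  # close to the reported 22.92, BP=0.800
```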
Is this with int16 quantization as used in the first post? If yes, differences are to be expected.
Also make sure to use the same beam size: translator.translate_batch(source_list, beam_size=4).
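The point about beam size can be illustrated with a toy decoder: greedy search (beam_size=1) can commit to a locally best first token that a wider beam would reject once later scores are taken into account. The scores below are made up for illustration and unrelated to the actual model:

```python
import math

# Hypothetical per-step probabilities keyed by the decoded prefix.
LOGPROBS = {
    (): {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)},
    ("a",): {"x": math.log(0.4), "y": math.log(0.3), "z": math.log(0.3)},
    ("b",): {"x": math.log(0.9), "y": math.log(0.05), "z": math.log(0.05)},
    ("c",): {"x": math.log(0.4), "y": math.log(0.3), "z": math.log(0.3)},
}

def beam_search(beam_size, steps=2):
    """Keep the beam_size best prefixes by cumulative log-probability."""
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = [
            (prefix + (tok,), score + lp)
            for prefix, score in beams
            for tok, lp in LOGPROBS[prefix].items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

print(beam_search(1))  # ('a', 'x') -- greedy takes the locally best first token
print(beam_search(4))  # ('b', 'x') -- a wider beam finds the better sequence
```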
Nart (Nart Tlisha) · July 9, 2020, 10:33am · #9
I added beam_size=4; it looks like it made things worse.
BLEU+tok+checkpoint = 22.92, 53.4/31.8/22.9/17.4 (BP=0.800, ratio=0.817, hyp_len=1426, ref_len=1745)
BLEU+tok+ctranslate2 = 20.37, 49.4/27.6/19.7/15.2 (BP=0.806, ratio=0.823, hyp_len=1436, ref_len=1745)
That's strange. What parameters did you use in OpenNMT-tf besides --auto_config?
Nart (Nart Tlisha) · July 9, 2020, 10:46am · #11
These are the parameters I am using during the conversion to CTranslate2:
INFO:tensorflow:Using parameters:
data:
  eval_features_file: src-val.txt
  eval_labels_file: tgt-val.txt
  source_vocabulary: src-vocab.txt
  target_vocabulary: tgt-vocab.txt
  train_features_file: src-train.txt
  train_labels_file: tgt-train.txt
eval:
  batch_size: 32
  batch_type: examples
  length_bucket_width: 5
infer:
  batch_size: 32
  batch_type: examples
  length_bucket_width: 5
model_dir: run/
params:
  average_loss_in_time: true
  beam_width: 4
  decay_params:
    model_dim: 512
    warmup_steps: 8000
  decay_type: NoamDecay
  label_smoothing: 0.1
  learning_rate: 2.0
  num_hypotheses: 1
  optimizer: LazyAdam
  optimizer_params:
    beta_1: 0.9
    beta_2: 0.998
score:
  batch_size: 64
train:
  average_last_checkpoints: 8
  batch_size: 3072
  batch_type: tokens
  effective_batch_size: 25000
  keep_checkpoint_max: 8
  length_bucket_width: 1
  max_step: 500000
  maximum_features_length: 100
  maximum_labels_length: 100
  sample_buffer_size: -1
  save_summary_steps: 100
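One detail worth noting in this config: with decay_type: NoamDecay, the learning_rate: 2.0 value is a scale factor, not the actual learning rate. A sketch of the schedule, assuming the standard Noam formula from the Transformer paper (OpenNMT-tf's implementation may differ in small details such as a step offset):

```python
def noam_lr(step, scale=2.0, model_dim=512, warmup_steps=8000):
    """Noam schedule: linear warmup, then inverse-square-root decay.
    lr = scale * model_dim**-0.5 * min(step**-0.5, step * warmup**-1.5)"""
    return scale * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The rate rises until step == warmup_steps, then decays as 1/sqrt(step).
```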