Model export by averaged checkpoints

Is it possible in OpenNMT-tf to average checkpoints periodically during training (after every 3 checkpoints, average those 3, and so on), evaluate the averaged checkpoints, and export the model with the best BLEU among them?

If not, how can I export the bin model from the last averaged checkpoint? Export on best BLEU already happens automatically during training:

2023-04-05 21:20 - Info - Step = 43000 ; steps/s = 1.23, tokens/s = 68337 (32753 source, 35584 target) ; Learning rate = 0.000426 ; Loss = 2.251879
2023-04-05 21:27 - Info - Step = 43500 ; steps/s = 1.23, tokens/s = 68428 (32797 source, 35631 target) ; Learning rate = 0.000424 ; Loss = 2.264318
2023-04-05 21:33 - Info - Step = 44000 ; steps/s = 1.23, tokens/s = 68390 (32781 source, 35609 target) ; Learning rate = 0.000421 ; Loss = 2.248560
2023-04-05 21:40 - Info - Step = 44500 ; steps/s = 1.23, tokens/s = 68415 (32798 source, 35617 target) ; Learning rate = 0.000419 ; Loss = 2.247980
2023-04-05 21:47 - Info - Step = 45000 ; steps/s = 1.23, tokens/s = 68369 (32768 source, 35601 target) ; Learning rate = 0.000417 ; Loss = 2.261137
2023-04-05 21:47 - Info - Saved checkpoint /app/datastudio_prototype/media/models/en_ru_gg_val/result/ckpt-45000
2023-04-05 21:47 - Info - Running evaluation for step 45000
2023-04-05 21:47 - Info - Evaluation predictions saved to /app/datastudio_prototype/media/models/en_ru_gg_val/result/eval/predictions.txt.45000
2023-04-05 21:48 - Info - Evaluation result for step 45000: loss = 0.477002 ; perplexity = 1.611237 ; bleu = 72.355054
2023-04-05 21:48 - Info - Exporting model to /app/datastudio_prototype/media/models/en_ru_gg_val/result/export/45000 (best bleu so far: 72.355054)

But after the checkpoints are averaged at the end of training, the averaged model is not exported. Is it possible to export it automatically?

2023-04-05 22:58 - Info - Restored checkpoint /app/datastudio_prototype/media/models/en_ru_gg_val/result/ckpt-50000
2023-04-05 22:58 - Info - Averaging 3 checkpoints…
2023-04-05 22:58 - Info - Reading checkpoint /app/datastudio_prototype/media/models/en_ru_gg_val/result/ckpt-45000…
2023-04-05 22:58 - Info - Reading checkpoint /app/datastudio_prototype/media/models/en_ru_gg_val/result/ckpt-47500…
2023-04-05 22:58 - Info - Reading checkpoint /app/datastudio_prototype/media/models/en_ru_gg_val/result/ckpt-50000…
2023-04-05 22:58 - Info - Saved averaged checkpoint to /app/datastudio_prototype/media/models/en_ru_gg_val/result/avg/ckpt-50000
2023-04-05 22:58 - Info - finish trainee task

Thank you.

Currently there is no way to do that in a single training process (meaning with a single call to onmt-main ... train --with_eval).

However, you could write a script that runs each step separately in a loop: for example, run onmt-main ... train for a few iterations, then run onmt-main --checkpoint_path /path/to/avg/ ... eval ... on the averaged checkpoint.

I’m not sure everything will run smoothly regarding the model export, but that’s the general idea you could try.
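
The loop described above could be scripted roughly as follows. This is only a sketch under several assumptions: the length of each training round is set through a second, hypothetical config file (max_step_override.yml) that overrides train.max_step and is passed after the main config; --auto_config is used; checkpoint averaging at the end of training is already enabled in your config; and model_args, config, model_dir and the step targets are placeholders for your own setup.

```python
import subprocess
from pathlib import Path


def write_max_step_override(path, max_step):
    """Write a small YAML file that overrides only train.max_step."""
    Path(path).write_text(f"train:\n  max_step: {max_step}\n")


def round_commands(model_args, config, override, avg_dir):
    """Build the train and eval commands for one round."""
    # --auto_config follows the config list, as in the OpenNMT-tf quickstart.
    shared = [*model_args, "--config", config, override, "--auto_config"]
    train = ["onmt-main", *shared, "train"]
    # The averaged checkpoint directory is passed via --checkpoint_path.
    evaluate = ["onmt-main", *shared, "--checkpoint_path", avg_dir, "eval"]
    return train, evaluate


def run_rounds(model_args, config, model_dir, step_targets, run=subprocess.run):
    """Train up to each step target, then evaluate the averaged checkpoint."""
    # Assumption: averaged checkpoints are saved to <model_dir>/avg.
    override = str(Path(model_dir) / "max_step_override.yml")
    avg_dir = str(Path(model_dir) / "avg")
    for max_step in step_targets:
        write_max_step_override(override, max_step)
        for command in round_commands(model_args, config, override, avg_dir):
            run(command, check=True)
```

A call could look like run_rounds(["--model_type", "Transformer"], "config.yml", "/path/to/model_dir", [45000, 47500, 50000]), where every argument is a placeholder. Each round resumes training from the latest checkpoint in the model directory, so only the step target changes between rounds.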

Thank you for your reply. Can you tell me more about exporting the averaged model? As far as I understand, OpenNMT-tf only saves the averaged checkpoint at the end of training, but does not export it as a model. How can I do that? Which command should I use?
Thank you.

The process I described in my previous post should do that. When the train command ends, the last checkpoints are averaged. Then we can run eval on this averaged checkpoint. If you set export_on_best in the configuration, the eval command should also export the averaged checkpoint accordingly.
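
If the eval run does not produce an export, the averaged checkpoint could also be exported explicitly. The sketch below builds such a command, assuming the export subcommand and its --output_dir option from the OpenNMT-tf 2.x command line (please check the command-line help of your installed version). The model arguments, config name and paths are placeholders.

```python
import shlex


def export_command(model_args, config, avg_dir, export_dir):
    """Build an explicit export command for the averaged checkpoint."""
    return [
        "onmt-main", *model_args,
        "--config", config, "--auto_config",
        # Load weights from the averaged checkpoint directory.
        "--checkpoint_path", avg_dir,
        "export", "--output_dir", export_dir,
    ]


# Placeholder values; replace them with your own model type and paths.
command = export_command(
    ["--model_type", "Transformer"],
    "config.yml",
    "/path/to/avg/",
    "/path/to/export/avg",
)
print(shlex.join(command))
```

The printed line can be run in a shell or passed to subprocess.run(command, check=True).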