panosk
(Panos Kanavos)
February 10, 2019, 6:01pm
Hello,
While training with the following command:
CUDA_VISIBLE_DEVICES=0 nohup onmt-main train_and_eval --model_type TransformerFP16 --config config/main_v2.yml --auto_config &> foobar.log &
after the evaluation at step 1000 (even though main_v2.yml sets eval_delay: 3600), and at this point:
INFO:tensorflow:Loading best metric from event files.
I get a traceback with the following error:
ValueError: best_eval_result cannot be empty or no loss is found in it.
Any help will be greatly appreciated.
Hi,
What is your TensorFlow version? I remember seeing this issue on their repo at some point. You should try one of the following:
Update TensorFlow
Disable the model exporter, or change it to “last” or “final”:
eval:
  exporters: last
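For context, here is a minimal sketch of how the two settings mentioned in this thread might sit together in main_v2.yml. It assumes both keys belong under the eval block, as in the OpenNMT-tf 1.x sample configurations. The rest of the file is omitted, and the eval_delay value is the one quoted in the question:

```yaml
eval:
  # Evaluation interval in seconds (value quoted in the original post).
  eval_delay: 3600
  # Export the latest model after each evaluation. Assumption: this skips
  # the best-metric lookup that raised the ValueError.
  exporters: last
```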
panosk
(Panos Kanavos)
February 11, 2019, 8:45am
Hi @guillaumekln ,
I’m using the latest TensorFlow version, 1.12. I will change the exporter and let you know. Training currently stops after each evaluation and I have to resume it manually, so I will report back soon on whether the change in the .yml file works.
Thanks!
panosk
(Panos Kanavos)
February 11, 2019, 9:13am
I confirm that after adding exporters: last
in my .yml file, training no longer crashes.
Thanks a lot @guillaumekln !