Hi there
Sorry for this beginner question, but I’m not sure how to determine the progress of the training.
I’m training on a corpus of about 400,000 sentences using OpenNMT-tf (the TensorFlow version of OpenNMT).
I’m on Windows, running on the CPU only (no GPU).
My questions are:
- When will training complete?
- Is it ok to test during training?
- Why does testing produce almost nothing but useless output?
Training command:
python -m bin.main train_and_eval --model config/models/nmt_small.py --config config/opennmt-defaults.yml config/data/xxx.yml
I have created the vocabularies for the source (src) and target (tgt) sides.
I have not modified the default config files, except to point them to my own data files.
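One thing worth ruling out is a mismatch between the vocabularies and the data, since tokens missing from a vocabulary are mapped to <unk>. The Python sketch below measures what share of a tokenized text file is out of vocabulary. It assumes the vocabulary file lists one token per line and that the text is tokenized with spaces; the script name and file arguments are hypothetical.

```python
from __future__ import annotations

import sys
from collections import Counter
from typing import Iterable


def load_vocab(path: str) -> set[str]:
    """Read a vocabulary file, assuming one token per line.

    Only the first whitespace-separated field is kept, in case a
    frequency column follows the token.
    """
    with open(path, encoding="utf-8") as f:
        return {line.split()[0] for line in f if line.strip()}


def oov_rate(lines: Iterable[str], vocab: set[str]) -> tuple[float, Counter]:
    """Fraction of whitespace-separated tokens not found in vocab,
    plus a Counter of those out-of-vocabulary tokens."""
    total = 0
    missing: Counter = Counter()
    for line in lines:
        for token in line.split():
            total += 1
            if token not in vocab:
                missing[token] += 1
    rate = sum(missing.values()) / total if total else 0.0
    return rate, missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python oov_check.py VOCAB_FILE TOKENIZED_TEXT_FILE
    vocab = load_vocab(sys.argv[1])
    with open(sys.argv[2], encoding="utf-8") as f:
        rate, missing = oov_rate(f, vocab)
    print(f"OOV rate: {rate:.2%}")
    # Show the most frequent out-of-vocabulary tokens.
    for token, count in missing.most_common(10):
        print(count, token)
```

Running it on the source test file with the source vocabulary (and likewise for the target side) gives a rough idea of how many <unk> tokens come from the data itself rather than from the model.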
When I test, I use:
python -m bin.main infer --config config/opennmt-defaults.yml config/data/xxx.yml --features_file data/xxx/src-test.txt
The output is 99% <unk> and </s>, so it is completely useless.
However, I ran it while training was still in progress.
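To put a number on "99% <unk> and </s>" and see whether it drops as training continues, a small Python sketch like the one below could count the share of those tokens in the inference output. It assumes the translations were saved to a text file, one sentence per line with tokens separated by spaces; the script and file names are hypothetical.

```python
from __future__ import annotations

import sys
from collections import Counter
from typing import Iterable

# Special tokens whose share of the output we want to measure.
SPECIALS = ("<unk>", "</s>")


def special_token_share(
    lines: Iterable[str], specials: tuple[str, ...] = SPECIALS
) -> dict[str, float]:
    """For each special token, return its fraction of all output tokens."""
    counts: Counter = Counter()
    total = 0
    for line in lines:
        tokens = line.split()
        total += len(tokens)
        counts.update(t for t in tokens if t in specials)
    return {tok: (counts[tok] / total if total else 0.0) for tok in specials}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python unk_share.py PREDICTIONS_FILE
    with open(sys.argv[1], encoding="utf-8") as f:
        for tok, share in special_token_share(f).items():
            print(f"{tok}: {share:.1%}")
```

Comparing this figure across checkpoints is a simple way to tell whether the output is improving at all between test runs.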
Training has now been running for 4 days.
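A rough cross-check of the speed, assuming training started from step 0 about four days ago, ran without interruption, and is now near the last step shown in the log below (40,594):

```latex
\[
\text{average rate} \approx \frac{40\,594\ \text{steps}}{4 \times 86\,400\ \text{s}}
  = \frac{40\,594}{345\,600}\ \text{steps/s}
  \approx 0.117\ \text{steps/s}
  \approx 10\,150\ \text{steps/day}
\]
```

This agrees with the global_step/sec values of roughly 0.10 to 0.13 reported in the log, so at this pace every additional 100,000 steps would take close to 10 days.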
Tail of the log from the current training run:
INFO:tensorflow:Loss for final step: 60.82655.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-04-02-04:44:31
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from xxx\model.ckpt-39393
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-04-02-04:47:42
INFO:tensorflow:Saving dict for global step 39393: global_step = 39393, loss = 4.2886634
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:Restoring parameters from xxx\model.ckpt-39393
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: b"xxx\export\latest\temp-b'1522644463'\assets"
INFO:tensorflow:SavedModel written to: b"xxx\export\latest\temp-b'1522644463'\saved_model.pb"
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Number of trainable parameters: 87083345
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from xxx\model.ckpt-39393
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 39394 into xxx\model.ckpt.
INFO:tensorflow:loss = 49.204033, step = 39394
INFO:tensorflow:global_step/sec: 0.119733
INFO:tensorflow:words_per_sec/features: 122.171
INFO:tensorflow:words_per_sec/labels: 120.3
INFO:tensorflow:global_step/sec: 0.134033
INFO:tensorflow:loss = 48.57039, step = 39494 (790.641 sec)
INFO:tensorflow:words_per_sec/features: 134.854
INFO:tensorflow:words_per_sec/labels: 132.298
INFO:tensorflow:global_step/sec: 0.11271
INFO:tensorflow:words_per_sec/features: 111.115
INFO:tensorflow:words_per_sec/labels: 108.798
INFO:tensorflow:global_step/sec: 0.112425
INFO:tensorflow:loss = 63.985336, step = 39594 (888.359 sec)
INFO:tensorflow:words_per_sec/features: 120.994
INFO:tensorflow:words_per_sec/labels: 116.269
INFO:tensorflow:global_step/sec: 0.133264
INFO:tensorflow:words_per_sec/features: 134.877
INFO:tensorflow:words_per_sec/labels: 134.162
INFO:tensorflow:global_step/sec: 0.12646
INFO:tensorflow:loss = 51.120525, step = 39694 (770.595 sec)
INFO:tensorflow:words_per_sec/features: 134.063
INFO:tensorflow:words_per_sec/labels: 130.707
INFO:tensorflow:global_step/sec: 0.131255
INFO:tensorflow:words_per_sec/features: 136.23
INFO:tensorflow:words_per_sec/labels: 132.752
INFO:tensorflow:global_step/sec: 0.131476
INFO:tensorflow:loss = 77.925156, step = 39794 (761.219 sec)
INFO:tensorflow:words_per_sec/features: 132.579
INFO:tensorflow:words_per_sec/labels: 131.904
INFO:tensorflow:global_step/sec: 0.1223
INFO:tensorflow:words_per_sec/features: 126.435
INFO:tensorflow:words_per_sec/labels: 123.761
INFO:tensorflow:global_step/sec: 0.128446
INFO:tensorflow:loss = 59.011623, step = 39894 (798.097 sec)
INFO:tensorflow:words_per_sec/features: 136.087
INFO:tensorflow:words_per_sec/labels: 132.545
INFO:tensorflow:global_step/sec: 0.101857
INFO:tensorflow:words_per_sec/features: 108.341
INFO:tensorflow:words_per_sec/labels: 107.155
INFO:tensorflow:global_step/sec: 0.112868
INFO:tensorflow:loss = 66.56572, step = 39994 (933.880 sec)
INFO:tensorflow:words_per_sec/features: 116.497
INFO:tensorflow:words_per_sec/labels: 114.39
INFO:tensorflow:global_step/sec: 0.113701
INFO:tensorflow:words_per_sec/features: 123.973
INFO:tensorflow:words_per_sec/labels: 122.798
INFO:tensorflow:global_step/sec: 0.124485
INFO:tensorflow:loss = 97.666565, step = 40094 (841.419 sec)
INFO:tensorflow:words_per_sec/features: 127.033
INFO:tensorflow:words_per_sec/labels: 125.911
INFO:tensorflow:global_step/sec: 0.125362
INFO:tensorflow:words_per_sec/features: 127.508
INFO:tensorflow:words_per_sec/labels: 126.811
INFO:tensorflow:global_step/sec: 0.117176
INFO:tensorflow:loss = 40.40155, step = 40194 (825.538 sec)
INFO:tensorflow:words_per_sec/features: 122.729
INFO:tensorflow:words_per_sec/labels: 118.677
INFO:tensorflow:global_step/sec: 0.128379
INFO:tensorflow:words_per_sec/features: 134.496
INFO:tensorflow:words_per_sec/labels: 130.503
INFO:tensorflow:global_step/sec: 0.106624
INFO:tensorflow:loss = 101.765274, step = 40294 (858.407 sec)
INFO:tensorflow:words_per_sec/features: 113.769
INFO:tensorflow:words_per_sec/labels: 109.035
INFO:tensorflow:global_step/sec: 0.125137
INFO:tensorflow:words_per_sec/features: 132.143
INFO:tensorflow:words_per_sec/labels: 130.938
INFO:tensorflow:Saving checkpoints for 40394 into xxx\model.ckpt.
INFO:tensorflow:global_step/sec: 0.113762
INFO:tensorflow:loss = 86.29198, step = 40394 (839.079 sec)
INFO:tensorflow:words_per_sec/features: 113.935
INFO:tensorflow:words_per_sec/labels: 113.077
INFO:tensorflow:global_step/sec: 0.128195
INFO:tensorflow:words_per_sec/features: 132.538
INFO:tensorflow:words_per_sec/labels: 131.333
INFO:tensorflow:global_step/sec: 0.108179
INFO:tensorflow:loss = 47.689987, step = 40494 (852.229 sec)
INFO:tensorflow:words_per_sec/features: 119.852
INFO:tensorflow:words_per_sec/labels: 117.083
INFO:tensorflow:global_step/sec: 0.125795
INFO:tensorflow:words_per_sec/features: 131.947
INFO:tensorflow:words_per_sec/labels: 131.202
INFO:tensorflow:global_step/sec: 0.114199
INFO:tensorflow:loss = 48.773567, step = 40594 (835.303 sec)
INFO:tensorflow:words_per_sec/features: 115.668
INFO:tensorflow:words_per_sec/labels: 114.811
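To estimate "when will training complete" from the log itself, the rate and latest step can be read off the `global_step/sec` and `step = N` lines. The Python sketch below assumes the raw log file keeps each entry on a single line. The target step count is a hypothetical input you have to supply, because this log does not show where training is set to stop. It also assumes the evaluation `loss` is a mean per-token cross-entropy in nats, so that `exp(loss)` is a perplexity (roughly 73 for the logged 4.2886634).

```python
from __future__ import annotations

import math
import re
import sys
from statistics import mean
from typing import Iterable

# Patterns for the TensorFlow log lines shown above (assumed unwrapped).
RATE_RE = re.compile(r"global_step/sec: ([\d.]+)")
STEP_RE = re.compile(r"step = (\d+)")
EVAL_RE = re.compile(r"global step (\d+):.*\bloss = ([\d.]+)")


def summarize(log_lines: Iterable[str]) -> dict:
    """Collect the training speed, latest step and latest evaluation loss."""
    rates: list[float] = []
    last_step = eval_step = eval_loss = None
    for line in log_lines:
        m = RATE_RE.search(line)
        if m:
            rates.append(float(m.group(1)))
        m = STEP_RE.search(line)
        if m:
            last_step = int(m.group(1))
        m = EVAL_RE.search(line)
        if m:
            eval_step, eval_loss = int(m.group(1)), float(m.group(2))
    return {
        "last_step": last_step,
        "mean_steps_per_sec": mean(rates) if rates else None,
        "eval_step": eval_step,
        "eval_loss": eval_loss,
        # Assumes the eval loss is the mean per-token cross-entropy in nats.
        "eval_perplexity": math.exp(eval_loss) if eval_loss is not None else None,
    }


def eta_seconds(current_step: int, target_steps: int, steps_per_sec: float) -> float:
    """Seconds left to reach target_steps at a constant rate."""
    return max(target_steps - current_step, 0) / steps_per_sec


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python log_progress.py TRAIN_LOG TARGET_STEPS
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        summary = summarize(f)
    print(summary)
    if summary["last_step"] is not None and summary["mean_steps_per_sec"]:
        left = eta_seconds(summary["last_step"], int(sys.argv[2]),
                           summary["mean_steps_per_sec"])
        print(f"Estimated time left: {left / 86400:.1f} days")
```

Averaging all `global_step/sec` values smooths out the step-to-step fluctuation visible above (about 0.10 to 0.13); tracking the evaluation loss over successive evaluations is a more direct sign of whether the model is still improving.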
Thank you in advance.
Best
Peter