I’m unable to start fine-tuning due to a restoring from checkpoint failure. The error I get is:
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
tensor_name = optim/beta1_power; expected dtype float does not equal original dtype double
tensor_name = optim/beta2_power; expected dtype float does not equal original dtype double
This looks similar to this issue (https://github.com/OpenNMT/OpenNMT-tf/issues/269) but the discussion has not helped me (error persists whether I do num_gpus 3
or num_gpus 1
or leave the argument out completely, and my environment has not changed.
What I’ve done so far
- Trained a general domain English-French Transformer until good, stable performance was reached
- Did
spm_encode --generate_vocabulary
on my new domain English and French files to get the new vocabularies - Tidied these new vocabularies (took first column using
cut -f 1
, inserted a<blank>
as first line of both vocabulary files) - Ran an
onmt-update-vocab
command:
onmt-update-vocab --model experiments/fr_en_v2/avg/ \
--output_dir=experiments/fr_en_v2/avg/updated \
--src_vocab=enfr.vocab \
--tgt_vocab=enfr.vocab \
--new_src_vocab=new.fr_en.vocab.en \
--new_tgt_vocab=new.fr_en.vocab.fr
- Applied previously-trained SentencePiece model to tokenize the new train/dev/test data
trained SentencePiece model to in-domain train, dev and test data e.g. for the training files:
spm_encode --model=enfr.model < new.en_fr.train.en > new.en_fr.train.en.token
spm_encode --model=enfr.model < new.en_fr.train.fr > new.en_fr.train.fr.token
- Made a new config file for the fine tuning experiment run:
model_dir: experiments/fr_en_v2/avg/updated
data:
train_features_file: new.en_fr.train.en.token
train_labels_file: new.en_fr.train.fr.token
eval_features_file: new.en_fr.dev.en.token
eval_labels_file: new.en_fr.dev.fr.token
source_words_vocabulary: new.fr_en.vocab.en
target_words_vocabulary: new.fr_en.vocab.fr
train:
save_checkpoints_steps: 1000
eval:
eval_delay: 60 # Every min
save_eval_predictions: True
external_evaluators: BLEU
infer:
batch_size: 32
- Attempted to kick off the fine tuning run using the command
CUDA_VISIBLE_DEVICES=0,1,2 onmt-main train_and_eval \
--model_type Transformer \
--config experiments/fr_en_v2/avg/updated/config.yml --auto_config \
--num_gpus 3
Full terminal output
The full terminal output and error message, for info, is:
CUDA_VISIBLE_DEVICES=0,1,2 onmt-main train_and_eval \
> --model_type Transformer \
> --config experiments/fr_en_v2/avg/updated/config.yml --auto_config \
> --num_gpus 3
/usr/local/lib/python3.5/dist-packages/opennmt/config.py:139: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
subconfig = yaml.load(config_file.read())
WARNING:tensorflow:You provided a model configuration but a checkpoint already exists. The model configuration must define the same model as the one used for the initial training. However, you can change non structural values like dropout.
INFO:tensorflow:Using parameters:
data:
eval_features_file: new.en_fr.dev.en.token
eval_labels_file: new.en_fr.dev.fr.token
source_words_vocabulary: new.fr_en.vocab.en
target_words_vocabulary: new.fr_en.vocab.fr
train_features_file: new.en_fr.train.en.token
train_labels_file: new.en_fr.train.fr.token
eval:
batch_size: 32
eval_delay: 60
exporters: last
external_evaluators: BLEU
save_eval_predictions: true
infer:
batch_size: 32
bucket_width: 5
model_dir: experiments/fr_en_v2/avg/updated
params:
average_loss_in_time: true
beam_width: 4
decay_params:
model_dim: 512
warmup_steps: 8000
decay_type: noam_decay_v2
label_smoothing: 0.1
learning_rate: 2.0
length_penalty: 0.6
optimizer: LazyAdamOptimizer
optimizer_params:
beta1: 0.9
beta2: 0.998
score:
batch_size: 64
train:
average_last_checkpoints: 8
batch_size: 3072
batch_type: tokens
bucket_width: 1
effective_batch_size: 25000
keep_checkpoint_max: 8
maximum_features_length: 100
maximum_labels_length: 100
sample_buffer_size: -1
save_checkpoints_steps: 500
save_summary_steps: 100
train_steps: 500000
INFO:tensorflow:Accumulate gradients of 3 iterations to reach effective batch size of 25000
2019-04-25 15:38:43.476633: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-25 15:38:43.577205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:01:00.0
totalMemory: 11.92GiB freeMemory: 11.82GiB
2019-04-25 15:38:43.628597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:02:00.0
totalMemory: 11.93GiB freeMemory: 11.82GiB
2019-04-25 15:38:43.680793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:03:00.0
totalMemory: 11.93GiB freeMemory: 10.80GiB
2019-04-25 15:38:43.681267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2
2019-04-25 15:38:44.591748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-25 15:38:44.591789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2
2019-04-25 15:38:44.591796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y Y
2019-04-25 15:38:44.591802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N Y
2019-04-25 15:38:44.591808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: Y Y N
2019-04-25 15:38:44.593231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 11435 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0, compute capability: 5.2)
2019-04-25 15:38:44.593524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 11436 MB memory) -> physical GPU (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0, compute capability: 5.2)
2019-04-25 15:38:44.593706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:2 with 10447 MB memory) -> physical GPU (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0, compute capability: 5.2)
INFO:tensorflow:Using config: {'_master': '', '_experimental_distribute': None, '_keep_checkpoint_max': 8, '_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_session_config': gpu_options {
}
allow_soft_placement: true
graph_options {
rewrite_options {
layout_optimizer: OFF
}
}
, '_service': None, '_save_checkpoints_secs': None, '_save_summary_steps': 100, '_tf_random_seed': None, '_device_fn': None, '_task_type': 'worker', '_task_id': 0, '_num_worker_replicas': 1, '_log_step_count_steps': 300, '_eval_distribute': None, '_train_distribute': None, '_evaluation_master': '', '_model_dir': 'experiments/fr_en_v2/avg/updated', '_protocol': None, '_num_ps_replicas': 0, '_save_checkpoints_steps': 500, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f7d9e0e2c50>}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 500 or save_checkpoints_secs None.
INFO:tensorflow:Training on 1315535 examples
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Number of trainable parameters: 77511625
INFO:tensorflow:Graph was finalized.
2019-04-25 15:39:29.283239: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2
2019-04-25 15:39:29.283335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-25 15:39:29.283349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2
2019-04-25 15:39:29.283356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y Y
2019-04-25 15:39:29.283363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N Y
2019-04-25 15:39:29.283369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: Y Y N
2019-04-25 15:39:29.284862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11435 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0, compute capability: 5.2)
2019-04-25 15:39:29.285063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11436 MB memory) -> physical GPU (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0, compute capability: 5.2)
2019-04-25 15:39:29.285389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10447 MB memory) -> physical GPU (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0, compute capability: 5.2)
INFO:tensorflow:Restoring parameters from experiments/fr_en_v2/avg/updated/model.ckpt-270000
2019-04-25 15:39:30.753455: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Invalid argument: tensor_name = optim/beta1_power; expected dtype float does not equal original dtype double
tensor_name = optim/beta2_power; expected dtype float does not equal original dtype double
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: tensor_name = optim/beta1_power; expected dtype float does not equal original dtype double
tensor_name = optim/beta2_power; expected dtype float does not equal original dtype double
[[{{node save/RestoreV2_1}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1546, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: tensor_name = optim/beta1_power; expected dtype float does not equal original dtype double
tensor_name = optim/beta2_power; expected dtype float does not equal original dtype double
[[node save/RestoreV2_1 (defined at /usr/local/lib/python3.5/dist-packages/opennmt/runner.py:297) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
Caused by op 'save/RestoreV2_1', defined at:
File "/usr/local/bin/onmt-main", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.5/dist-packages/opennmt/bin/main.py", line 172, in main
runner.train_and_evaluate(checkpoint_path=args.checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/opennmt/runner.py", line 297, in train_and_evaluate
result = tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1468, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 557, in create_session
self._scaffold.finalize()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 215, in finalize
self._saver.build()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
name="restore_shard"))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): tensor_name = optim/beta1_power; expected dtype float does not equal original dtype double
tensor_name = optim/beta2_power; expected dtype float does not equal original dtype double
[[node save/RestoreV2_1 (defined at /usr/local/lib/python3.5/dist-packages/opennmt/runner.py:297) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/onmt-main", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.5/dist-packages/opennmt/bin/main.py", line 172, in main
runner.train_and_evaluate(checkpoint_path=args.checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/opennmt/runner.py", line 297, in train_and_evaluate
result = tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1468, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 566, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 288, in prepare_session
config=config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 218, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1582, in restore
err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
tensor_name = optim/beta1_power; expected dtype float does not equal original dtype double
tensor_name = optim/beta2_power; expected dtype float does not equal original dtype double
[[node save/RestoreV2_1 (defined at /usr/local/lib/python3.5/dist-packages/opennmt/runner.py:297) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
Caused by op 'save/RestoreV2_1', defined at:
File "/usr/local/bin/onmt-main", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.5/dist-packages/opennmt/bin/main.py", line 172, in main
runner.train_and_evaluate(checkpoint_path=args.checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/opennmt/runner.py", line 297, in train_and_evaluate
result = tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1468, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 557, in create_session
self._scaffold.finalize()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 215, in finalize
self._saver.build()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
name="restore_shard"))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
tensor_name = optim/beta1_power; expected dtype float does not equal original dtype double
tensor_name = optim/beta2_power; expected dtype float does not equal original dtype double
[[node save/RestoreV2_1 (defined at /usr/local/lib/python3.5/dist-packages/opennmt/runner.py:297) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
Thanks, any advice you might have would be greatly appreciated.
Cheers,
Natasha