Hi everyone,
I have trained a Transformer model with a 50,000-token vocabulary; let's call it Model 1. I trained it for 100,000 steps using the command below. Experiment2_Model2_2 is Model 1's directory.
onmt-main train_and_eval --model_type Transformer --auto_config --config ~/Experiment2_Model2_2/config.yml --num_gpus 3
What I want to do is take the weights from Model 1 and use them to initialize Model 2. So I created a directory for Model 2 and put new training data, evaluation data, and a 50,000-token vocabulary in it. Then I trained Model 2 with the command below. Experiment2_ModelTransfer is the directory for Model 2.
onmt-main train_and_eval --model_type Transformer --auto_config --config ~/Experiment2_ModelTransfer/config.yml --checkpoint_path ~/Experiment2_Model2_2/model.ckpt-102826 --num_gpus 3
But when I ran that command, I got this error:
INFO:tensorflow:Training on 298532 examples
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Number of trainable parameters: 120992081
Traceback (most recent call last):
File "/home/fhadli/anaconda3/envs/fhadli/bin/onmt-main", line 10, in <module>
sys.exit(main())
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/opennmt/bin/main.py", line 172, in main
runner.train_and_evaluate(checkpoint_path=args.checkpoint_path)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/opennmt/runner.py", line 297, in train_and_evaluate
result = tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 439, in train_and_evaluate
executor.run()
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 518, in run
self.run_local()
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 650, in run_local
hooks=train_hooks)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 363, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 843, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 859, in _train_model_default
saving_listeners)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1056, in _train_with_estimator_spec
log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 405, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 816, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 532, in __init__
h.begin()
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/opennmt/utils/hooks.py", line 295, in begin
tf_vars.append(tf.get_variable(name, shape=value.shape, dtype=tf.as_dtype(value.dtype)))
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
constraint=constraint)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
constraint=constraint)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
constraint=constraint)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/home/fhadli/anaconda3/envs/fhadli/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 738, in _get_single_variable
found_var.get_shape()))
ValueError: Trying to share variable transformer/decoder/dense/bias, but specified shape (48373,) and found shape (50001,).
The sizes of my w_emb files are:
- Model 1
transformer_decoder_w_embs.txt = 48393
transformer_encoder_w_embs.txt = 50021
- Model 2
transformer_decoder_w_embs.txt = 50002
transformer_encoder_w_embs.txt = 50001
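The error reports the decoder output bias with shape (48373,) in one place and (50001,) in the other, which suggests the two models were built with target vocabularies of different sizes. Below is a minimal sketch for checking that. It assumes each vocabulary file holds one token per line and that one extra id is reserved for out-of-vocabulary tokens; the latter is an assumption about the toolkit's default, not something confirmed here. The function names and the tiny stand-in files are hypothetical.

```python
import os
import tempfile


def count_lines(path):
    """Count the lines in a vocabulary file (assumed: one token per line)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def output_layer_size(vocab_path, num_oov_buckets=1):
    """Size the decoder output layer would have for this vocabulary.

    ASSUMPTION: `num_oov_buckets` extra ids are added for unknown tokens.
    """
    return count_lines(vocab_path) + num_oov_buckets


def compare_vocabularies(old_vocab_path, new_vocab_path):
    """Return (old_size, new_size, sizes_match) for two vocabulary files."""
    old_size = output_layer_size(old_vocab_path)
    new_size = output_layer_size(new_vocab_path)
    return old_size, new_size, old_size == new_size


if __name__ == "__main__":
    # Tiny stand-in vocabularies; replace these with the real target
    # vocabulary paths of Model 1 and Model 2.
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "old_vocab.txt")
        new_path = os.path.join(tmp, "new_vocab.txt")
        with open(old_path, "w", encoding="utf-8") as f:
            f.write("a\nb\nc\n")
        with open(new_path, "w", encoding="utf-8") as f:
            f.write("a\nb\nc\nd\n")
        old_size, new_size, same = compare_vocabularies(old_path, new_path)
        print("old:", old_size, "new:", new_size, "match:", same)
```

If the two sizes differ, the checkpoint's embedding and output-layer variables cannot be loaded into the new graph as they are, which would be consistent with the ValueError above.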
Does anyone know how to fix this error? I could not find a similar issue, so I will provide more information if needed. Thank you.