KeyError when converting old SavedModel to ctranslate2 1.x

I am trying to convert an old SavedModel to CTranslate2 1.*. I don't know the exact OpenNMT-tf version because another developer trained the model (probably 1.14.0, since it was trained at the end of 2018).

Command:

pipenv run ct2-opennmt-tf-converter \
--model_path en-zh-Hant/ \
--output_dir en-zh-Hant_output \
--model_spec TransformerBase \
--src_vocab en-zh-Hant/assets/1.en.spm.vocab.clear \
--tgt_vocab en-zh-Hant/assets/1.zh-Hant.spm.vocab.clear

Env (Pipfile):
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
ctranslate2 = "==1.*"

[dev-packages]

[requires]
python_version = "3.9"

Error:
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'transformer/encoder/layer_0/multi_head/conv1d/kernel:0' shape=(1, 512, 1536) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
Traceback (most recent call last):
  File "/root/lm_projects/opennmt-tf-converter/.venv/bin/ct2-opennmt-tf-converter", line 8, in <module>
    sys.exit(main())
  File "/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/bin/opennmt_tf_converter.py", line 27, in main
    converters.OpenNMTTFConverter(
  File "/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/converter.py", line 47, in convert_from_args
    return self.convert(
  File "/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/converter.py", line 69, in convert
    self._load(model_spec)
  File "/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/opennmt_tf.py", line 164, in _load
    set_transformer_spec_v2(model_spec, variables)
  File "/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/opennmt_tf.py", line 174, in set_transformer_spec_v2
    set_embeddings(
  File "/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/opennmt_tf.py", line 400, in set_embeddings
    spec.weight = variables[variable_name]
KeyError: 'model/examples_inputter/features_inputter/embedding'

Since the SavedModel was exported with TensorFlow 1.x, you should run the CTranslate2 converter with TensorFlow 1.x installed.
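
To confirm which TensorFlow the converter's environment actually picks up, something like the sketch below could be run inside the same virtualenv. The helper names (`is_tf1`, `installed_tf_version`, `describe_environment`) are made up for this illustration; the only thing taken from TensorFlow itself is `tensorflow.__version__`.

```python
def is_tf1(version):
    """Return True if a version string such as "1.15.5" has major version 1."""
    return version.split(".")[0] == "1"


def installed_tf_version():
    """Return the installed TensorFlow version string, or None if it is missing."""
    try:
        import tensorflow as tf  # imported lazily so the helper works without TF
    except ImportError:
        return None
    return tf.__version__


def describe_environment():
    """Summarize whether the current environment has a 1.x TensorFlow."""
    version = installed_tf_version()
    if version is None:
        return "TensorFlow is not installed in this environment"
    if is_tf1(version):
        return "TensorFlow %s found (1.x, matches the SavedModel)" % version
    return "TensorFlow %s found, but the SavedModel was exported with 1.x" % version
```

Running `print(describe_environment())` through `pipenv run python` checks the same interpreter that `ct2-opennmt-tf-converter` uses.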

Thank you for the fast response! Sorry for the beginner questions. I just need to understand whether I can do something with the existing models or whether I need to train all of them from scratch. After installing TF 1.x I get the error below. Maybe you know a quick fix:

2021-09-20 15:37:44.043550: W tensorflow/core/kernels/lookup_util.cc:114] Truncated ru-en/assets/1.en.spm.vocab.clear before its end at 30000 records.
2021-09-20 15:37:44.043609: W tensorflow/core/kernels/lookup_util.cc:116] next_id_ : 30000
2021-09-20 15:37:44.046593: W tensorflow/core/kernels/lookup_util.cc:114] Truncated ru-en/assets/1.ru.spm.vocab.clear before its end at 30000 records.
2021-09-20 15:37:44.046636: W tensorflow/core/kernels/lookup_util.cc:116] next_id_ : 30000
WARNING:tensorflow:From /root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/converters/opennmt_tf.py:92: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/bin/ct2-opennmt-tf-converter", line 8, in <module>
    sys.exit(main())
  File "/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/bin/opennmt_tf_converter.py", line 29, in main
    ).convert_from_args(args)
  File "/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/converters/converter.py", line 52, in convert_from_args
    force=args.force,
  File "/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/converters/converter.py", line 74, in convert
    model_spec.validate()
  File "/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/specs/model_spec.py", line 255, in validate
    "of size %d" % (name.capitalize(), len(vocabulary), expected_size)
ValueError: Source vocabulary has size 30002 but the model expected a vocabulary of size 30001
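
One thing that may be worth checking here is the raw line count of each vocabulary file. The TensorFlow warning says the lookup table stopped at 30000 records before reaching the end of the file, which suggests the file holds more than 30000 lines. The sketch below counts the lines and does the arithmetic. It assumes one extra entry (for example an unknown token) is added on top of the file's tokens; that assumption is not confirmed anywhere in this thread, and both function names are hypothetical.

```python
def count_vocab_lines(path):
    """Count the lines (one token per line) in a vocabulary file."""
    with open(path, encoding="utf-8") as vocab_file:
        return sum(1 for _ in vocab_file)


def implied_file_lines(reported_size, extra_entries=1):
    """Lines the file would hold if `extra_entries` tokens were added on top.

    ASSUMPTION: extra_entries=1 (a single unknown-token entry) is a guess,
    not something confirmed in this thread.
    """
    return reported_size - extra_entries
```

Under that assumption, the reported size of 30002 would correspond to a 30001-line file, while the model's expected size of 30001 would correspond to 30000 lines. Comparing `count_vocab_lines("ru-en/assets/1.en.spm.vocab.clear")` against those two numbers shows which side the file is on.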

Can you post the first 5 lines of the vocabulary ru-en/assets/1.en.spm.vocab.clear?

<blank>
<s>
</s>
widx3
widx4

This looks OK. I thought the vocabulary might be invalid. Just to make sure, does the other vocabulary start with the same special tokens?

Yes, the same:

<blank>
<s>
</s>
widx3
widx4

The conversion should work, but there may be an issue with the vocabulary. Can you check for empty lines or duplicated tokens?
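
For anyone wanting to automate that check, here is a minimal sketch that reports empty (or whitespace-only) lines and duplicated tokens together with their 1-based line numbers. The function name `check_vocab` is hypothetical, and the file is assumed to be UTF-8 with one token per line.

```python
from collections import defaultdict


def check_vocab(lines):
    """Return (empty_line_numbers, duplicates) for an iterable of vocabulary lines.

    `duplicates` maps each repeated token to the 1-based line numbers
    where it occurs.
    """
    empty = []
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        token = line.rstrip("\r\n")  # drop only the line terminator
        if not token.strip():
            empty.append(number)  # empty or whitespace-only line
        else:
            positions[token].append(number)
    duplicates = {t: p for t, p in positions.items() if len(p) > 1}
    return empty, duplicates
```

It can be applied to a file with `with open(path, encoding="utf-8") as f: empty, duplicates = check_vocab(f)`. Tokens are compared exactly, so two entries that differ only in invisible characters count as different tokens here.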

I just checked and didn't find any empty lines or duplicated tokens.

For reference, someone else had the same warning message and the issue was a duplicated token in the vocabulary. We added a more explicit warning in OpenNMT-tf which includes the token and its position in the vocabulary file.
