KeyError when converting old SavedModel to ctranslate2 1.x

AlexMisiulia · September 20, 2021, 2:03pm

I am trying to convert an old SavedModel to ctranslate2 1.*. I don’t know the exact version of opennmt-tf because another developer trained the model (probably 1.14.0, since it was trained at the end of 2018).

Command:

pipenv run ct2-opennmt-tf-converter \
--model_path en-zh-Hant/ \
--output_dir en-zh-Hant_output \
--model_spec TransformerBase \
--src_vocab en-zh-Hant/assets/1.en.spm.vocab.clear \
--tgt_vocab en-zh-Hant/assets/1.zh-Hant.spm.vocab.clear

Env (Pipfile):
[[source]]
url = "Simple index"
verify_ssl = true
name = "pypi"

[packages]
ctranslate2 = "==1.*"

[dev-packages]

[requires]
python_version = "3.9"

Error:
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable ‘transformer/encoder/layer_0/multi_head/conv1d/kernel:0’ shape=(1, 512, 1536) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
Traceback (most recent call last):
File “/root/lm_projects/opennmt-tf-converter/.venv/bin/ct2-opennmt-tf-converter”, line 8, in <module>
sys.exit(main())
File “/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/bin/opennmt_tf_converter.py”, line 27, in main
converters.OpenNMTTFConverter(
File “/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/converter.py”, line 47, in convert_from_args
return self.convert(
File “/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/converter.py”, line 69, in convert
self._load(model_spec)
File “/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/opennmt_tf.py”, line 164, in _load
set_transformer_spec_v2(model_spec, variables)
File “/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/opennmt_tf.py”, line 174, in set_transformer_spec_v2
set_embeddings(
File “/root/lm_projects/opennmt-tf-converter/.venv/lib/python3.9/site-packages/ctranslate2/converters/opennmt_tf.py”, line 400, in set_embeddings
spec.weight = variables[variable_name]
KeyError: ‘model/examples_inputter/features_inputter/embedding’

guillaumekln · September 20, 2021, 2:09pm

Since the SavedModel was exported with TensorFlow 1.x, you should run the CTranslate2 converter with TensorFlow 1.x installed.
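
A quick way to confirm which TensorFlow major version the converter environment actually picks up is sketched below. This is only an illustration: the helper name is_expected_major is made up for this example, and the only TensorFlow attribute it relies on is tf.__version__.

```python
def is_expected_major(version: str, expected_major: int = 1) -> bool:
    """Return True if a version string such as "1.15.5" has the expected major number.

    Hypothetical helper for illustration; it is not part of CTranslate2 or TensorFlow.
    """
    return int(version.split(".")[0]) == expected_major


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        if is_expected_major(tf.__version__):
            print("TensorFlow 1.x found:", tf.__version__)
        else:
            print("TensorFlow", tf.__version__, "found, but 1.x is expected here")
```

Running it with the same interpreter that runs the converter (for example through pipenv run python) shows whether the TensorFlow 1.x install is the one being used.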

AlexMisiulia · September 20, 2021, 3:38pm

Thank you for the fast response! Sorry for the beginner questions; I just need to understand whether I can do something with the existing models or whether I need to train all of them from scratch. After installing tf 1.x I get the error below. Maybe you know a quick fix:

2021-09-20 15:37:44.043550: W tensorflow/core/kernels/lookup_util.cc:114] Truncated ru-en/assets/1.en.spm.vocab.clear before its end at 30000 records.
2021-09-20 15:37:44.043609: W tensorflow/core/kernels/lookup_util.cc:116] next_id_ : 30000
2021-09-20 15:37:44.046593: W tensorflow/core/kernels/lookup_util.cc:114] Truncated ru-en/assets/1.ru.spm.vocab.clear before its end at 30000 records.
2021-09-20 15:37:44.046636: W tensorflow/core/kernels/lookup_util.cc:116] next_id_ : 30000
WARNING:tensorflow:From /root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/converters/opennmt_tf.py:92: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

Traceback (most recent call last):
File “/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/bin/ct2-opennmt-tf-converter”, line 8, in <module>
sys.exit(main())
File “/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/bin/opennmt_tf_converter.py”, line 29, in main
).convert_from_args(args)
File “/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/converters/converter.py”, line 52, in convert_from_args
force=args.force,
File “/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/converters/converter.py”, line 74, in convert
model_spec.validate()
File “/root/.local/share/virtualenvs/opennmt-tf-converter-SDP2EdzO/lib/python3.7/site-packages/ctranslate2/specs/model_spec.py”, line 255, in validate
“of size %d” % (name.capitalize(), len(vocabulary), expected_size)
ValueError: Source vocabulary has size 30002 but the model expected a vocabulary of size 30001

guillaumekln · September 20, 2021, 3:57pm

Can you post the first 5 lines of the vocabulary ru-en/assets/1.en.spm.vocab.clear?

AlexMisiulia · September 21, 2021, 7:01am

<blank>
<s>
</s>
widx3
widx4

guillaumekln · September 21, 2021, 1:55pm

This looks OK. I thought the vocabulary might be invalid. Just to make sure, does the other vocabulary start with the same special tokens?

AlexMisiulia · September 21, 2021, 2:45pm

Yes, the same:

<blank>
<s>
</s>
widx3
widx4

guillaumekln · September 22, 2021, 7:56am

The conversion should work, but there may be an issue with the vocabulary. Can you check for empty lines or duplicated tokens?
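
The check could be scripted roughly as follows. This is a minimal sketch, assuming the vocabulary file is UTF-8 with one token per line; the function names inspect_vocab_lines and inspect_vocab_file are made up for this example and are not part of OpenNMT-tf or CTranslate2.

```python
from typing import Dict, Iterable, List, Tuple


def inspect_vocab_lines(lines: Iterable[str]) -> Tuple[List[int], Dict[str, List[int]]]:
    """Find empty lines and duplicated tokens.

    Returns (empty_line_numbers, duplicates), where duplicates maps each
    repeated token to the 1-based line numbers on which it appears.
    Assumption: one token per line; whitespace-only lines count as empty.
    """
    empty: List[int] = []
    positions: Dict[str, List[int]] = {}
    for number, line in enumerate(lines, start=1):
        token = line.rstrip("\r\n")  # drop only the line terminator
        if token.strip() == "":
            empty.append(number)
            continue
        positions.setdefault(token, []).append(number)
    duplicates = {tok: nums for tok, nums in positions.items() if len(nums) > 1}
    return empty, duplicates


def inspect_vocab_file(path: str) -> Tuple[List[int], Dict[str, List[int]]]:
    """Run the same check on a vocabulary file (assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as vocab_file:
        return inspect_vocab_lines(vocab_file)


# Example usage with the file from this thread:
#   empty, duplicates = inspect_vocab_file("ru-en/assets/1.en.spm.vocab.clear")
#   print("empty lines:", empty)
#   print("duplicated tokens:", duplicates)
```

Tokens are compared exactly as written, so two entries that differ only in trailing whitespace are treated as different tokens here; whether the vocabulary lookup treats them the same way is not something this sketch verifies.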

AlexMisiulia · September 24, 2021, 8:39am

I just checked and didn’t find any empty lines or duplicated tokens.

guillaumekln · February 11, 2022, 10:44am

For reference, someone else had the same warning message and the issue was a duplicated token in the vocabulary. We added a more explicit warning in OpenNMT-tf that includes the token and its position in the vocabulary file.