Eos in source segments

Hello,

I’ve been trying to append “</s>” to my source segments with this code, right after SentencePiece tokenization:

with open(inputList[i], 'r') as rf, open(outputList[i], 'w') as wf:
    # Parentheses added for clarity; rstrip('\n') avoids cutting the last
    # character when the final line has no trailing newline.
    if (('src' in inputList[i] and not inverteLanguage)
            or ('tgt' in inputList[i] and inverteLanguage)):
        for line in rf:
            wf.write(' '.join(sourceSP.encode_as_pieces(line.rstrip('\n'))) + ' </s>\n')
    else:
        for line in rf:
            wf.write(' '.join(targetSP.encode_as_pieces(line.rstrip('\n'))) + '\n')
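(To double-check the output files, a small sanity check like the following can confirm that each preprocessed source line really ends with the token; the sample lines and ▁-prefixed pieces are illustrative:)

```python
def ends_with_eos(line):
    """Return True if the last whitespace-separated token of the line is </s>."""
    tokens = line.rstrip("\n").split(" ")
    return bool(tokens) and tokens[-1] == "</s>"

print(ends_with_eos("▁Hello ▁world </s>\n"))  # True
print(ends_with_eos("▁Hello ▁world\n"))       # False
```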

but when I start the training I see this in the log:

2022-01-06 19:37:52.291000: I inputter.py:318]  - special tokens: BOS=no, EOS=no
2022-01-06 19:37:52.320000: I inputter.py:318] Initialized target input layer:
2022-01-06 19:37:52.320000: I inputter.py:318]  - vocabulary size: 8001
2022-01-06 19:37:52.320000: I inputter.py:318]  - special tokens: BOS=yes, EOS=yes

where the source input layer shows EOS=no.

Does this mean I applied the “</s>” the wrong way?

If not, how can I make sure the model takes the tag into account?

Best regards,
Samuel

Hi,

In OpenNMT-tf there is an option to add </s> automatically:

data:
  source_sequence_controls:
    end: true

So you don’t need to add it manually when using this option.


Hi Guillaume!

I have this under the data section:

sequence_controls:
  start: true
  end: true

When I remove it, the training goes well. However, when I add it back, I get the following error at the evaluation step and the training stops. I also tried updating TensorFlow. I checked the development files; they look fine, with no empty lines.

2022-01-28 06:46:53.501588: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
(the same PredictCost warning is repeated several more times)
Traceback (most recent call last):
  File "/home/machine/.venvs/onmttf/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/bin/main.py", line 312, in main
    hvd=hvd,
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/runner.py", line 284, in train
    moving_average_decay=train_config.get("moving_average_decay"),
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 135, in __call__
    evaluator, step, moving_average=moving_average
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 192, in _evaluate
    evaluator(step)
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/evaluation.py", line 319, in __call__
    loss, predictions = self._eval_fn(source, target)
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) INVALID_ARGUMENT:  indices[16,0] = 32 is not in [0, 32)
         [[node transformer_big_relative_1/GatherV2_1
 (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py:586)
]]
         [[transformer_big_relative_1/strided_slice_24/_254]]
  (1) INVALID_ARGUMENT:  indices[16,0] = 32 is not in [0, 32)
         [[node transformer_big_relative_1/GatherV2_1
 (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py:586)
]]
0 successful operations.
0 derived errors ignored. [Op:__inference_evaluate_111405]

Errors may have originated from an input operation.
Input Source operations connected to node transformer_big_relative_1/GatherV2_1:
In[0] transformer_big_relative_1/tile_batch_3/Reshape (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/tensorflow_addons/seq2seq/beam_search_decoder.py:119)
In[1] transformer_big_relative_1/ArgMax (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py:585)
In[2] transformer_big_relative_1/GatherV2_1/axis:

Operation defined at: (most recent call last)
>>>   File "/home/machine/.venvs/onmttf/bin/onmt-main", line 8, in <module>
>>>     sys.exit(main())
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/bin/main.py", line 312, in main
>>>     hvd=hvd,
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/runner.py", line 284, in train
>>>     moving_average_decay=train_config.get("moving_average_decay"),
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 135, in __call__
>>>     evaluator, step, moving_average=moving_average
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 192, in _evaluate
>>>     evaluator(step)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/evaluation.py", line 319, in __call__
>>>     loss, predictions = self._eval_fn(source, target)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/model.py", line 163, in evaluate
>>>     outputs, predictions = self(features, labels=labels)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/model.py", line 103, in __call__
>>>     outputs, predictions = super().__call__(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 180, in call
>>>     if not training:
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 181, in call
>>>     predictions = self._dynamic_decode(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 323, in _dynamic_decode
>>>     if params.get("replace_unknown_target", False):
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 354, in _dynamic_decode
>>>     replaced_target_tokens = replace_unknown_target(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 606, in replace_unknown_target
>>>     aligned_source_tokens = align_tokens_from_attention(source_tokens, attention)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 586, in align_tokens_from_attention
>>>     return tf.gather(tokens, alignment, axis=1, batch_dims=1)
>>>

(the same "Input Source operations" block and operation traceback are printed a second time for the second root error)
Function call stack:
evaluate -> evaluate

Thanks!
Yasmin

Hi,

I’m not sure the error is related, since the configuration looks incorrect and was thus not applied. For sequence-to-sequence models with a single source, it should be:

source_sequence_controls:
  start: true
  end: true

(Note the source_ prefix.)

The field sequence_controls follows the same logic as vocabulary in order to support language models and multiple sources. See the documentation.
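For instance, mirroring the vocabulary keys, a single-source sequence-to-sequence model uses the source_-prefixed form, while a language model would use the unprefixed key (a sketch based on the vocabulary naming convention; verify the exact keys against the OpenNMT-tf data configuration documentation):

```yaml
data:
  # Sequence-to-sequence model with a single source:
  source_sequence_controls:
    start: true
    end: true

  # A language model (single stream) would instead use the unprefixed key,
  # just as it uses "vocabulary" instead of "source_vocabulary":
  # sequence_controls:
  #   start: true
  #   end: true
```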


Regarding the error itself, it seems a relative position is going out of range. I don’t remember the implementation details of relative Transformers exactly, so I would need to check the code to see how this can happen.


I misread the error message. It fails when replacing the unknown tokens in the target. So the out-of-range error can indeed come from the added BOS and EOS in the source. I don’t think they are correctly handled in this replacement logic. I’ll check.

EDIT: I have a fix here:


Thanks, Guillaume!

I have replaced the sequence_to_sequence.py file with the new one (or should I install the whole branch?).

Now, I get this:

2022-01-28 18:58:59.337809: W tensorflow/core/framework/op_kernel.cc:1733] INVALID_ARGUMENT: required broadcastable shapes

For me, it is not a big deal; I will just add the tokens manually. I reported it because I thought you would be interested in fixing it.

Thanks again!
Yasmin

Thanks. This error should also be fixed in the branch (it did not properly cover the beam search case).


Dear Guillaume,

I confirm source_sequence_controls works fine now in the latest version. Thanks a lot!

During the inference/translation time with the same model using OpenNMT-tf, should I add the <s> and </s> tokens manually to the source sentences? I know if I use CTranslate2, I will have to do this.

Kind regards,
Yasmin

Assuming you are running inference with the same configuration, there is no need to add these tokens manually.

Similarly, if you export the model to CTranslate2 with the OpenNMT-tf export command, the configuration is correctly forwarded to the CTranslate2 model and these tokens are added automatically during inference.


Thanks a lot, Guillaume. I tried with TransformerBigRelative, but it is clear that without adding the start and end tokens to the source, the CTranslate2 translation becomes too long and random. If I add the start and end tokens manually to the list of tokens, the translation is reasonable.

Conversion command:

ct2-opennmt-tf-converter --model_path model/ --output_dir ct2_model --src_vocab vocab.tf.src --tgt_vocab vocab.tf.tgt --model_type TransformerBigRelative --quantization int8
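Adding the control tokens manually to each tokenized source, as described above, can be sketched with a small helper (the translate_batch call is shown only as hypothetical usage; <s> and </s> are the usual OpenNMT control tokens):

```python
def add_sequence_controls(tokens, start=True, end=True):
    """Prepend <s> and/or append </s> to a tokenized source sentence."""
    return (["<s>"] if start else []) + list(tokens) + (["</s>"] if end else [])

print(add_sequence_controls(["▁Hello", "▁world"]))
# ['<s>', '▁Hello', '▁world', '</s>']

# Hypothetical usage with a converted CTranslate2 model:
# translator = ctranslate2.Translator("ct2_model")
# results = translator.translate_batch([add_sequence_controls(tokens)])
```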

Sorry, I should have been more specific. I was referring to the export command documented in the OpenNMT-tf documentation. When using this command, there is no need to manually add the EOS token.

On the other hand, there are currently some limitations with the ct2-opennmt-tf-converter script. In particular, there is no way to enable the source BOS or EOS tokens. We should update this conversion script to accept the training configuration.

EDIT: this PR is addressing the converter limitations:
