Eos in source segments

SamuelLacombe · January 6, 2022, 9:57pm

Hello,

I’ve been trying to append “</s>” to my source segments with this code, right after SentencePiece tokenization:

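with open(inputList[i], 'r') as rf, open(outputList[i], 'w') as wf:
    # Source side: the 'src' file normally, or the 'tgt' file when the direction is inverted.
    if ('src' in inputList[i] and not inverteLanguage) or ('tgt' in inputList[i] and inverteLanguage):
        for line in rf:
            # rstrip('\n') keeps the last character when the final line has no newline.
            wf.write(' '.join(sourceSP.encode_as_pieces(line.rstrip('\n'))) + ' </s>' + '\n')
    else:
        for line in rf:
            wf.write(' '.join(targetSP.encode_as_pieces(line.rstrip('\n'))) + '\n')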

but when I start the training I see this in the log:

2022-01-06 19:37:52.291000: I inputter.py:318]  - special tokens: BOS=no, EOS=no
2022-01-06 19:37:52.320000: I inputter.py:318] Initialized target input layer:
2022-01-06 19:37:52.320000: I inputter.py:318]  - vocabulary size: 8001
2022-01-06 19:37:52.320000: I inputter.py:318]  - special tokens: BOS=yes, EOS=yes

where EOS=no.

Does it mean I applied the “</s>” the wrong way?

If not, how can I make sure the MT system takes the tag into account?

Best regards,
Samuel

guillaumekln · January 7, 2022, 8:39am

Hi,

In OpenNMT-tf there is an option to add </s> automatically:

data:
  source_sequence_controls:
    end: true

So you don’t need to add it manually when using this option.
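
With this option enabled, the preprocessing step only has to write the tokenized pieces. A minimal sketch of that, assuming `encode` stands in for `sourceSP.encode_as_pieces` from the first post; the function name `write_tokenized` is hypothetical:

```python
import io

def write_tokenized(lines, encode, out):
    """Write space-joined pieces for each line, without a manual '</s>'."""
    for line in lines:
        pieces = encode(line.rstrip("\n"))
        out.write(" ".join(pieces) + "\n")

# Stand-in tokenizer for illustration only; in practice pass
# sourceSP.encode_as_pieces from a loaded SentencePiece model.
buffer = io.StringIO()
write_tokenized(["hello world\n", "bye\n"], str.split, buffer)
print(buffer.getvalue(), end="")
```

The end token is then left entirely to the training configuration, so the same tokenized files can be reused whether the option is on or off.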

ymoslem · January 28, 2022, 11:02am

Hi Guillaume!

I have this under the data section:

sequence_controls:
  start: true
  end: true

When I remove it, training runs fine. However, when I add it, I get the following error at the evaluation step, and training stops. I also tried updating TensorFlow. I checked the development files; they look fine and contain no empty lines.

2022-01-28 06:46:53.501588: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.505463: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.506445: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.510348: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.511383: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.515215: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.516219: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.520053: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.521083: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
2022-01-28 06:46:53.524347: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA RTX A4000" frequency: 1560 num_cores: 48 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 14905966592 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
Traceback (most recent call last):
  File "/home/machine/.venvs/onmttf/bin/onmt-main", line 8, in <module>
    sys.exit(main())
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/bin/main.py", line 312, in main
    hvd=hvd,
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/runner.py", line 284, in train
    moving_average_decay=train_config.get("moving_average_decay"),
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 135, in __call__
    evaluator, step, moving_average=moving_average
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 192, in _evaluate
    evaluator(step)
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/evaluation.py", line 319, in __call__
    loss, predictions = self._eval_fn(source, target)
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) INVALID_ARGUMENT:  indices[16,0] = 32 is not in [0, 32)
         [[node transformer_big_relative_1/GatherV2_1
 (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py:586)
]]
         [[transformer_big_relative_1/strided_slice_24/_254]]
  (1) INVALID_ARGUMENT:  indices[16,0] = 32 is not in [0, 32)
         [[node transformer_big_relative_1/GatherV2_1
 (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py:586)
]]
0 successful operations.
0 derived errors ignored. [Op:__inference_evaluate_111405]

Errors may have originated from an input operation.
Input Source operations connected to node transformer_big_relative_1/GatherV2_1:
In[0] transformer_big_relative_1/tile_batch_3/Reshape (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/tensorflow_addons/seq2seq/beam_search_decoder.py:119)
In[1] transformer_big_relative_1/ArgMax (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py:585)
In[2] transformer_big_relative_1/GatherV2_1/axis:

Operation defined at: (most recent call last)
>>>   File "/home/machine/.venvs/onmttf/bin/onmt-main", line 8, in <module>
>>>     sys.exit(main())
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/bin/main.py", line 312, in main
>>>     hvd=hvd,
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/runner.py", line 284, in train
>>>     moving_average_decay=train_config.get("moving_average_decay"),
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 135, in __call__
>>>     evaluator, step, moving_average=moving_average
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 192, in _evaluate
>>>     evaluator(step)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/evaluation.py", line 319, in __call__
>>>     loss, predictions = self._eval_fn(source, target)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/model.py", line 163, in evaluate
>>>     outputs, predictions = self(features, labels=labels)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/model.py", line 103, in __call__
>>>     outputs, predictions = super().__call__(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 180, in call
>>>     if not training:
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 181, in call
>>>     predictions = self._dynamic_decode(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 323, in _dynamic_decode
>>>     if params.get("replace_unknown_target", False):
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 354, in _dynamic_decode
>>>     replaced_target_tokens = replace_unknown_target(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 606, in replace_unknown_target
>>>     aligned_source_tokens = align_tokens_from_attention(source_tokens, attention)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 586, in align_tokens_from_attention
>>>     return tf.gather(tokens, alignment, axis=1, batch_dims=1)
>>>

Input Source operations connected to node transformer_big_relative_1/GatherV2_1:
In[0] transformer_big_relative_1/tile_batch_3/Reshape (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/tensorflow_addons/seq2seq/beam_search_decoder.py:119)
In[1] transformer_big_relative_1/ArgMax (defined at /home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py:585)
In[2] transformer_big_relative_1/GatherV2_1/axis:

Operation defined at: (most recent call last)
>>>   File "/home/machine/.venvs/onmttf/bin/onmt-main", line 8, in <module>
>>>     sys.exit(main())
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/bin/main.py", line 312, in main
>>>     hvd=hvd,
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/runner.py", line 284, in train
>>>     moving_average_decay=train_config.get("moving_average_decay"),
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 135, in __call__
>>>     evaluator, step, moving_average=moving_average
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/training.py", line 192, in _evaluate
>>>     evaluator(step)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/evaluation.py", line 319, in __call__
>>>     loss, predictions = self._eval_fn(source, target)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/model.py", line 163, in evaluate
>>>     outputs, predictions = self(features, labels=labels)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/model.py", line 103, in __call__
>>>     outputs, predictions = super().__call__(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 180, in call
>>>     if not training:
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 181, in call
>>>     predictions = self._dynamic_decode(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 323, in _dynamic_decode
>>>     if params.get("replace_unknown_target", False):
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 354, in _dynamic_decode
>>>     replaced_target_tokens = replace_unknown_target(
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 606, in replace_unknown_target
>>>     aligned_source_tokens = align_tokens_from_attention(source_tokens, attention)
>>>
>>>   File "/home/machine/.venvs/onmttf/lib/python3.7/site-packages/opennmt/models/sequence_to_sequence.py", line 586, in align_tokens_from_attention
>>>     return tf.gather(tokens, alignment, axis=1, batch_dims=1)
>>>

Function call stack:
evaluate -> evaluate

Thanks!
Yasmin

guillaumekln · January 28, 2022, 11:36am

Hi,

I’m not sure the error is related, since the configuration looks incorrect and is therefore not applied. For sequence-to-sequence models with a single source, it should be:

source_sequence_controls:
  start: true
  end: true

(Note the source_ prefix.)

The field sequence_controls follows the same logic as vocabulary in order to support language models and multiple sources. See the documentation.
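
As a rough illustration of the naming rule for a single-source model, the sketch below inspects an already parsed `data` section. The function name and the returned messages are hypothetical; only the keys `sequence_controls` and `source_sequence_controls` come from this thread.

```python
def check_source_controls(data_config):
    """Return a hint if the controls key looks misnamed for a single-source model."""
    if "source_sequence_controls" in data_config:
        return None  # expected key for a single source
    if "sequence_controls" in data_config:
        return "rename 'sequence_controls' to 'source_sequence_controls'"
    return "no source sequence controls configured"

wrong = {"sequence_controls": {"start": True, "end": True}}
right = {"source_sequence_controls": {"start": True, "end": True}}
print(check_source_controls(wrong))
print(check_source_controls(right))
```

The training log is another way to check: the source input layer should report `BOS=yes, EOS=yes` once the option is picked up.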

Regarding the error itself, it seems a relative position is going out of range. I don’t remember the exact implementation details of relative Transformers, so I would need to check the code to see how this can happen.

guillaumekln · January 28, 2022, 1:53pm

I misread the error message. It fails when replacing the unknown tokens in the target. So the out-of-range error can indeed come from the added BOS and EOS in the source. I don’t think they are correctly handled in this replacement logic. I’ll check.
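
To illustrate how such an index error can arise, here is a simplified pure-Python sketch. It is not the OpenNMT-tf implementation: if the attention covers the source including the added BOS and EOS while the token list does not, the highest-attention position can fall outside the token list. The column-dropping shown here is one possible way to realign them and is an assumption, not the actual fix.

```python
def align_tokens(tokens, attention, has_bos=False, has_eos=False):
    """For each target step, pick the source token with the highest attention.

    attention[t][s] is the weight of source position s at target step t.
    When BOS/EOS were added to the encoder input, those columns are
    dropped first so positions line up with `tokens` again.
    """
    start = 1 if has_bos else 0
    aligned = []
    for row in attention:
        end = len(row) - 1 if has_eos else len(row)
        weights = row[start:end]
        best = max(range(len(weights)), key=weights.__getitem__)
        aligned.append(tokens[best])
    return aligned

tokens = ["a", "b"]
# Attention over 4 positions: <s>, a, b, </s>; this step attends mostly to </s>.
attention = [[0.1, 0.2, 0.1, 0.6]]

try:
    align_tokens(tokens, attention)  # position 3 is outside `tokens`
except IndexError:
    print("out of range without accounting for BOS/EOS")

print(align_tokens(tokens, attention, has_bos=True, has_eos=True))
```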

EDIT: I have a fix here:

ymoslem · January 28, 2022, 4:23pm

Thanks, Guillaume!

I have replaced the sequence_to_sequence.py file with the new one (or should I install the whole branch?)

Now, I get this:

2022-01-28 18:58:59.337809: W tensorflow/core/framework/op_kernel.cc:1733] INVALID_ARGUMENT: required broadcastable shapes

For me, it is not a big deal, I will just add the tokens manually. I have reported it as I thought you would be interested in fixing it.

Thanks again!
Yasmin

guillaumekln · January 28, 2022, 4:38pm

Thanks. This error should also be fixed in the branch (it did not properly cover the beam search case).

ymoslem · April 19, 2022, 1:44am

Dear Guillaume,

I confirm source_sequence_controls works fine now in the latest version. Thanks a lot!

At inference/translation time with the same model in OpenNMT-tf, should I add the <s> and </s> tokens manually to the source sentences? I know that with CTranslate2 I will have to do this.

Kind regards,
Yasmin

guillaumekln · April 19, 2022, 7:57am

Assuming you are running inference with the same configuration, there is no need to add these tokens manually.

Similarly, if you export the model to CTranslate2 with the OpenNMT-tf export command, the configuration is correctly forwarded to the CTranslate2 model and these tokens are added automatically during inference.

ymoslem · April 24, 2022, 1:53pm

Thanks a lot, Guillaume. I tried with TransformerBigRelative but it is clear the translation with CTranslate2 gets too long and random without adding the start and end tokens to the source. If I add the start and end tokens manually to the list of tokens, the translation is reasonable.

Conversion command:

ct2-opennmt-tf-converter --model_path model/ --output_dir ct2_model --src_vocab vocab.tf.src --tgt_vocab vocab.tf.tgt --model_type TransformerBigRelative --quantization int8

guillaumekln · April 25, 2022, 8:11am

Sorry, I should be more specific. I was referring to the export command documented in the OpenNMT-tf documentation. When using this command there is no need to manually add the EOS token.

On the other hand, there are currently some limitations with the script ct2-opennmt-tf-converter. In particular, there is no way to enable the source BOS or EOS tokens. We should update this conversion script to accept the training configuration.
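
Until the converter accepts the training configuration, the tokens can be added manually before calling CTranslate2, as described in the previous post. A minimal sketch: the helper name is hypothetical, and the commented-out call assumes CTranslate2 is installed and a converted model exists in `ct2_model`.

```python
def add_sequence_controls(tokens, start=True, end=True):
    """Wrap a list of source tokens with <s> and </s> as configured."""
    result = list(tokens)
    if start:
        result.insert(0, "<s>")
    if end:
        result.append("</s>")
    return result

source = add_sequence_controls(["▁Hello", "▁world"])
print(source)

# With CTranslate2 installed and a converted model available:
# import ctranslate2
# translator = ctranslate2.Translator("ct2_model")
# results = translator.translate_batch([source])
```

The `start` and `end` flags should mirror whatever `source_sequence_controls` was set to during training.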

EDIT: this PR is addressing the converter limitations:

github.com/OpenNMT/CTranslate2: Add new OpenNMT-tf converter for model instances
OpenNMT:master ← guillaumekln:opennmt-tf-converter-from-model (opened by guillaumekln, 11:08AM - 03 May 22 UTC)

This PR adds a new converter class `ctranslate2.converters.OpenNMTTFConverterV2` that can convert OpenNMT-tf model instances, similar to the Fairseq converter. It also extends the command line options to accept the training configuration, which usually contains all necessary information to complete the conversion:

```bash
ct2-opennmt-tf-converter --config config.yml --output_dir ct2_model
```

Closes #789.