Greetings fellow researchers,
I recently trained a model for a translation task. After training, however, translating a source string gives me ' ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ' as output.
Can anyone help me with this, or point me in the right direction, so that I can get the correct output?
Below is my config file setup:
model_dir: /media/secondary-disk/data/models/TransformerTiny
data:
  source_tokenization:
    type: SentencePieceTokenizer
    params:
      model: /media/secondary-disk/data/ml_data/trg_data/spm_small.model
  target_tokenization:
    type: SentencePieceTokenizer
    params:
      model: /media/secondary-disk/data/ml_data/trg_data/spm_small.model
  train_features_file: /media/secondary-disk/data/ml_data/trg_data/train_src.txt
  train_labels_file: /media/secondary-disk/data/ml_data/trg_data/train_tgt.txt
  eval_features_file: /media/secondary-disk/data/ml_data/trg_data/val_src.txt
  eval_labels_file: /media/secondary-disk/data/ml_data/trg_data/val_tgt.txt
  source_vocabulary: /media/secondary-disk/data/ml_data/trg_data/spm_small.vocab
  target_vocabulary: /media/secondary-disk/data/ml_data/trg_data/spm_small.vocab
train:
  batch_size: 0
  batch_type: tokens
  save_checkpoints_steps: 5000
  keep_checkpoint_max: 3
  max_step: 1000000
params:
  optimizer: Adam
  optimizer_params:
    beta_1: 0.8
    beta_2: 0.998
  learning_rate: 1.0
  dropout: 0.3
  regularization:
    type: l2 # can be "l1", "l2", "l1_l2" (case-insensitive).
    scale: 1e-4 # if using "l1_l2" regularization, this should be a YAML list.
  decay_type: NoamDecay
  decay_params:
    model_dim: 512
    warmup_steps: 5000
  decay_step_duration: 1
  start_decay_steps: 50000
  minimum_learning_rate: 0.0001
  beam_width: 5
  minimum_decoding_length: 6
  maximum_decoding_length: 6
  share_embeddings: 3
eval:
  scorers: bleu
  steps: 5000
  early_stopping:
    metric: loss
    min_improvement: 0.001
    steps: 10
  export_on_best: bleu
infer:
  batch_size: 256
  batch_type: tokens
  n_best: 1
  with_scores: true
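One check that may be worth running on the files named in `source_vocabulary` and `target_vocabulary`: SentencePiece writes its `.vocab` file as one `piece<TAB>score` pair per line, whereas OpenNMT-tf vocabularies are, as far as I understand, plain one-token-per-line lists. If the raw file is used unconverted, tokens might fail to match and fall back to the unknown token. The sketch below only detects the raw layout; `looks_like_raw_sentencepiece_vocab` is a hypothetical helper written for this post, and the demo files are made up rather than taken from my data.

```python
import os
import tempfile


def looks_like_raw_sentencepiece_vocab(path, sample_size=100):
    """Return True if the sampled lines all look like `piece<TAB>score`.

    Hypothetical helper for illustration; not part of OpenNMT-tf or SentencePiece.
    """
    checked = 0
    with open(path, encoding="utf-8") as vocab_file:
        for line in vocab_file:
            line = line.rstrip("\n")
            if not line:
                continue
            _piece, separator, score = line.rpartition("\t")
            if not separator:
                return False  # no tab: plain one-token-per-line layout
            try:
                float(score)  # the second column should be a numeric score
            except ValueError:
                return False
            checked += 1
            if checked >= sample_size:
                break
    return checked > 0


# Demo on throwaway files with made-up content.
with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "raw.vocab")
    with open(raw_path, "w", encoding="utf-8") as f:
        f.write("<unk>\t0\n<s>\t0\n</s>\t0\n\u2581the\t-3.2\n")

    plain_path = os.path.join(tmp_dir, "plain.vocab")
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write("<blank>\n<s>\n</s>\n\u2581the\n")

    raw_result = looks_like_raw_sentencepiece_vocab(raw_path)
    plain_result = looks_like_raw_sentencepiece_vocab(plain_path)

print(raw_result, plain_result)  # True False
```

Pointing the helper at `spm_small.vocab` would show which layout the training run actually saw.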
Below is the signature of the trained model:
2022-12-06 11:43:28.928112: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
The given SavedModel SignatureDef contains the following input(s):
  inputs['text'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: serving_default_text:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['log_probs'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: StatefulPartitionedCall_4:0
  outputs['text'] tensor_info:
      dtype: DT_STRING
      shape: (-1, 1)
      name: StatefulPartitionedCall_4:1
Method name is: tensorflow/serving/predict
And I am using the following code to do the serving:
import argparse
import os

import tensorflow as tf
import tensorflow_addons as tfa # Register TensorFlow Addons kernels.
import pyonmttok


class Translator(object):
    def __init__(self, export_dir):
        imported = tf.saved_model.load(export_dir)
        self._translate_fn = imported.signatures["serving_default"]
        sp_model_path = "/media/secondary-disk/data/ml_data/trg_data/spm_small.model"
        self._tokenizer = pyonmttok.Tokenizer("none", sp_model_path=sp_model_path)

    def translate(self, src):
        """Translates a batch of texts."""
        inputs = self._preprocess(src)
        outputs = self._translate_fn(**inputs)
        return self._postprocess(outputs)

    def _preprocess(self, src):
        all_tokens_src = []
        for text_src in src:
            tokens_src, _ = self._tokenizer.tokenize(text_src)
            all_tokens_src.append(tokens_src)
        inputs = {
            "text": tf.constant(all_tokens_src, dtype=tf.string)}
        return inputs

    def _postprocess(self, outputs):
        texts = []
        for tokens in zip(outputs["text"].numpy()):
            tokens = list(tokens[0])
            texts.append(self._tokenizer.detokenize(tokens))
        return texts
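A possible mismatch worth noting: the signature lists `inputs['text']` with shape `(-1)`, which reads as a one-dimensional batch of strings, while `_preprocess` builds a list of token lists, i.e. a two-dimensional input. Whether the export expects raw or pre-tokenized text is something I have not verified. The sketch below only compares nesting depth in plain Python; `nested_rank` is a hypothetical helper and the example tokens are made up.

```python
def nested_rank(value):
    """Depth of list nesting: 1 for a flat list of strings, 2 for a list of lists."""
    rank = 0
    while isinstance(value, (list, tuple)):
        rank += 1
        if not value:
            break  # an empty list still counts as one level
        value = value[0]
    return rank


# Rank implied by `shape: (-1)` in the exported signature.
EXPECTED_RANK = 1

# What a batch of raw sentences looks like.
raw_batch = ["Source string entered here for translation"]

# What _preprocess builds: one token list per sentence (tokens are made up).
token_batch = [["\u2581Source", "\u2581string"]]

raw_rank = nested_rank(raw_batch)
token_rank = nested_rank(token_batch)
print(raw_rank == EXPECTED_RANK, token_rank == EXPECTED_RANK)  # True False
```

If the ranks disagree, as they do here, the tokenization step in `_preprocess` may be duplicating or conflicting with what the exported graph does on its own.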
With this serving code, I get question marks as my output:
import tensorflow as tf
import tensorflow_text
translator = Translator("model_folder_path")
data = ["Source String entered here for translation "]
output = translator.translate(data)
target = output[0]
target
' ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ '
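As far as I know, ⁇ (U+2047) is the default surface SentencePiece emits when it decodes the unknown id, so this output would mean every predicted token is unknown. It also contains exactly six of them, the same value as `minimum_decoding_length` and `maximum_decoding_length` in the config; whether that is a coincidence is not established here. A minimal sketch for counting them, where `unknown_ratio` is a hypothetical helper:

```python
UNK_SURFACE = "\u2047"  # the "⁇" character seen in the output


def unknown_ratio(text):
    """Return (number of unknown pieces, share of unknown pieces) in a decoded string."""
    pieces = text.split()
    if not pieces:
        return 0, 0.0
    unknown = sum(1 for piece in pieces if piece == UNK_SURFACE)
    return unknown, unknown / len(pieces)


# The string returned by translator.translate(data)[0] in this post.
output = " \u2047 \u2047 \u2047 \u2047 \u2047 \u2047 "
count, ratio = unknown_ratio(output)
print(count, ratio)  # 6 1.0
```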
Any input would be highly appreciated.
Best Regards,