I have trained a Transformer model and used CTranslate2 to convert the checkpoints to the supported format, but when I try to run a translation I am not getting any output.
Command line: ct2-opennmt-tf-converter --model_path /content/drive/MyDrive/Data/English-Tamil/en-kn/run/avg --output_dir ende_ctranslate2
I don't understand what is causing the error.
Please provide more information:
- the training command line and configuration
- the tokenization method
- how you are running the translation behind the web interface.
Also, did you try other inputs?
Try running the inference with OpenNMT-tf and see if it is different.
No, I am not running into any problems with OpenNMT-tf.
These kinds of problems usually arise from mismatches in tokenization, vocabularies, and/or input format between training and inference.
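As a hypothetical illustration (this is not the CTranslate2 API, just a sketch of the failure mode): if inference tokenizes differently than training did, the tokens never match the training vocabulary, every input piece becomes `<unk>`, and the model produces empty or garbage output:

```python
# Hypothetical vocabulary of SentencePiece-style pieces the model was trained on
vocab = {"▁how", "▁are", "▁you", "?"}

def lookup(tokens):
    # Any token absent from the training vocabulary is replaced by <unk>
    return [t if t in vocab else "<unk>" for t in tokens]

# Tokenized the same way as during training: every piece is known
print(lookup(["▁how", "▁are", "▁you", "?"]))  # ['▁how', '▁are', '▁you', '?']

# Tokenized with plain whitespace splitting instead: everything is unknown
print(lookup(["how", "are", "you?"]))         # ['<unk>', '<unk>', '<unk>']
```

This is why the same tokenization model (and the same preprocessing) must be applied at training and at inference time.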
Could you please try to test CTranslate2 independently? Also make sure you update to the latest version before converting the model.
Here is a sample code you can use to test your model. Please change the tokenize and detokenize functions as well as your CTranslate2 model path, and test the model with complete sentences rather than single words.
Also, please double-check all your paths; I agree with Panos that there might be something to correct.
import ctranslate2

def tokenize(input_sentences):
    # Replace with your tokenize function and source tokenization model
    return [input_sentence.split(" ") for input_sentence in input_sentences]

def detokenize(outputs):
    # Replace with your detokenize function and target tokenization model
    return [" ".join(tokens) for tokens in outputs]

# Modify the path to the CTranslate2 model directory
model_path = "ctranslate2_model"
source_sentences = ["how are you?", "fine, thanks!", "everything is great.", "I am happy to know that."]

translator = ctranslate2.Translator(model_path, device="cpu")  # "cpu" or "cuda"
outputs = translator.translate_batch(tokenize(source_sentences), beam_size=5)
# Detokenize the best hypothesis of each translation result
translations = detokenize(output.hypotheses[0] for output in outputs)
print(translations)