Hi, I’ve been using M2M100 for some time and I’ve noticed that some inputs are not fully translated: parts of the text seem to be cut off. For example, if I take this input:
So I used to use RITA because it allowed me to translate one language from say an English channel to a Spanish channel and vice versa. Would that be possible with this bot? I am trying to find an alternative now that RITA went kaput.
the model returns this translation:
Donc j'ai utilisé RITA parce que cela m'a permis de traduire une langue de dire un canal anglais à un canal espagnol et vice versa.
However, if we replace the first period with a comma, the translation continues up to the question mark, but it still does not cover the whole input:
So I used to use RITA because it allowed me to translate one language from say an English channel to a Spanish channel and vice versa, Would that be possible with this bot? I am trying to find an alternative now that RITA went kaput.
the model returns this translation:
Donc j'ai utilisé RITA parce que cela m'a permis de traduire une langue de dire un canal anglais à un canal espagnol et vice versa, Est-ce possible avec ce bot?
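A rough way to spot affected inputs automatically is to compare the number of sentence endings in the source and in the translation. This is only a heuristic sketch: the helper names `count_sentences` and `looks_truncated` are made up for illustration, and the regular expression is not a real sentence splitter (abbreviations such as "e.g." will be miscounted).

```python
import re

# Runs of sentence-final punctuation followed by whitespace or end of text.
# Assumption: this approximates sentence boundaries well enough for a quick check.
_TERMINATORS = re.compile(r"[.!?]+(?=\s|$)")


def count_sentences(text: str) -> int:
    """Count sentence endings using the heuristic pattern above."""
    return len(_TERMINATORS.findall(text))


def looks_truncated(source: str, translation: str) -> bool:
    """Flag a translation that has fewer sentence endings than its source."""
    return count_sentences(translation) < count_sentences(source)
```

With the two examples above, both translations are flagged, since each ends before the source does.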
Note that this bug only affects some inputs; most are translated correctly. I looked through all the settings and did not find one that fixes the problem. Here is the code to reproduce it:
import os

# Environment variables are set before ctranslate2 is imported.
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["CT2_USE_EXPERIMENTAL_PACKED_GEMM"] = "1"

import time

import ctranslate2
import sentencepiece as spm

translator = ctranslate2.Translator("end_output", device="cpu", inter_threads=1, intra_threads=1, compute_type="auto")
s = spm.SentencePieceProcessor(model_file='spm.128k.model')

string = input("Enter a text: ")
# Prepend the source language token to the SentencePiece tokens.
string = ["__en__"] + s.encode(string, out_type=str)

# First call, outside the timed loop (warm-up).
value = translator.translate_batch(
    [string],
    target_prefix=[["__fr__"]],
)

print("start")
for i in range(5):
    time1 = time.time()
    value = translator.translate_batch(
        [string],
        target_prefix=[["__fr__"]],
        return_scores=False,
        max_decoding_length=2000,
        max_input_length=0
    )
    # Skip the leading "__fr__" token before decoding.
    print(s.decode(value[0].hypotheses[0][1:]))
    print(time.time()-time1)
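Since swapping the period for a comma changes where the output stops, one way to narrow this down is to translate each sentence separately and see whether the missing parts come through on their own. The sketch below is only a diagnostic under that assumption: `split_sentences` is a naive regex split, and `translate_one` is a hypothetical callable that would wrap the `translate_batch` call from the script above for a single string.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_by_sentence(text: str, translate_one: Callable[[str], str]) -> str:
    """Translate each sentence on its own and join the results with spaces.

    translate_one is a placeholder for whatever function translates one string.
    """
    return " ".join(translate_one(sentence) for sentence in split_sentences(text))
```

If every sentence is translated correctly in isolation, the truncation is tied to multi-sentence inputs rather than to the content of the later sentences.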
Note that the same problem occurs with the Hugging Face implementation, so it is not specific to CTranslate2.
Thanks in advance for any answers.