I am developing an English-Spanish translator, but I have found some strange behaviour while testing it. As a baseline, I got a BLEU score of 0.37 on the test dataset and an accuracy of 71 on the validation dataset during training. In general the translations are decent, despite the limited vocabulary.
However, I have noticed that sometimes complete sentences still appear in the source language after I try to translate them.
Here is an example.
Suppose I want to translate this sentence into Spanish:
“Musicological influences and references can be found throughout his work; he has even included musical notation in the text to make a point.”
Even though the preceding and following sentences are translated fine (there are some errors, but the result is decent overall), this sentence is translated as:
" and can be found throughout his he even even music in the text to make a "
However, if I make a small change, just adding “His” at the beginning of the sentence:
“His musicological influences and references can be found throughout his work; he has even included musical notation in the text to make a point.”
The translation is:
“Sus influencias y referencias pueden ser encontradas en su incluso ha incluido la notación musical en el texto para hacer un punto.”
This makes sense in Spanish, despite the "unk"s. What’s more, the words “influences” and “references”, which came out as “unk” in the first example, are translated correctly in this second one: “influencias” means “influences” and “referencias” means “references”.
This has happened with other examples too. I am not sure why the text is generally translated well while some sentences come out untranslated. Maybe my model is quite poor. So, please, if you can give me some advice, I’d be grateful.
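In case it helps anyone reproduce this, here is a minimal sketch of how such sentences could be flagged automatically, by measuring how many output tokens are simply copied from the source. The function names `copy_ratio` and `flag_untranslated`, the whitespace tokenization, and the 0.8 threshold are all assumptions for illustration; they are not part of my model or of any library.

```python
def copy_ratio(source: str, output: str) -> float:
    """Fraction of output tokens that also occur in the source sentence.

    Assumes plain whitespace tokenization and case-insensitive matching,
    which is only a rough approximation of the model's real tokenizer.
    """
    src_tokens = set(source.lower().split())
    out_tokens = output.lower().split()
    if not out_tokens:
        return 0.0
    copied = sum(1 for tok in out_tokens if tok in src_tokens)
    return copied / len(out_tokens)


def flag_untranslated(pairs, threshold=0.8):
    """Return the (source, output) pairs whose output looks mostly copied.

    The threshold is a hypothetical starting value and would need tuning.
    """
    return [(src, out) for src, out in pairs if copy_ratio(src, out) >= threshold]


if __name__ == "__main__":
    src = ("Musicological influences and references can be found throughout "
           "his work; he has even included musical notation in the text to make a point.")
    out = "and can be found throughout his he even even music in the text to make a"
    print(copy_ratio(src, out))
    print(flag_untranslated([(src, out)]))
```

With the two examples above, the first output is almost entirely made of source tokens and is flagged, while the second (the proper Spanish translation) shares almost no tokens with its source. Note that names, numbers, and cognates are legitimately copied, so a high ratio is only a hint, not proof of a failure.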