So I’ve built my first En->El production system and the results are fantastic. There’s a problem though: when the system deals with sentences joined by bad segmentation, i.e., two sentences combined with a semicolon (;), most of the time the second sentence is not translated at all. This doesn’t happen every time, but it does happen at a high rate, roughly 70-80%. Is that expected?
If you don’t have such sequences in your training data then yes, it is expected. The model learned to stop generating when it encounters the end of a sentence.
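One possible workaround, sketched here under assumptions, is to re-segment the input at semicolons before translation so that each piece looks like the single sentences the model saw in training. The `translate` argument is a hypothetical callable (string in, string out) standing in for whatever client your system exposes; it is not a real API.

```python
import re

def split_on_semicolons(line):
    """Split a source line after each semicolon that is followed by whitespace.

    The semicolon stays attached to the segment on its left.
    """
    parts = re.split(r"(?<=;)\s+", line.strip())
    return [p for p in parts if p]

def translate_joined(line, translate):
    """Translate each segment separately and rejoin the results with spaces.

    `translate` is a hypothetical callable (str -> str), assumed for illustration.
    """
    return " ".join(translate(seg) for seg in split_on_semicolons(line))
```

Splitting only on a semicolon followed by whitespace is a deliberate simplification; whether it is safe depends on how semicolons are used in your source text.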
I’ve just typed in a few sentences joined with semicolons and they get translated fine, BUT I know there are many such sentences in my training data. I also sometimes get a prediction where two shorter sentences separated by a period in the input are joined by “, and” in the prediction. It all comes down to what is seen in the training data.
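To check what the training data actually contains, a rough sketch like the following can estimate how often source lines have a semicolon in the middle of the line. The regular expression and the function name are assumptions chosen for illustration, not part of any toolkit.

```python
import re

# A semicolon followed by whitespace and more text, i.e. not at the end of the line.
INTERNAL_SEMICOLON = re.compile(r";\s+\S")

def semicolon_join_rate(lines):
    """Return the fraction of non-empty lines that contain an internal semicolon."""
    non_empty = [line for line in lines if line.strip()]
    if not non_empty:
        return 0.0
    hits = sum(1 for line in non_empty if INTERNAL_SEMICOLON.search(line))
    return hits / len(non_empty)
```

For example, with a hypothetical source file `train.en`, you could call `semicolon_join_rate(open("train.en", encoding="utf-8"))`. A rate close to zero would be consistent with the model stopping after the first sentence.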