So I’ve built my first En->El production system and the results are fantastic. There’s a problem though: when the system deals with joined sentences due to bad segmentation, i.e. two sentences combined with a semicolon (;), most of the time the second sentence is not translated at all. It doesn’t happen every time, but it does happen at a high rate (~70-80%). Is that expected?
If you don’t have such sequences in your training data then yes, it is expected. The model learned to stop generating when it encounters the end of a sentence.
I’ve just typed in a few sentences joined with semicolons and they get translated fine, BUT I know there are many such sentences in my training data. I also sometimes see two shorter sentences separated by a period in the input being joined by “, and” in the prediction. It all comes down to what is seen in the training data.
Thanks for the feedback, I thought so. Indeed, the vast majority of the sentences in the training data do not follow this pattern, so I’d say this is the client application’s problem: it should be segmenting the text properly before sending it for translation. It’s not a major problem though, as it occurs rarely.
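For anyone hitting the same issue, a possible client-side workaround is to pre-split on semicolons before sending text to the model, translate each piece on its own, and rejoin the outputs. This is only a minimal sketch: `translate` here is a placeholder for whatever actual MT call your client uses, not part of any specific API.

```python
import re

def presegment(text):
    """Split input on semicolons so each clause is sent to the
    model as a standalone sentence. Works around models that
    stop generating at the first sentence boundary."""
    return [part.strip() for part in re.split(r";", text) if part.strip()]

def translate_joined(text, translate):
    # `translate` is a hypothetical stand-in for the real MT call;
    # each segment is translated independently, then rejoined
    # with the original semicolon separator.
    return "; ".join(translate(segment) for segment in presegment(text))
```

With this, a joined input like `"First clause; second clause"` becomes two separate requests, so the model never sees a mid-sentence semicolon it wasn’t trained on.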