I have an en-es model trained on single sentences and would like to run inference on larger blocks of text. Any recommendations for how to do this? For this model and other European languages, it seems reasonable to split on periods and then translate each sentence independently. However, in the future I'd like to support other languages (Chinese and Arabic) that don't necessarily mark sentence boundaries with periods. Additionally, I'd like to handle European-language text with nonstandard punctuation reasonably well, such as poetry, lyrics, etc.
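For concreteness, here is a minimal sketch of the naive split-and-translate approach I have in mind. The names `split_sentences`, `translate_text`, and `translate_sentence` are hypothetical; `translate_sentence` stands in for whatever single-sentence inference call the model exposes. The punctuation set (Latin, CJK, and the Arabic question mark) and the choice to treat line breaks as boundaries are my own assumptions, not an established standard.

```python
import re
from typing import Callable, List

# Assumed sentence terminators: Latin . ! ?, CJK 。！？, Arabic question mark ؟
_TERMINATORS = ".!?。！？؟"

# A segment is a run of non-terminator, non-newline characters followed by
# any terminators. Newlines are excluded so that each line of unpunctuated
# text (poetry, lyrics) becomes its own segment.
_SEGMENT = re.compile(
    "[^" + re.escape(_TERMINATORS) + "\\n]+[" + re.escape(_TERMINATORS) + "]*"
)


def split_sentences(text: str) -> List[str]:
    """Naively split text into sentence-like segments."""
    segments = (m.strip() for m in _SEGMENT.findall(text))
    return [s for s in segments if s]


def translate_text(
    text: str,
    translate_sentence: Callable[[str], str],  # hypothetical model call
    joiner: str = " ",
) -> str:
    """Translate each segment independently and join the results."""
    return joiner.join(translate_sentence(s) for s in split_sentences(text))
```

This has obvious weaknesses: it breaks on decimals and abbreviations ("3.14", "e.g."), it discards the original line structure when joining, and each segment is translated without any surrounding context.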
Is there a standard approach for this?