In a parallel corpus for MT, should each line hold a single sentence or a paragraph?

(karimkhan) #1

I am trying to create my own corpus from Hindi business news.


(Guillaume Klein) #2


For machine translation, corpora usually consist of single sentences, one per line.
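
As a rough illustration of that layout, here is a minimal Python sketch. It assumes Hindi sentences end with the danda (`।`), `?`, or `!`, which will not cover every case (abbreviations, quotes, and so on). The function names `split_sentences` and `write_parallel` are made up for this example and do not come from any library.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a sentence ends with the danda "।", "?" or "!",
# followed by whitespace. Real news text may need a proper tokenizer.
_SENTENCE_END = re.compile(r"(?<=[।?!])\s+")


def split_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences, keeping the end punctuation."""
    parts = _SENTENCE_END.split(paragraph.strip())
    return [p.strip() for p in parts if p.strip()]


def write_parallel(pairs: Iterable[Tuple[str, str]],
                   src_path: str, tgt_path: str) -> None:
    """Write (source, target) pairs so that line N of both files is aligned."""
    with open(src_path, "w", encoding="utf-8") as src, \
            open(tgt_path, "w", encoding="utf-8") as tgt:
        for source_sentence, target_sentence in pairs:
            src.write(source_sentence + "\n")
            tgt.write(target_sentence + "\n")
```

The important property is that both files have the same number of lines and that line N in the source file is the translation of line N in the target file.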

(Karimkhan) #3

Thanks @guillaumekln,
Does using sentences rather than paragraphs make any difference in terms of accuracy? Just out of curiosity.

I prefer going sentence by sentence, but that way I need a lot of preprocessing to convert each sentence into the target language when building the corpus.