In a parallel corpus for MT, should each line hold a single sentence or a paragraph?


(karimkhan) #1

I am trying to create my own corpus from Hindi business news.

Thanks
Karim


(Guillaume Klein) #2

Hello,

For machine translation, corpora usually consist of single sentences.


(Karimkhan) #3

Thanks @guillaumekln,
Does using sentences rather than paragraphs make any difference in terms of accuracy? Just curious.

I prefer going sentence by sentence, but that way I need a long preprocessing step to convert each sentence to the target language when building the corpus.
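
For the sentence-by-sentence route, the splitting and pairing part of that preprocessing could look roughly like the sketch below. It is only a minimal Python illustration under assumptions: the names `split_sentences` and `pair_lines` are hypothetical, and the rule that a sentence ends at a danda (।), `?`, `!` or `.` followed by whitespace is a naive guess that will mis-split abbreviations such as "Rs. 500". Translating the sentences is not covered here.

```python
import re

# Assumption: a sentence ends with a danda (।), '?', '!' or '.'
# followed by whitespace. Real news text will need a better splitter.
_SENTENCE_END = re.compile(r"(?<=[।?!.])\s+")


def split_sentences(paragraph):
    """Naively split a paragraph into sentences, keeping the end punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(paragraph) if s.strip()]


def pair_lines(src_sentences, tgt_sentences):
    """Pair source and target sentences so that line i matches line i."""
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("source and target have different sentence counts")
    return list(zip(src_sentences, tgt_sentences))


hindi = split_sentences("बाजार आज बढ़ा। निवेशक खुश हैं।")
english = split_sentences("The market rose today. Investors are happy.")
pairs = pair_lines(hindi, english)
```

Each pair would then go on the same line number of the source and target files, one sentence per line. Raising an error on a count mismatch is a deliberate choice in this sketch: a silently shifted line would misalign every pair after it.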