Error in preprocess stage


(Andrew) #1

Hello.
Could you tell me whether this error is related to tokenization?


(miguel canals) #2

Hi Andrew,

Not exactly related to tokenization.

The files you provide should be well aligned, and it looks like the English and Russian files do not have the same number of lines. Matching line counts is the basic check for a one-to-one relation between source and target. Is this the problem?

Have a nice day!
miguel canals


(Terence Lewis) #3

Yes, it is always a good idea to run wc -l on the source and target files before starting any processing. Even an apparently empty line with a hard return can throw everything out of kilter :slight_smile:
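
If you want the same check in a script, here is a minimal Python sketch. It is not part of any toolkit: the function names (`count_lines`, `blank_line_numbers`, `check_parallel`) are made up for illustration, and it assumes both files are UTF-8 text. It counts newline bytes the way wc -l does and also reports which lines are blank, since those are the usual culprits.

```python
def count_lines(path):
    # Count newline bytes, which is what `wc -l` reports.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(1 << 20)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def blank_line_numbers(path):
    # 1-based numbers of lines that contain only whitespace
    # (including a lone hard return). Assumes UTF-8 input.
    # newline="\n" makes Python split on "\n" only, like wc -l.
    with open(path, encoding="utf-8", newline="\n") as f:
        return [i for i, line in enumerate(f, start=1) if not line.strip()]


def check_parallel(src_path, tgt_path):
    # Compare the two sides of a parallel corpus and collect
    # the information needed to track down a misalignment.
    src_n = count_lines(src_path)
    tgt_n = count_lines(tgt_path)
    return {
        "src_lines": src_n,
        "tgt_lines": tgt_n,
        "aligned": src_n == tgt_n,
        "src_blank": blank_line_numbers(src_path),
        "tgt_blank": blank_line_numbers(tgt_path),
    }
```

Calling `check_parallel("train.en", "train.ru")` (hypothetical file names) returns a dict. If `aligned` is false, the blank line numbers are a good place to start looking. Equal counts only show that a one-to-one pairing is possible, not that each pair is actually a correct translation.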


(Andrew) #4

Thank you, you correctly identified the problem. :slight_smile: