Hello.
Could you tell me whether this error is related to tokenization?
Hi Andrew,
Not exactly related to tokenization.
The files you provide must be aligned line by line, and it looks like the English and Russian files do not have the same number of lines. Checking this is a basic verification of the one-to-one relation between source and target. Is this the problem?
Have a nice day!
miguel canals
Yes, it is always a good idea to run wc -l on source & target before starting any processing. Even an apparently empty line with a hard return can throw everything out of kilter.
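If you would rather do the same check from a script, here is a minimal sketch in Python. It counts newline characters the way wc -l does and also lists blank lines, since those are a common cause of misalignment. The function names (count_lines, blank_line_numbers, check_parallel) and the UTF-8 encoding are my own assumptions for illustration, not part of any toolkit.

```python
def count_lines(path):
    """Count newline bytes in the file, which is what wc -l reports."""
    total = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large corpora do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            total += chunk.count(b"\n")
    return total


def blank_line_numbers(path, encoding="utf-8"):
    """Return 1-based numbers of lines that are empty or whitespace-only.

    The encoding is an assumption; change it if your corpus is not UTF-8.
    """
    # newline="\n" splits only on "\n", so numbering matches wc -l.
    with open(path, encoding=encoding, newline="\n") as f:
        return [i for i, line in enumerate(f, start=1) if not line.strip()]


def check_parallel(src_path, tgt_path):
    """Compare line counts of a source/target pair and report blank lines."""
    src_count = count_lines(src_path)
    tgt_count = count_lines(tgt_path)
    return {
        "src_lines": src_count,
        "tgt_lines": tgt_count,
        "aligned": src_count == tgt_count,
        "src_blank": blank_line_numbers(src_path),
        "tgt_blank": blank_line_numbers(tgt_path),
    }
```

For example, check_parallel("train.en", "train.ru") (hypothetical file names) returns a dictionary whose "aligned" entry is False when the counts differ, and the blank-line lists are a good first place to look for the extra line.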
Thank you, you correctly identified the problem.