Could you tell me whether this error is related to tokenization?
Not exactly related to tokenization.
The files you provide should be well aligned, and it looks like the English and Russian files do not have the same number of lines. Checking that is a basic verification of the 1-to-1 relation. Is this the problem?
Have a nice day!
Yes, it is always a good idea to run wc -l on source & target before starting any processing. Even an apparently empty line with a hard return can throw everything out of kilter.
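If you would rather do the same check from a script, here is a minimal Python sketch of the idea. It assumes both files are UTF-8 text; the function names (line_report, check_alignment) and the file names in the usage note are made up for illustration and are not part of any toolkit. It also lists blank lines, since those are the ones that tend to shift the alignment.

```python
def line_report(path):
    """Return (line_count, blank_line_numbers) for a UTF-8 text file."""
    count = 0
    blank = []
    with open(path, encoding="utf-8") as f:
        for count, line in enumerate(f, start=1):
            # A line holding only whitespace or a bare line break counts as blank.
            if not line.strip():
                blank.append(count)
    return count, blank


def check_alignment(src_path, tgt_path):
    """Compare line counts of a source/target pair and report blank lines."""
    src_n, src_blank = line_report(src_path)
    tgt_n, tgt_blank = line_report(tgt_path)
    return {
        "source_lines": src_n,
        "target_lines": tgt_n,
        "aligned": src_n == tgt_n,
        "source_blank": src_blank,
        "target_blank": tgt_blank,
    }
```

For example, check_alignment("train.en", "train.ru") (hypothetical file names) returns a dict whose "aligned" entry is False when the counts differ. Note that the count can be one higher than wc -l if the last line has no trailing newline, because wc -l counts newline characters rather than lines.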
Thank you, you correctly identified the problem.