Source/target not aligned

I’ve built quite a few models now, including two in production use, and for the first time I am getting the warning “source/target not aligned” during preprocessing after adding some new data to previously used training data. I have double-checked the tokenized source & target files and everything seems to be perfectly aligned. What is causing this warning, I wonder.

Do the files have the exact same number of lines? You could use wc -l to quickly check.

Indeed - that’s on my basic checklist before lift-off :slight_smile:

What is the exact warning message?

The message in the code reads:
_G.logger:warning('SENT %s : source/target not aligned (%d%d)', tostring(idx), length1, length2)
and I can see that this refers to a length disparity.
Most of the data is “old data” (preprocessed in already built models) and I’m effectively doing some incremental training. As far as I can see there are no length disparities in this new data.
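For anyone who wants to look for such disparities outside the preprocessing step, here is a minimal Python sketch. It assumes whitespace-tokenized files with one sentence per line, and that the two numbers in the warning are per-sentence token counts. The function name `find_length_mismatches` is hypothetical and this is not the toolkit's own check. For ordinary translation data, differing lengths are normal, so this only matters when equal lengths are actually required.

```python
from itertools import zip_longest


def find_length_mismatches(src_lines, tgt_lines):
    """Return (line_number, src_len, tgt_len) for pairs whose token counts differ.

    Assumption: tokens are separated by whitespace. If one side has fewer
    lines than the other, its missing lines are reported with length -1.
    """
    mismatches = []
    pairs = zip_longest(src_lines, tgt_lines)
    for idx, (src, tgt) in enumerate(pairs, start=1):
        src_len = len(src.split()) if src is not None else -1
        tgt_len = len(tgt.split()) if tgt is not None else -1
        if src_len != tgt_len:
            mismatches.append((idx, src_len, tgt_len))
    return mismatches
```

The function accepts any iterables of strings, so two open file handles can be passed directly; an empty result means every line pair has the same number of tokens.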

What version are you using? Also, are you using the -check_plength option?

I'm using v0.7, and in fact I have only encountered this problem since installing it. I'm not using the -check_plength option. I will try that.

This warning is not in v0.7 so you should re-check which version you are using. Can you do:

git checkout v0.7.1

and retry?

Ah, the error was in my own training script where I had not put an absolute path to my latest version but was pointing to an older version. Preprocessing has now completed correctly. Sorry for wasting your time :frowning:
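One way to guard against this kind of mix-up is to resolve the toolkit location to an absolute path and print which checkout it is before launching anything. The sketch below is only an illustration: the function names are hypothetical, and it assumes the toolkit directory is a git checkout with tags and that `git` is on the PATH.

```python
import subprocess
from pathlib import Path


def resolve_toolkit_dir(path_str):
    """Return the absolute path of an existing directory, or raise."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"toolkit directory not found: {path}")
    return path


def describe_checkout(toolkit_dir):
    """Return the output of `git describe --tags` for the given checkout.

    Assumption: toolkit_dir is a git working tree that has at least one tag.
    """
    result = subprocess.run(
        ["git", "-C", str(toolkit_dir), "describe", "--tags"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Logging the resolved path and the described tag at the top of a training script makes it obvious when the script is pointing at an older installation.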
