Preprocess argument "-filter_valid" - what does it mean?
Here we can see that the “-filter_valid” argument does “Filter validation data by src and/or tgt length”.
But what does that mean? Unfortunately, I couldn't find an answer or a more detailed description.
So, if I add the argument “-filter_valid 100”, will any lines from the validation set with length < 100 be ignored, or what?

Hello - this option, implemented here: - applies to the validation data the same length filters that are defined for the training data with --src_seq_length and --tgt_seq_length.
It is a boolean option, so just pass --filter_valid without a value. Also, the filtering in this case removes sentences longer (not shorter) than the defined value.
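
To make the behaviour concrete, here is a minimal Python sketch of that kind of length filter. It is only an illustration, not the actual preprocessing code: the function name `filter_by_length`, whitespace tokenization, and treating the limit as inclusive are all assumptions on my part.

```python
def filter_by_length(pairs, src_seq_length, tgt_seq_length):
    """Keep only (src, tgt) pairs whose token counts are within the limits.

    Hypothetical helper for illustration; tokens are split on whitespace
    and a sentence exactly at the limit is kept (assumed, not confirmed).
    """
    kept = []
    for src, tgt in pairs:
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        # Drop the pair if either side is LONGER than its limit.
        if src_len <= src_seq_length and tgt_len <= tgt_seq_length:
            kept.append((src, tgt))
    return kept


# Toy validation set: the second pair has a 5-token source sentence.
valid = [("a b c", "x y"), ("a b c d e", "x")]
filtered = filter_by_length(valid, src_seq_length=3, tgt_seq_length=3)
```

With limits of 3 tokens on each side, only the first pair survives, because the second source sentence is too long. Short sentences are never dropped by this kind of filter.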
