Option 'features_vocabs_prefix' value is not valid

Hi!

No matter what value I try, calling preprocess.lua with the -features_vocabs_prefix option always fails with the same error:

option 'features_vocabs_prefix' value is not valid

This seems to be caused by this code in onmt/data/Preprocessor.lua:

local commonOptions = {
  {'-features_vocabs_prefix', '', [[Path prefix to existing features vocabularies]],
                                  {valid=onmt.utils.ExtendedCmdLine.fileNullOrExists}},
  -- (remaining options omitted)
}

The validator appears to check that a file exists at exactly that path, rather than treating the value as a path prefix. If I create an empty file (e.g. exp/en-fr) and try again, I get a different error:

./onmt/data/Vocabulary.lua:97: dictionary 'exp/en-fr.train_feature_1.dict' not found

That is expected, since no file with that name exists: there are two feature dictionary files, one for the source (en-fr.source_feature_1.dict) and one for the target (en-fr.target_feature_1.dict). Adding the -data_type bitext parameter doesn’t help either.
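To make the distinction concrete, here is a minimal Lua sketch of a check that treats the value as a prefix instead of an exact file path. It is not OpenNMT code: fileExists and prefixNullOrMatches are hypothetical helpers, and the suffix list is only an assumption taken from the file names mentioned in this thread.

```lua
-- Hypothetical helpers, NOT part of OpenNMT.

-- Return true if a file can be opened for reading at `path`.
local function fileExists(path)
  local f = io.open(path, 'r')
  if f then
    f:close()
    return true
  end
  return false
end

-- Accept an empty value, or a prefix for which at least one
-- expected dictionary file (prefix .. suffix) exists.
local function prefixNullOrMatches(prefix, suffixes)
  if prefix == '' then
    return true
  end
  for _, suffix in ipairs(suffixes) do
    if fileExists(prefix .. suffix) then
      return true
    end
  end
  return false
end

-- Assumed suffixes, copied from the file names quoted in this thread.
local suffixes = {
  '.train_feature_1.dict',
  '.source_feature_1.dict',
  '.target_feature_1.dict',
}

print(prefixNullOrMatches('exp/en-fr', suffixes))
```

With a check along these lines, a value such as exp/en-fr would be accepted whenever one of the dictionaries exists next to it, without needing an empty placeholder file at exp/en-fr itself.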

Has anybody else experienced this same issue? Am I doing something wrong?

Thanks!

EDIT: FYI, I’m trying to use BPE together with source and target dictionaries generated from a massive data set, and then train a network on smaller parts of that data set in turn (mostly because I can’t even fit the entire data set in RAM).

Never mind, I figured it out - I needed to rename the source file to en-fr.train_feature_1.dict, and then it all works!
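For anyone hitting the same thing, the workaround can be sketched in Lua roughly as follows. This is only an illustration under assumptions: copyFile is a hypothetical helper, the paths are the ones quoted in this thread, and the sketch copies the dictionary rather than renaming it so the original file stays in place.

```lua
-- Hypothetical helper, NOT part of OpenNMT: copy a file byte for byte.
local function copyFile(src, dst)
  local input = assert(io.open(src, 'rb'))
  local data = input:read('*a')
  input:close()
  local output = assert(io.open(dst, 'wb'))
  output:write(data)
  output:close()
end

-- Paths follow the names quoted in this thread; adjust to your layout.
local prefix = 'exp/en-fr'
local source = prefix .. '.source_feature_1.dict'
local expected = prefix .. '.train_feature_1.dict'

-- Only copy when the source dictionary is actually there.
local probe = io.open(source, 'rb')
if probe then
  probe:close()
  copyFile(source, expected)
end
```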

Thanks Daniel - I fixed the invalid file-existence check.
