Hi!
No matter what value I try, calling preprocess.lua with the -features_vocabs_prefix option always fails with the same error:
option 'features_vocabs_prefix' value is not valid
This seems to be caused by this code in onmt/data/Preprocessor.lua:
local commonOptions = {
{'-features_vocabs_prefix', '', [[Path prefix to existing features vocabularies]],
{valid=onmt.utils.ExtendedCmdLine.fileNullOrExists}},
The validator appears to check that a file exists at exactly that path, rather than treating the value as a path prefix. If I create an empty file (e.g. exp/en-fr) and try again, I get this error instead:
./onmt/data/Vocabulary.lua:97: dictionary 'exp/en-fr.train_feature_1.dict' not found
This is expected, since the feature dictionary files are named differently: there is one for the source (en-fr.source_feature_1.dict) and one for the target (en-fr.target_feature_1.dict). Adding the -data_type bitext parameter doesn’t help either.
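To make the distinction concrete, here is a minimal sketch of what a prefix-style check could look like, as opposed to an exact-file check. The function names (fileExists, prefixNullOrHasDict) are hypothetical and are not part of OpenNMT. The suffix list is an assumption based only on the two dictionary names mentioned above, and accepting the prefix when at least one file matches is my own choice, not necessarily the intended behaviour.

```lua
-- Hypothetical helpers for illustration; none of these names exist in OpenNMT.

-- Returns true if a file can be opened for reading at `path`.
local function fileExists(path)
  local f = io.open(path, 'r')
  if f then
    f:close()
    return true
  end
  return false
end

-- Prefix-style check: accept an empty value, or a prefix for which at least
-- one of the expected dictionary files exists.
local function prefixNullOrHasDict(prefix, suffixes)
  if prefix == nil or prefix == '' then
    return true
  end
  for _, suffix in ipairs(suffixes) do
    if fileExists(prefix .. suffix) then
      return true
    end
  end
  return false
end

-- Assumed suffixes, taken from the dictionary names mentioned in this post.
local suffixes = { '.source_feature_1.dict', '.target_feature_1.dict' }
print(prefixNullOrHasDict('exp/en-fr', suffixes))
```

With a check along these lines, exp/en-fr would be accepted as soon as one of the feature dictionaries sits next to it, without having to create an empty file named exp/en-fr.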
Has anybody else experienced this same issue? Am I doing something wrong?
Thanks!
EDIT: FYI, I’m trying to use BPE together with source and target dictionaries generated for a massive data set, and then train a network on smaller parts of the data set in turn (mostly because I can’t fit the entire data set in RAM).