Support building vocabs from multiple files

It would be nice to have the option of building vocabs on the fly in the directory mode of both preprocess.lua and train.lua, reading every file in the directory directly.

The current workaround is to concatenate all of the files in the train_dir into new source and target files (which costs extra disk space), run build_vocab on the concatenated files, and then delete them once they are no longer needed.
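To illustrate the requested behaviour, here is a minimal sketch in Python (not OpenNMT's Lua code or API) that counts tokens across several files in a single streaming pass, so no concatenated copy is written to disk. The function names, the `max_size` and `min_frequency` parameters, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List


def count_tokens(paths: Iterable[Path]) -> Counter:
    """Accumulate token counts over many files, one line at a time."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Assumption: the text is already tokenized on whitespace.
                counts.update(line.split())
    return counts


def build_vocab(paths: Iterable[Path],
                max_size: int = 50000,
                min_frequency: int = 1) -> List[str]:
    """Return tokens ordered by descending frequency (hypothetical helper)."""
    counts = count_tokens(paths)
    # Break frequency ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = [token for token, freq in ranked if freq >= min_frequency]
    return kept[:max_size]
```

For example, `build_vocab(sorted(Path("train_dir").glob("*.src")))` would build a source vocabulary from every matching file, where the `.src` suffix is a hypothetical naming convention. Only the counter is held in memory, so the extra disk usage of the workaround goes away.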