Just using the sentencepiece hook in train.lua doesn’t work: it’s a tokenization hook, and train.lua doesn’t recognize the -sentencepiece option that must accompany it.
Tokenizer options in train.lua seem to be denoted as -tok_*, so something like -tok_hook_file and -tok_hook_options (somewhere to put “-sentencepiece foo.model”) would be very useful.
Sorry for the undocumented behavior: hook_file must appear on the command line. So you just need to move hook_file out of the configuration file and back onto the command line.
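The reason is that we peek at the command line options before parsing them, so that the options declared by the hook can be included. Reading the configuration file currently happens after the parsing, so a hook_file that is set only in the configuration file is seen too late for the hook's options to be registered.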
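
To illustrate the ordering issue, here is a rough Python sketch of the peek / parse / read-config sequence described above. It is not the actual train.lua code: the function names, the hook name `sentencepiece_hook`, the `HOOK_DECLARED_OPTIONS` table and the dict standing in for the configuration file are all assumptions made up for the example.

```python
# Hypothetical table of the extra options each hook declares (illustrative only).
HOOK_DECLARED_OPTIONS = {"sentencepiece_hook": {"-sentencepiece"}}

# Options the parser knows about without any hook (a made-up minimal set).
BASE_OPTIONS = {"-hook_file", "-config"}


def peek_hook_file(argv):
    """Scan the raw command line for -hook_file before real parsing starts."""
    for flag, value in zip(argv, argv[1:]):
        if flag == "-hook_file":
            return value
    return None


def parse_command_line(argv, config):
    """Parse flag/value pairs; `config` stands in for the configuration file."""
    # Step 1: peek at the command line only, and register the hook's options.
    known = set(BASE_OPTIONS)
    known |= HOOK_DECLARED_OPTIONS.get(peek_hook_file(argv), set())

    # Step 2: parse. Anything the hook did not declare is rejected here.
    opts = {}
    for flag, value in zip(argv[0::2], argv[1::2]):
        if flag not in known:
            raise ValueError(f"unrecognized option: {flag}")
        opts[flag] = value

    # Step 3: the configuration file is merged only now, after parsing,
    # so a hook_file found here can no longer contribute options to step 2.
    for key, value in config.items():
        opts.setdefault(key, value)
    return opts
```

With hook_file on the command line, `parse_command_line(["-hook_file", "sentencepiece_hook", "-sentencepiece", "foo.model"], {})` accepts the -sentencepiece option. With hook_file only in the config dict, the same -sentencepiece option raises "unrecognized option", which mirrors the behavior reported in this thread.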