OpenNMT-py on-the-fly tokenization with subword regularization

I have been trying to set up on-the-fly tokenization with SentencePiece in OpenNMT-py. Following the OpenNMT-py tutorial page, I used the following config:

```yaml
# Tokenization options
src_subword_type: sentencepiece
src_subword_model: path to the SP model
tgt_subword_type: sentencepiece
tgt_subword_model: SP model path

# Number of candidates for SentencePiece sampling
subword_nbest: 64

# Smoothing parameter for SentencePiece sampling
subword_alpha: 0.1

# Specific arguments for pyonmttok
src_onmttok_kwargs: "{'mode': 'none', 'spacer_annotate': True}"
tgt_onmttok_kwargs: "{'mode': 'none', 'spacer_annotate': True}"

src_vocab: path to the SP vocab converted to ONMT format
tgt_vocab: path to the SP vocab converted to ONMT format
overwrite: False

data:
    corpus_1:
        path_src: Tokenized file path
        path_tgt: Tokenized file path
        transforms: [onmt_tokenize, filtertoolong]
    valid:
        path_src: Tokenized file path
        path_tgt: Tokenized file path
        transforms: [onmt_tokenize, filtertoolong]
```


But during training, I see the following:

```
corpus_1's transforms: TransformPipe(ONMTTokenizerTransform(share_vocab=False, src_subword_kwargs={'sp_model_path': '…', 'sp_nbest_size': 1, 'sp_alpha': 0}, src_onmttok_kwargs={'mode': 'none', 'spacer_annotate': True}, tgt_subword_kwargs={'sp_model_path': '…', 'sp_nbest_size': 1, 'sp_alpha': 0}, tgt_onmttok_kwargs={'mode': 'none', 'spacer_annotate': True}), FilterTooLongTransform(src_seq_length=200, tgt_seq_length=200))
```

As can be seen from `sp_nbest_size: 1` and `sp_alpha: 0`, it does not perform any subword regularization at all!
What went wrong?
Any help would be appreciated.

Thanks in advance,

Regards,
Mazida

I think the docs are not fully up to date here.
We expect the sided `{src,tgt}_subword_alpha` // `{src,tgt}_subword_nbest` opts instead of the "non-sided" `subword_alpha` // `subword_nbest`.
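In other words, the sampling options in the config above would need to be replaced with their sided variants. A sketch of that change, reusing the same values as the original config:

```yaml
# Sided sampling options replacing the "non-sided" subword_nbest / subword_alpha
src_subword_nbest: 64
src_subword_alpha: 0.1
tgt_subword_nbest: 64
tgt_subword_alpha: 0.1
```

With these set, the logged transform should report the sampled values (e.g. `sp_nbest_size: 64, sp_alpha: 0.1`) instead of the defaults `1` and `0`.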