Through the new hook mechanism:
I have introduced a hook for Google’s SentencePiece; integration with the training/inference workflow should normally be seamless, see the documentation here.
SentencePiece is an alternative sentence-level tokenisation model, described here:
The authors report interesting results, competitive with BPE for several languages, especially Chinese, Japanese, and Korean. This tokenisation scheme can be combined with the normal OpenNMT tokenisations, including BPE…
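For readers unfamiliar with subword tokenisation, here is a toy sketch of the BPE merge loop (following the algorithm from Sennrich et al.'s paper, not the SentencePiece implementation; the corpus and merge count are made-up illustrations):

```python
import re
from collections import Counter

def pair_counts(words):
    # Count adjacent symbol pairs across the vocabulary,
    # weighted by word frequency.
    counts = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(pair, words):
    # Merge the pair wherever it occurs on a symbol boundary.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    joined = "".join(pair)
    return {pattern.sub(joined, word): freq for word, freq in words.items()}

def learn_bpe(corpus, num_merges):
    # Words start as space-separated characters plus an
    # end-of-word marker, as in the original BPE paper.
    words = Counter(" ".join(w) + " </w>" for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break
        best = max(counts, key=counts.get)
        words = merge_pair(best, words)
        merges.append(best)
    return merges

merges = learn_bpe("low lower lowest low low", 3)
print(merges)  # → [('l', 'o'), ('lo', 'w'), ('low', '</w>')]
```

SentencePiece differs in that it treats the raw sentence (including whitespace) as the input sequence and learns the segmentation directly, which is what makes it attractive for languages without word boundaries such as Chinese or Japanese.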
Please share if you have interesting results.