In this OpenNMT-tf Serving example I can see that a single “wmtende.model” SentencePiece model is used for both tokenization and de-tokenization.
Shouldn’t there be two separate SentencePiece models, one for source language tokenization and one for de-tokenizing the translation model’s output? If not, how do I create a single SentencePiece model that handles both source and target tokenization/de-tokenization?
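For reference, this is my understanding of the single-model pattern in that example, roughly what the serving client does with pyonmttok. It is only a sketch; the model path and sample text are placeholders:

```python
# Sketch: one shared SentencePiece model handles both directions.
import pyonmttok

# Load the single shared model once (path is an assumption).
tokenizer = pyonmttok.Tokenizer("none", sp_model_path="wmtende.model")

# Source side: tokenize the input before sending it to the model server.
tokens, _ = tokenizer.tokenize("Hello world!")

# Target side: detokenize whatever token sequence the server returns
# (round-tripping the source tokens here, purely for illustration).
detokenized = tokenizer.detokenize(tokens)
```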
I’m currently training a model to use with the Docker version of OpenNMT-tf Serving and basically followed that script, substituting my own data. I concatenated the source and target data and then used SentencePiece to tokenize it with the BPE option. What I realised after a few errors is that you need the built-from-source C++ version of SentencePiece to access the full range of SP commands.
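In case it helps, here is roughly what that training step looks like. This is just a sketch with made-up file names and vocab size; I ran the spm_train CLI, but the Python wrapper accepts the same flags:

```python
# Sketch: train one shared BPE SentencePiece model on the concatenated
# source + target file (file names and vocab size are assumptions).
import sentencepiece as spm

spm.SentencePieceTrainer.Train(
    "--input=train.concat "    # concatenated source and target sentences
    "--model_prefix=wmtende "  # produces wmtende.model / wmtende.vocab
    "--vocab_size=32000 "
    "--model_type=bpe "
    "--character_coverage=1.0"
)
```

The resulting wmtende.model can then be pointed to by both the source and target tokenization configs, which is why the example only ships one SentencePiece model.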