SentencePieceTokenizer is not in list of accepted tokenizers

Mongoose · June 18, 2022, 12:31pm

When converting my trained checkpoint into a SavedModel, I am getting an error message that SentencePieceTokenizer is not an acceptable tokenizer. However, I am pretty confident that in the past I did manage to make SentencePiece part of the model’s graph.

Current version: OpenNMT-tf 2.27.1

Error message:

ValueError: SentencePieceTokenizer is not in list of accepted tokenizers: CharacterTokenizer, OpenNMTTokenizer, SpaceTokenizer

Here are the tokenizer parameters in the data section of the config file:

data:
  source_tokenization:
    type: SentencePieceTokenizer
    params:
      model: /src/xp/model_1/vocabs/src-bpe.sp.40k.model
  target_tokenization:
    type: SentencePieceTokenizer
    params:
      model: /src/xp/model_1/vocabs/tgt-bpe.sp.40k.model

guillaumekln · June 20, 2022, 8:25am

You should install the optional package tensorflow-text.
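A quick way to confirm whether the package is usable from the environment that runs OpenNMT-tf is to attempt the import directly. The snippet below is only a minimal sketch: the helper name `try_import` is made up for illustration, and it assumes the pip package tensorflow-text is exposed as the Python module `tensorflow_text`.

```python
import importlib


def try_import(module_name):
    """Try to import a module and return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # Broad on purpose: a failure while loading a native library
        # is not always raised as ImportError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


if __name__ == "__main__":
    ok, err = try_import("tensorflow_text")
    print("tensorflow_text importable:", ok)
    if not ok:
        print(err)
```

If the import fails here, the SentencePieceTokenizer error above would be expected to appear as well.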

Mongoose · June 23, 2022, 6:49pm

Installing tensorflow-text does solve the issue. In my case, though, the underlying culprit was that I had installed tensorflow with conda. With a conda-installed tensorflow, tensorflow-text could not be imported properly.
I solved it by installing tensorflow with pip instead of conda.
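For anyone trying to tell which installer put tensorflow in their environment, the package metadata can sometimes help. This is a rough sketch, not a definitive check: the helper name `installer_of` is hypothetical, and it assumes the distribution ships an `INSTALLER` metadata file (pip writes `pip` there; packages installed by other tools may record a different value or nothing at all).

```python
from importlib import metadata


def installer_of(dist_name):
    """Return the recorded installer for a distribution, or None if unknown."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
    text = dist.read_text("INSTALLER")  # None when the file is absent
    return text.strip() if text else None


if __name__ == "__main__":
    for name in ("tensorflow", "tensorflow-text"):
        print(name, "->", installer_of(name))
```

A result other than `pip` for tensorflow would be a hint, under the assumptions above, that the situation described in this post applies.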