Converting pre-trained SavedModel using coremltools

I’m trying to create a proof-of-concept translation model to use on iOS. The easiest way seems to be a Core ML model, so my goal is to convert one of the pre-trained OpenNMT-tf models (a TensorFlow 2.x SavedModel) using coremltools.

My first attempt, based on the TensorFlow docs:

import tensorflow as tf
import coremltools as ct

# Path to the extracted pre-trained OpenNMT-tf English-German export
model_dir = '/Users/shawnthroop/Developer/Playgrounds/OpenNMT/averaged-ende-export500k-v2'
model = tf.keras.models.load_model(model_dir)
print(model)
# <keras.saving.saved_model.load.TransformerBase object at 0x1570f4910>

mlmodel = ct.convert(model, source='tensorflow')
# NotImplementedError: Expected model format: [SavedModel | [concrete_function] | tf.keras.Model | .h5], got <keras.saving.saved_model.load.TransformerBase object at 0x12d185d60>

I added the print statement to highlight my confusion: ct.convert is expecting a format I thought the model was already in. How do I get a valid SavedModel from the pre-trained model?
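(One thing I haven’t verified yet: the coremltools docs suggest ct.convert can also take the path to a SavedModel directory directly, which would skip the Keras load entirely. Something like:)

# Untested idea: pass the SavedModel directory itself instead of the loaded object
mlmodel = ct.convert(model_dir, source='tensorflow')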

I tried another angle…

My second attempt takes its inspiration from this forum post, which attempts something very similar.

import os
import tensorflow as tf
import opennmt
import coremltools as ct

model_dir = '/Users/shawnthroop/Developer/Playgrounds/OpenNMT/averaged-ende-export500k-v2'
model_vocab = os.path.join(model_dir, "assets/wmtende.vocab")

# Rebuild the Transformer and point it at the shared source/target vocabulary
model = opennmt.models.TransformerBase()
model.initialize({
    "source_vocabulary": model_vocab,
    "target_vocabulary": model_vocab,
})

# Restore the pre-trained weights from the export directory
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.restore(tf.train.latest_checkpoint(model_dir))

# Trace the serving function into a concrete function and hand it to coremltools
serve_function = model.serve_function().get_concrete_function()
mlmodel = ct.convert([serve_function], source="tensorflow")

# InvalidArgumentError: input resource[0] expected type resource != float, the type of transformer_base_9_while_word_embedder_19_embedding_lookup_113121_0[0]
#     In {{node transformer_base_9/while/word_embedder_19/embedding_lookup}}

I’m really out of my depth on what’s going wrong here. Any thoughts or insights would be welcome.


You should try enabling the TensorFlow Lite mode when building the concrete function. This mode tries to avoid using ops that are not compatible with TensorFlow Lite.

with model.enable_tflite_mode():
    serve_function = model.serve_function().get_concrete_function()

Note that we are not actively supporting this conversion mode, so it will probably be difficult to make it work effectively. In particular, this only used to work with TensorFlow 2.6; newer TensorFlow versions raise errors.
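For reference, a minimal sketch of the full flow (untested here; it assumes the model and vocabulary setup from your second script, and TensorFlow 2.6):

# Sketch only: reuses `model` restored from the checkpoint as in the script above
with model.enable_tflite_mode():
    serve_function = model.serve_function().get_concrete_function()

mlmodel = ct.convert([serve_function], source="tensorflow")
mlmodel.save("transformer_base.mlmodel")  # hypothetical output filename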

Alternatively, you can consider compiling the inference engine CTranslate2 for iOS.


Thank you for the feedback and ideas. I just went on holiday, so I haven’t had a chance to open my machine and retry some of these things.

I’ve been reading the CTranslate2 documentation and haven’t found anything about compiling for iOS. I see a lot of code using the Python API, but because of iOS’s limitations I don’t think that will be an option.

My main goal is to take a pre-trained (or self-trained) model, integrate a tokenizer like SentencePiece, and convert it to Apple’s Core ML model format so I can interact with it on iOS.

Since SentencePiece isn’t available to me on iOS devices, the serve_function seems like the way to go if I want to integrate the tokenizer.
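For context, this is roughly the tokenization step I mean, using the Python SentencePiece bindings (the model filename is a placeholder for the one shipped with the export):

import sentencepiece as spm

# Placeholder filename: the SentencePiece model bundled with the pre-trained export
sp = spm.SentencePieceProcessor(model_file="wmtende.model")

# Encode raw text into the subword pieces the Transformer expects
pieces = sp.encode("Hello world!", out_type=str)
print(pieces)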

Am I thinking about this in the correct way?

Python is the easiest way to use this project, but CTranslate2 is first and foremost a C++ library that can be compiled for many platforms. There is no documentation specific to iOS, but it is a solution to consider if you are familiar with compiling C/C++ projects for iOS.
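For example, here is roughly what the Python API looks like (a sketch; it assumes a model already converted to the CTranslate2 format, e.g. with the ct2-opennmt-tf-converter script):

import ctranslate2

# Placeholder path: an English-German model converted to the CTranslate2 format
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")

# Inputs are pre-tokenized (e.g. with SentencePiece); detokenization happens outside
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])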

Does Apple CoreML support the SentencePiece tokenizer embedded in the model graph? I can’t find information about this. If CoreML does not support these tokenization operators, you would still need to integrate SentencePiece another way.

In theory yes, but as far as I know there are rough edges and incomplete support for translation models in tools like CoreML. Even if you manage to convert the model, there are usually other runtime limitations, such as no support for batch decoding and a fixed decoding length.