CTranslate2 Model Converted from OpenNMT-py Giving Different Results

Hi,

I converted an OpenNMT-py model to a ctranslate2 model using the following:

import torch
import ctranslate2

# model_path: OpenNMT-py checkpoint (.pt); output_dir: target directory.
model = torch.load(model_path, map_location=torch.device('cpu'))
model_spec = get_ctranslate2_model_spec(model["opt"])  # builds the CTranslate2 spec from the training opts
converter = ctranslate2.converters.OpenNMTPyConverter(model_path)
converter.convert(output_dir, model_spec)

But the converted model and the original model are giving different results.
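For reference, this is roughly how we run the converted model (a minimal sketch; the token list is a placeholder, and real inputs are tokenized the same way as the training data):

import ctranslate2

# output_dir is the directory produced by the conversion above.
translator = ctranslate2.Translator(output_dir)
results = translator.translate_batch([["John", "Major", "|||", "1992", "election"]])
# Depending on the CTranslate2 version, each result is a TranslationResult
# (results[0].hypotheses[0]) or a list of dicts (results[0][0]["tokens"]).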

Command used for training the original model:

python3 ./OpenNMT-py/train.py -save_model model/transformer \
           -save_checkpoint_steps 1000 \
           -data data/data \
           -layers 4 \
           -rnn_size 512 \
           -word_vec_size 512 \
           -max_grad_norm 0 \
           -optim adagrad \
           -encoder_type transformer \
           -decoder_type transformer \
           -position_encoding \
           -dropout 0.0 \
           -attention_dropout 0.0 \
           -learning_rate 0.15 \
           -adagrad_accumulator_init 0.1 \
           -batch_size 64 \
           -train_steps 75000 \
           -share_embeddings \
           -copy_attn \
           -reuse_copy_attn \
           -copy_loss_by_seqlength \
           -bridge \
           -seed 777 \
           -world_size 1 \
           -gpu_ranks 0

Any pointers as to why the translations differ between the two models?

How different are they? Do you use the same beam size in translation?

In general, we can’t guarantee that the translations will always match; there can be small differences due to implementation details.

Hi,

We do not pass any beam size and use the default one (assuming it is the same for both APIs).
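Worth checking, though: the defaults are not guaranteed to match. If I recall correctly, OpenNMT-py’s translate.py defaults to -beam_size 5 while CTranslate2’s translate_batch defaults to beam_size=2, so it is safer to pin the value explicitly on both sides. A minimal sketch for the CTranslate2 side (the model path is a placeholder; pass -beam_size 5 to translate.py on the other side):

import ctranslate2

# Match OpenNMT-py's translate.py default beam size explicitly.
translator = ctranslate2.Translator("ct2_model")
results = translator.translate_batch([["John", "Major"]], beam_size=5)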

A sample of the output difference:
Input: John Major ||| 1992 election ||| what was he running for ?
Output using OpenNMT-py model: what was John Major running for ?
Output using OpenNMT-ctranslate2 model: What political commentator Bill Edwards ’ second time had ?

The outputs are completely different, and this is true for all the samples we have tested.

I just saw in your training command that you are using unsupported options such as copy attention (-copy_attn) and the bridge (-bridge).

CTranslate2 only supports the architectures described in the papers listed here: https://github.com/OpenNMT/CTranslate2#converting-models

See this example to train standard Transformer models: https://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model
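For reference, the recommended Transformer options from that FAQ looked roughly like this at the time (treat the linked FAQ as authoritative; the values here are indicative):

python3 ./OpenNMT-py/train.py -save_model model/transformer \
           -data data/data \
           -layers 6 \
           -rnn_size 512 \
           -word_vec_size 512 \
           -transformer_ff 2048 \
           -heads 8 \
           -encoder_type transformer \
           -decoder_type transformer \
           -position_encoding \
           -train_steps 200000 \
           -max_generator_batches 2 \
           -dropout 0.1 \
           -batch_size 4096 \
           -batch_type tokens \
           -normalization tokens \
           -accum_count 2 \
           -optim adam \
           -adam_beta2 0.998 \
           -decay_method noam \
           -warmup_steps 8000 \
           -learning_rate 2 \
           -max_grad_norm 0 \
           -param_init 0 \
           -param_init_glorot \
           -label_smoothing 0.1 \
           -world_size 1 \
           -gpu_ranks 0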

Thanks for your response!

Is there another way to invoke OpenNMT-py models from C++, since I am unable to use CTranslate2 here?

Is there a reason you don’t train a standard Transformer architecture?

The other ways to run the model from C++ are all more complicated than retraining a compatible model.

For the problem we are working on, the copy mechanism is critical.

So it would be really great if CTranslate2 could support the copy mechanism and at least one BRNN architecture, or if there were a way of using the OpenNMT-py model in a C++ environment.

You can always call Python code from C++, for example via the CPython embedding API: https://docs.python.org/3/extending/embedding.html

You can also look into the PyTorch C++ API, but that likely requires a lot more work as the Python logic has to be translated into C++:

https://pytorch.org/cppdocs/
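If you go that route, one common bridge is TorchScript: export the network from Python with torch.jit and load it in C++ via torch::jit::load. Note that the beam search and decoding loop would still have to be reimplemented in C++. A generic sketch, not specific to OpenNMT-py:

import torch

# Trace a toy module with example inputs and save it for the C++ API to load.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(512, 512)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel().eval()
traced = torch.jit.trace(model, torch.randn(1, 512))
traced.save("traced_model.pt")  # in C++: torch::jit::load("traced_model.pt")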

As a follow-up, do you have a plan to add copy mechanism support to CTranslate2?

This is not planned at the moment.