CTranslate2 Model for OpenNMT-py Model Giving Different Results

bhasin · August 11, 2020, 6:50pm

Hi,

I converted an OpenNMT-py model to a ctranslate2 model using the following:

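import ctranslate2
import torch

model = torch.load(model_path, map_location=torch.device('cpu'))
# get_ctranslate2_model_spec is a helper whose definition is not shown in this post.
model_spec = get_ctranslate2_model_spec(model["opt"])
converter = ctranslate2.converters.OpenNMTPyConverter(model_path)
converter.convert(output_dir, model_spec)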

But the converted model and the original model are giving different results.

Command used for training the original model:

python3 ./OpenNMT-py/train.py -save_model model/transformer \
           -save_checkpoint_steps 1000 \
           -data data/data \
           -layers 4 \
           -rnn_size 512 \
           -word_vec_size 512 \
           -max_grad_norm 0 \
           -optim adagrad \
           -encoder_type transformer \
           -decoder_type transformer \
           -position_encoding \
           -dropout 0.0 \
           -attention_dropout 0.0 \
           -learning_rate 0.15 \
           -adagrad_accumulator_init 0.1 \
           -batch_size 64 \
           -train_steps 75000 \
           -share_embeddings \
           -copy_attn \
           -reuse_copy_attn \
           -copy_loss_by_seqlength \
           -bridge \
           -seed 777 \
           -world_size 1 \
           -gpu_ranks 0

Any pointers to why the translations are different in the two models?

guillaumekln · August 12, 2020, 6:49am

How different are they? Do you use the same beam size in translation?

In general, we can’t guarantee that the translations will always match. There can be small differences due to implementation details.

bhasin · August 12, 2020, 6:36pm

Hi,

We do not pass any beam size and use the default one (assuming it is the same for both APIs).

A sample of the output difference:
Input: John Major ||| 1992 election ||| what was he running for ?
Output using OpenNMT-py model: what was John Major running for ?
Output using OpenNMT-ctranslate2 model: What political commentator Bill Edwards ’ second time had ?

The outputs are completely different, and this is true for the samples we have tested.
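To put a number on "how different", a minimal sketch along these lines could be used. It compares two output strings token by token with the standard library's `difflib`, and counts exact matches over a list of pairs. The function names `token_similarity` and `exact_match_rate` are hypothetical helpers for illustration and are not part of OpenNMT-py or CTranslate2.

```python
from difflib import SequenceMatcher


def token_similarity(reference: str, candidate: str) -> float:
    """Return a similarity ratio in [0, 1] over whitespace-separated tokens."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    # autojunk=False disables difflib's heuristic for long sequences.
    matcher = SequenceMatcher(None, ref_tokens, cand_tokens, autojunk=False)
    return matcher.ratio()


def exact_match_rate(references, candidates) -> float:
    """Return the fraction of output pairs that are identical strings."""
    pairs = list(zip(references, candidates))
    if not pairs:
        return 0.0
    return sum(r.strip() == c.strip() for r, c in pairs) / len(pairs)


# The sample pair from this post.
onmt_py_output = "what was John Major running for ?"
ct2_output = "What political commentator Bill Edwards ’ second time had ?"

print(f"token similarity: {token_similarity(onmt_py_output, ct2_output):.2f}")
print(f"exact match rate: {exact_match_rate([onmt_py_output], [ct2_output]):.2f}")
```

A ratio close to 1 across a test set would point to small implementation differences, while a ratio close to 0 suggests the two models are not computing the same thing.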

guillaumekln · August 12, 2020, 7:14pm

I just saw in your training command line that you are using unsupported options such as copy attention and bridge.

CTranslate2 only supports architectures that are described in the papers listed here: https://github.com/OpenNMT/CTranslate2#converting-models

See this example to train standard Transformer models: https://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model
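As a rough sketch, a check like the following could be run on the checkpoint options before conversion to catch this earlier. It assumes the options are available as a dict or an `argparse.Namespace` (as with `model["opt"]` in the conversion snippet above) and that the option names match the training flags. The list contains only the two options named in this thread and is not exhaustive; `UNSUPPORTED_FLAGS` and `find_unsupported_options` are hypothetical names, not CTranslate2 APIs.

```python
import argparse

# Options called out as unsupported in this thread. This is NOT a complete
# list of what CTranslate2 rejects; extend it based on the documentation.
UNSUPPORTED_FLAGS = ("copy_attn", "bridge")


def find_unsupported_options(opt):
    """Return the names of flagged options that are enabled in `opt`.

    `opt` may be a dict or an argparse.Namespace of training options.
    """
    values = opt if isinstance(opt, dict) else vars(opt)
    return [name for name in UNSUPPORTED_FLAGS if values.get(name)]


# Example with a subset of the flags from the training command in this thread.
opt = argparse.Namespace(layers=4, share_embeddings=True, copy_attn=True, bridge=True)
problems = find_unsupported_options(opt)
if problems:
    print("Options not supported for conversion:", ", ".join(problems))
```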

bhasin · August 12, 2020, 8:22pm

Thanks for your response!

Is there another way to invoke OpenNMT-py models in C++ since I am unable to use ctranslate2 here?

guillaumekln · August 13, 2020, 7:04am

Is there a reason you don’t train a standard Transformer architecture?

The other ways to use C++ are just more complicated than retraining a compatible model.

bhasin · August 13, 2020, 5:39pm

For the problem we are working on, Copy Mechanism is critical.

So it’d be really great if we could have CTranslate2 support for Copy Mechanism and at least one BRNN architecture, or have a way of using the OpenNMT-py model in a C++ environment.

guillaumekln · August 13, 2020, 5:52pm

You can always call Python code from C++. See for example:

You can also look into the PyTorch C++ API, but that likely requires a lot more work as the Python logic has to be translated to C++:

https://pytorch.org/cppdocs/

bhasin · August 28, 2020, 8:53pm

As a follow-up, do you plan to add copy mechanism support to CTranslate2?

guillaumekln · August 31, 2020, 7:36am

This is not planned at the moment.