Excessive memory usage in OpenNMT-tf Transformer inference?


I am new to OpenNMT and am experimenting with the Transformer model in OpenNMT-tf. I trained the default Transformer model with the provided transformer_1gpu.yml settings.

Now I am trying to run inference with the default settings (batch_size = 32), and I sorted the input by token count as suggested in this thread (Low performance on Inference with the transformer model on single GPU). However, the system stops responding within a few minutes of starting inference. As a test, I trimmed the input file down to a single sentence with one token, but I still see the memory of the Python process climb quickly and exhaust all 15 GB of available memory, at which point the system becomes unresponsive. I am using a Google Compute Engine instance with 15 GB of memory, 4 vCPUs, and an Nvidia Tesla K80 with about 12 GB of memory.
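For what it's worth, the sorting step I used is just a token-count sort of the plain-text input file (one sentence per line). A minimal sketch, assuming whitespace tokenization; the file names are placeholders:

```python
# Sort a plain-text input file by token count so that sentences of
# similar length end up in the same inference batch.
# "src-test.txt" below is a placeholder file name.

def sort_by_token_count(lines):
    """Return lines sorted by whitespace-token count, shortest first."""
    return sorted(lines, key=lambda line: len(line.split()))

if __name__ == "__main__":
    with open("src-test.txt") as f:
        lines = f.read().splitlines()
    for line in sort_by_token_count(lines):
        print(line)
```

Note that the translations then come out in sorted order, so you need to keep the original line indices if you want to restore the original order afterwards.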

Is this behavior expected for inference even with a single input sentence? Any tips to resolve the issue would be appreciated.



Inference should be relatively light on system memory (I just tested the pretrained English-German Transformer model and it used about 2 GB of system memory).

Could you describe the TensorFlow and OpenNMT-tf versions you used and the command line you executed?



Thank you for the prompt response. I had installed TensorFlow a month or so ago. After upgrading to the latest version available on pip (1.8.0), the issue went away. Perhaps it was a problem with the old TensorFlow version. I should have tried this before posting; my apologies.

Thanks again for the quick response.

For reference, which TensorFlow version did you use before updating?

The machine had TensorFlow 1.5.0. Upgrading to 1.8.0 resolved the problem.