Inference problems with OpenNMT-tf

I’m trying to translate a very large file from a checkpoint, but inference appears to get stuck for a very long time at regular intervals, with a message like the following:

921744 predictions are buffered, but waiting for the prediction of line 27698 to advance the output

The lines that raise the problem are not long, so I don’t know what to look for.

Another inference problem is that I cannot apply the decoding noise options: inference does not start at all and throws this error message:

ValueError: unsupported rank 3 for WordNoiser input

Here is the relevant config section:

...
params:
  beam_width: 1
  sampling_topk: 0
  sampling_temperature: 1
  decoding_noise:
    - dropout: 0.1
    - replacement: [0.1, ⦅unk⦆]
    - permutation: 3
...

Using OpenNMT-tf version 2.25.0

This inference log is expected: the input file is sorted by length to improve efficiency, so the output has to be buffered in order to write the results in the original order.

The log means that the next line to be written in the output file is line 27698, but this line has not been translated yet. It does not mean that inference is stuck on this specific line, only that the writer is waiting for it to be processed.
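Roughly, the writer logic looks like this (a minimal sketch of the idea only, not the actual OpenNMT-tf code):

    class OrderRestoringWriter:
        """Buffers out-of-order predictions and writes them in input order."""

        def __init__(self, output_file):
            self._output = output_file
            self._buffer = {}
            self._next_index = 0  # Index of the next line to write.

        def write(self, index, prediction):
            self._buffer[index] = prediction
            # Flush every consecutive prediction starting at the expected index;
            # while the expected line is still missing, everything stays buffered.
            while self._next_index in self._buffer:
                self._output.write(self._buffer.pop(self._next_index))
                self._next_index += 1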

I will check for this error.

Thanks for the clarification. I would like to make a couple of suggestions. I can’t tell whether that message comes from OpenNMT-tf or TensorFlow code, but it could be a bit more explicit by adding the word “queued”:

921744 predictions are buffered, but waiting for the prediction of queued line 27698 to advance the output

As it is now, the user assumes that this line is currently being inferred, but that something is wrong and inference is stuck and cannot complete. At least that’s what I thought…

The other suggestion would be to sort the file beforehand, probably into a temporary file, translate that sorted file, and then restore the original order after inference, in order to keep memory usage under control. Something along the lines of the sketch below.
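For example, where translate_file is a hypothetical stand-in for the actual inference call:

    def translate_keeping_order(input_path, output_path, translate_file):
        with open(input_path) as f:
            lines = f.readlines()
        # Sort line indices by length, remembering each line's original position.
        order = sorted(range(len(lines)), key=lambda i: len(lines[i]))
        with open("sorted.tmp", "w") as f:
            f.writelines(lines[i] for i in order)
        translate_file("sorted.tmp", "translated.tmp")
        with open("translated.tmp") as f:
            translations = f.readlines()
        # Put each translation back at the original position of its source line.
        restored = [None] * len(lines)
        for sorted_pos, original_pos in enumerate(order):
            restored[original_pos] = translations[sorted_pos]
        with open(output_path, "w") as f:
            f.writelines(restored)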

This error should be fixed with:

I made the change in this PR:

Note that this internal buffering and sorting is enabled with auto_config. It can be disabled with:

infer:
  length_bucket_width: 0

I would suggest using CTranslate2 for these large translation tasks. It does not support noisy decoding, but if you really need it, it should be relatively easy to apply the noise in a small postprocessing script.
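For what it’s worth, the three noise types from the config above could be approximated on the translated output with something like this (a rough sketch only, assuming whitespace-tokenized text; the function names are mine, not an OpenNMT-tf API):

    import random

    def word_dropout(tokens, dropout=0.1):
        # Drop each token with probability `dropout`, keeping at least one token.
        kept = [t for t in tokens if random.random() >= dropout]
        return kept or tokens

    def word_replacement(tokens, probability=0.1, unknown="⦅unk⦆"):
        # Replace each token by the unknown token with the given probability.
        return [unknown if random.random() < probability else t for t in tokens]

    def word_permutation(tokens, max_distance=3):
        # Add a random offset in [0, max_distance + 1) to each position and sort:
        # this shuffles tokens while bounding how far each one can move.
        keys = [i + random.uniform(0, max_distance + 1) for i in range(len(tokens))]
        return [t for _, t in sorted(zip(keys, tokens), key=lambda p: p[0])]

    def add_noise(line):
        tokens = line.split()
        tokens = word_dropout(tokens)
        tokens = word_replacement(tokens)
        tokens = word_permutation(tokens)
        return " ".join(tokens)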


Amazing, thank you!