RuntimeError: parallel_for failed: out of memory

lsjiiia · January 25, 2021, 3:16pm

When I run model.translate_batch(batch, beam_size=1, max_decoding_length=7, return_scores=False, replace_unknowns=True), it encounters this error:
RuntimeError: parallel_for failed: out of memory

guillaumekln · January 25, 2021, 3:21pm

How big is the batch input? And what is the maximum sequence length in the batch?

lsjiiia · January 25, 2021, 3:25pm

The inputs are small: the batch size is 20-30 and the maximum sequence length is only 15-20.

guillaumekln · January 25, 2021, 3:27pm

Can you describe your system? CPU model, available system memory, etc.

lsjiiia · January 28, 2021, 11:22am

Maybe the problem is that high traffic produces a large batch_size: I set up a queue before translate_batch to improve concurrent performance.
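
If the queue is what makes batches grow under load, one option is to cap how many sentences are taken from it per call. The following is only a minimal sketch in plain Python, not CTranslate2's own batching: `MAX_BATCH_SIZE`, `drain_batch`, `translate_pending`, and `translate_fn` are hypothetical names introduced for illustration, and the cap value is an assumption to be tuned against the available GPU memory.

```python
import queue
from typing import Callable, List

# Hypothetical cap; tune it against the GPU memory actually available.
MAX_BATCH_SIZE = 32


def drain_batch(requests: "queue.Queue[List[str]]",
                max_batch_size: int = MAX_BATCH_SIZE) -> List[List[str]]:
    """Take at most max_batch_size tokenized sentences from the queue without blocking."""
    batch = []
    while len(batch) < max_batch_size:
        try:
            batch.append(requests.get_nowait())
        except queue.Empty:
            break  # queue is empty, return what was collected
    return batch


def translate_pending(requests: "queue.Queue[List[str]]",
                      translate_fn: Callable[[List[List[str]]], list],
                      max_batch_size: int = MAX_BATCH_SIZE) -> list:
    """Translate everything currently queued, one bounded batch at a time."""
    results = []
    while True:
        batch = drain_batch(requests, max_batch_size)
        if not batch:
            return results
        # translate_fn is a stand-in for whatever performs the translation.
        results.extend(translate_fn(batch))
```

Here `translate_fn` could be a small wrapper around the `model.translate_batch(...)` call from the first post, so that no single call receives more than `max_batch_size` sentences regardless of how much traffic has accumulated in the queue.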

lsjiiia · January 30, 2021, 8:00am

Hello, I have another question.
When I set a translation timeout and throw an exception when it expires, will CTranslate2 leak memory?

lsjiiia · January 30, 2021, 9:42am

It is a GPU with 16 GB of memory. I load 18 int8 models on one GPU, and together they use about 5500 MB.

guillaumekln · January 30, 2021, 10:11am

No, there should not be any memory leaks. If you find one, please report it with the code to reproduce.

5.5 GB for 18 models sounds like pretty good memory usage.

lsjiiia · January 30, 2021, 10:20am

ok, thanks!

Sujit27 · February 3, 2021, 10:50am

Did you find the exact reason for "parallel_for failed: out of memory"? I am getting the same error with translate_batch. The GPU has 16 GB of memory and I am loading about 12 GB of models. A batch has at most 25 sentences and the maximum sequence length is 200.
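
For a rough sense of scale, these numbers can be compared with the ones reported earlier in the thread. The sketch below only counts padded token positions per batch, which is a crude proxy and not a measurement of actual GPU memory use; `padded_positions` is a hypothetical helper introduced for illustration.

```python
def padded_positions(batch_size: int, max_length: int) -> int:
    """Upper bound on token positions in a batch padded to max_length."""
    return batch_size * max_length


# Largest batch reported earlier in the thread: 30 sentences, length up to 20.
small = padded_positions(30, 20)    # 600 positions

# Batch described in this post: 25 sentences, length up to 200.
large = padded_positions(25, 200)   # 5000 positions

# How many times more positions the larger batch has to hold.
ratio = large / small

# Memory left for activations once the models are loaded (figures from this post).
headroom_gb = 16 - 12               # 4 GB

print(small, large, round(ratio, 1), headroom_gb)
```

So the batch here covers roughly eight times as many token positions as the earlier one, while only about 4 GB remain free after loading the models.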

guillaumekln · February 3, 2021, 11:30am

Does the error also happen when you load fewer models on the GPU?