When I run model.translate_batch(batch, beam_size=1, max_decoding_length=7, return_scores=False, replace_unknowns=True), it fails with this error:
RuntimeError: parallel_for failed: out of memory
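For readers following along, a minimal sketch that reproduces this call (the model path and batch contents are placeholders; the options match the failing call above):

```python
import ctranslate2

# Placeholder model path; "model" matches the name used in the question.
model = ctranslate2.Translator("ende_ct2_model", device="cuda")

# translate_batch expects a list of tokenized sentences (lists of tokens).
batch = [["▁Hello", "▁world"], ["▁How", "▁are", "▁you"]]

results = model.translate_batch(
    batch,
    beam_size=1,
    max_decoding_length=7,
    return_scores=False,
    replace_unknowns=True,
)
print(results[0].hypotheses[0])
```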
How big is the batch input? And what is the maximum sequence length in the batch?
The parameters are small: the batch size is 20-30 and the maximum sequence length is just 15-20.
Can you describe your system? CPU model, available system memory, etc.
Maybe the problem is that high traffic causes a large batch_size. I set up a queue before translate_batch to improve concurrency performance.
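A sketch of what such a queue setup might look like (the names, path, and batch cap are hypothetical; capping the batch keeps a traffic spike from producing an oversized batch):

```python
import queue
import threading

import ctranslate2

model = ctranslate2.Translator("ende_ct2_model", device="cuda")  # placeholder path
request_queue = queue.Queue()
MAX_BATCH_SIZE = 32  # hypothetical cap so a traffic spike cannot inflate the batch

def worker():
    while True:
        # Block until one request arrives, then drain up to MAX_BATCH_SIZE total.
        batch = [request_queue.get()]
        while len(batch) < MAX_BATCH_SIZE:
            try:
                batch.append(request_queue.get_nowait())
            except queue.Empty:
                break
        results = model.translate_batch(batch, beam_size=1, max_decoding_length=7)
        # ... return each result to the client that queued the request ...

threading.Thread(target=worker, daemon=True).start()
```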
Hello, I have another question.
When I set up translation timeout protection and an exception is thrown, will CTranslate2 cause a memory leak?
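A sketch of what such timeout protection might look like (the helper name, path, and timeout value are hypothetical):

```python
import ctranslate2
from concurrent.futures import ThreadPoolExecutor, TimeoutError

model = ctranslate2.Translator("ende_ct2_model", device="cuda")  # placeholder path
executor = ThreadPoolExecutor(max_workers=1)

def translate_with_timeout(batch, timeout=2.0):  # hypothetical 2-second budget
    future = executor.submit(model.translate_batch, batch, beam_size=1)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        # Only the caller stops waiting: the translation still runs to
        # completion in the worker thread and its memory is freed normally.
        raise RuntimeError("translation timed out") from None
```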
It's a GPU with 16 GB of memory. I load 18 int8 models on one GPU; together they use about 5500 MB.
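A sketch of what loading that many int8 models on a single GPU might look like (paths are placeholders):

```python
import ctranslate2

# Placeholder paths; all 18 int8 models share GPU 0.
model_paths = [f"models/model_{i}" for i in range(18)]

translators = [
    ctranslate2.Translator(path, device="cuda", device_index=0, compute_type="int8")
    for path in model_paths
]
```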
No, there should not be any memory leaks. If you find one, please report it along with code to reproduce it.
5.5GB for 18 models sounds like a pretty good memory usage.
ok, thanks!
Did you find the exact reason for the parallel_for failed: out of memory error? I am also getting the same error from translate_batch. GPU memory is 16 GB and the loaded models take ~12 GB. A batch has at most 25 sentences and the maximum sequence length is 200.
Does the error also happen when you load fewer models on the GPU?
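For anyone hitting this, one option that may lower peak memory is letting translate_batch split the input into smaller sub-batches via its max_batch_size option; a sketch, where model and batch are as in the messages above:

```python
# Let CTranslate2 split the 25-sentence batch into smaller sub-batches so
# that decoding long (~200-token) sequences needs less peak GPU memory.
results = model.translate_batch(
    batch,
    beam_size=1,
    max_batch_size=8,       # hypothetical sub-batch size
    batch_type="examples",  # count the sub-batch size in sentences
)
```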