OpenNMT Forum

How do you solve GPU memory overflow?


(李明伟) #1

When I start the engine with the th tools/rest_translation_server.lua command, each translation increases memory usage. Once the GPU memory is full, the engine can no longer be used.

Has anyone had the same problem? How did you resolve it?


(Guillaume Klein) #2

What is the request size you are sending to the server (batch size, sequence length)?

(李明伟) #3

Hello guillaumekln,

All I can see is that a lot of data is being sent; I don't have exact numbers. After a few days, the engine dies. Rebooting it manually every time it goes down is too much trouble. Can the program empty the cache automatically, like the SQL Server caching mechanism does? Or can the cache feature be turned off permanently?


(Guillaume Klein) #4

Is the client constraining the request size? If not, memory usage grows whenever a request is larger than any request seen previously.
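To illustrate what constraining the request size on the client could look like, here is a minimal Python sketch. It is not part of OpenNMT: the limits `MAX_BATCH_SIZE` and `MAX_TOKENS`, the helper names, and the `{"src": ...}` payload shape are assumptions for illustration, and whitespace splitting only approximates the server's own tokenization.

```python
from typing import Dict, Iterator, List

# Hypothetical limits; tune them to the GPU memory you can afford.
MAX_BATCH_SIZE = 30   # maximum number of sentences per request
MAX_TOKENS = 100      # maximum number of whitespace tokens per sentence


def truncate(sentence: str, max_tokens: int = MAX_TOKENS) -> str:
    """Keep at most max_tokens whitespace-separated tokens."""
    return " ".join(sentence.split()[:max_tokens])


def bounded_requests(
    sentences: List[str],
    max_batch_size: int = MAX_BATCH_SIZE,
    max_tokens: int = MAX_TOKENS,
) -> Iterator[List[Dict[str, str]]]:
    """Yield request payloads that never exceed the configured limits.

    The payload shape (a list of {"src": text} objects) is an assumption
    about what the REST server expects; adapt it to your client.
    """
    for start in range(0, len(sentences), max_batch_size):
        chunk = sentences[start:start + max_batch_size]
        yield [{"src": truncate(s, max_tokens)} for s in chunk]
```

Each yielded payload would be sent as a separate request, so no single request is larger than the chosen bounds. Truncation discards text; splitting long inputs into several sentences before sending would avoid that loss.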

(李明伟) #5

No, the client does not restrict the request size. But why can't we turn off this caching mechanism?

(Guillaume Klein) #6

There are 2 levels of cache:

  1. the CUDA allocator
  2. the internal buffers of the model (e.g. the output of each layer)

For 1., as I already indicated, set the environment variable THC_CACHING_ALLOCATOR=0 to disable the caching allocator.
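As a rough sketch of how that variable could be applied when the server is started from a script, the Python helper below builds the command and environment. The helper name `build_launch`, the default script path, and the example arguments are assumptions for illustration; only the THC_CACHING_ALLOCATOR=0 setting comes from this thread.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_launch(
    script: str = "tools/rest_translation_server.lua",  # assumed path; adjust to your checkout
    extra_args: Optional[List[str]] = None,             # your usual server options
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (command, environment) with the CUDA caching allocator disabled."""
    # Copy the environment so the caller's mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["THC_CACHING_ALLOCATOR"] = "0"
    cmd = ["th", script] + list(extra_args or [])
    return cmd, env


# Example use (not executed here):
#   import subprocess
#   cmd, env = build_launch(extra_args=["-model", "model.t7"])
#   subprocess.Popen(cmd, env=env)
```

The variable has to be present in the environment of the server process itself, which is why it is passed through `env` rather than set after startup.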

For 2., these buffers are not freed for performance reasons and can only grow in size. You can manually release them by calling this method on the main model object: