How do you solve GPU memory overflow?

sdlmw · January 15, 2019, 7:31am

When I start the engine with the th Tools/rest_translation_server.lua command, GPU memory usage increases with each translation. Once the GPU memory is full, the engine can no longer be used.

Has anyone else had the same problem? How did you resolve it?

Thanks

guillaumekln · January 16, 2019, 9:22am

What is the request size you are sending to the server (batch size, sequence length)?

sdlmw · January 18, 2019, 6:07am

Hello guillaumekln,

All I can see is that a lot of data is being sent; I don’t have exact numbers for the request size. After a few days, the engine dies, and restarting it manually every time it goes down is too much trouble. Could the program empty the cache automatically, the way the SQL Server caching mechanism does? Or could the cache feature be turned off permanently?

Thanks

guillaumekln · January 18, 2019, 7:53am

Is the client constraining the request size? If not, memory usage increases whenever a request is larger than any request seen previously.
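
To make "constraining the request size" concrete, here is a minimal client-side sketch in Python. It assumes the client holds plain-text sentences and tokenizes on whitespace; the function name `capped_batches` and both limits are hypothetical values chosen for illustration, not settings of the server.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 30   # hypothetical cap on sentences per request
MAX_SEQ_LENGTH = 80   # hypothetical cap on whitespace tokens per sentence


def capped_batches(sentences: List[str],
                   max_batch_size: int = MAX_BATCH_SIZE,
                   max_seq_length: int = MAX_SEQ_LENGTH) -> Iterator[List[str]]:
    """Yield batches of at most max_batch_size sentences,
    each truncated to at most max_seq_length whitespace tokens."""
    if max_batch_size < 1 or max_seq_length < 1:
        raise ValueError("both limits must be at least 1")
    batch: List[str] = []
    for sentence in sentences:
        tokens = sentence.split()
        # Truncate overly long sentences so no request exceeds the cap.
        batch.append(" ".join(tokens[:max_seq_length]))
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


if __name__ == "__main__":
    for request in capped_batches(["a b c d", "e f", "g"],
                                  max_batch_size=2, max_seq_length=3):
        print(request)
```

Each yielded batch would be sent as one request, so the largest request the server ever sees is bounded by the two limits. Truncation drops the tail of long sentences; splitting them into several shorter segments may be preferable depending on the use case.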

sdlmw · January 23, 2019, 7:53am

No, the client does not restrict the request size. But why can’t we turn off this caching mechanism?

guillaumekln · January 23, 2019, 8:01am

There are 2 levels of cache:

1. the CUDA allocator
2. the internal buffers of the model (e.g. the output of each layer)

For 1., as I already suggested, set the environment variable THC_CACHING_ALLOCATOR=0.

For 2., these buffers are not freed for performance reasons and can only grow in size. You can manually release them by calling this method on the main model object: https://github.com/torch/nn/blob/master/doc/module.md#clearstate.
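
The grow-only behaviour of these buffers can be illustrated with a toy model. This is a Python sketch, not Torch code: the class `GrowOnlyBuffer` and its methods are hypothetical, and the capacity is simply assumed to be batch size times sequence length.

```python
class GrowOnlyBuffer:
    """Toy stand-in for a layer's output buffer (hypothetical, not Torch)."""

    def __init__(self) -> None:
        self.capacity = 0  # number of elements currently allocated

    def resize_for(self, batch_size: int, seq_length: int) -> None:
        needed = batch_size * seq_length
        if needed > self.capacity:
            # Grow to fit the largest request seen so far; never shrink.
            self.capacity = needed

    def clear_state(self) -> None:
        # Plays the role of an explicit release of the buffer.
        self.capacity = 0


buf = GrowOnlyBuffer()
history = []
for batch_size, seq_length in [(10, 20), (5, 10), (30, 50), (1, 5)]:
    buf.resize_for(batch_size, seq_length)
    history.append(buf.capacity)

print(history)       # [200, 200, 1500, 1500]
buf.clear_state()
print(buf.capacity)  # 0
```

Smaller requests leave the allocation untouched, so memory usage is set by the largest request seen since the last release. This is why capping the request size on the client bounds the memory used, and why an explicit release is needed to get it back.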