Thanks for replying to my issue. I am working on a GTX 1080 Ti, and with this batch size it uses at most 4 GB of memory. The interesting thing is that if I retry the translation, it works fine. It translates 4-5 times, then gives me this error.
Might it be a Lua garbage collection issue? (I am absolutely new to Lua; I just googled this question.)
Could it be a CUDA 9 issue? The same environment works fine for me on Ubuntu, but I have the same issue on more than one CentOS 7 system.
Hello László, for reference, I can reproduce that behaviour on one of our servers - it appeared recently, and we wonder too whether it is connected to CUDA 9. We will try to narrow it down.
best
Jean
For what it’s worth, I have the same issue. It can occur during preprocessing, training, or translation. It is not reproducible for me (i.e. if I rerun the exact same script after an error, it usually works). It would be nice if this were fixed; I often run full pipelines (preprocess, train, translate) for multiple parameter settings, but some of them occasionally fail because of this.
I have the following situation and would like to ask whether it could be the cause of my issue. I ran multiple trainings on my server (two trainings, one on each GPU), and as far as I can see they have allocated all of my available virtual memory. Now I get this issue every time I start a translation. Furthermore, I tried to run the CUDA samples, and they gave me the same error message. What do you think, could this be my problem? I don’t want to stop my trainings now to test this theory, but as soon as I can, I will write back with the result.
Hi László, sorry about the lack of response on that. On our side, we re-installed our server (CUDA libraries/driver), and the issue went away. Can you share the exact versions of your driver/libraries?
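To collect the requested version information, something like the following shell sketch can help; `report_cmd` is a hypothetical helper (not part of any project here) that runs a command only if it is installed, so the script also works on machines without a CUDA toolkit:

```shell
# Print NVIDIA driver and CUDA toolkit versions for a bug report.
# report_cmd CMD_NAME COMMAND... runs COMMAND if CMD_NAME is on PATH,
# otherwise prints a "not found" placeholder instead of failing.
report_cmd() {
  name="$1"
  shift
  if command -v "$name" >/dev/null 2>&1; then
    "$@"
  else
    echo "$name: not found"
  fi
}

# Driver version as reported by the NVIDIA management tool.
report_cmd nvidia-smi nvidia-smi --query-gpu=driver_version --format=csv,noheader
# CUDA compiler (toolkit) version.
report_cmd nvcc nvcc --version
```

Pasting the output of both commands into the thread should be enough to compare driver/library versions between the working Ubuntu machine and the failing CentOS 7 ones.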