CUDA out of memory when training the model

Hi all,

I've just started with OpenNMT-py and I'm at the point of training the test model, following the instructions here (http://opennmt.net/OpenNMT-py/quickstart.html).

When I run

python train.py -data data/demo -save_model demo-model

training runs on the CPU.

When I run

python3 train.py -data data/demo -save_model demo-model -gpu_ranks 0

the GPU is used, but I get this error:

RuntimeError: CUDA out of memory. Tried to allocate 279.88 MiB (GPU 0; 1.95 GiB total capacity; 736.80 MiB already allocated; 105.88 MiB free; 23.08 MiB cached)

Specs:
Ubuntu 18.04, OpenNMT-py 0.8.2
GPU (lspci output):
NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1] (rev a2) (prog-if 00 [VGA controller])
Subsystem: Hewlett-Packard Company GM107GLM [Quadro M1000M] [103c:810a]
Flags: bus master, fast devsel, latency 0, IRQ 149
Memory at d3000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=256M]
Memory at a0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 3000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting

Can someone advise how to resolve this issue?

Here is the entire log:
[2019-03-26 14:55:47,213 INFO] * src vocab size = 24997
[2019-03-26 14:55:47,213 INFO] * tgt vocab size = 35820
[2019-03-26 14:55:47,213 INFO] Building model…
[2019-03-26 14:55:50,304 INFO] NMTModel(
(encoder): RNNEncoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(24997, 500, padding_idx=1)
)
)
)
(rnn): LSTM(500, 500, num_layers=2, dropout=0.3)
)
(decoder): InputFeedRNNDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(35820, 500, padding_idx=1)
)
)
)
(dropout): Dropout(p=0.3)
(rnn): StackedLSTM(
(dropout): Dropout(p=0.3)
(layers): ModuleList(
(0): LSTMCell(1000, 500)
(1): LSTMCell(500, 500)
)
)
(attn): GlobalAttention(
(linear_in): Linear(in_features=500, out_features=500, bias=False)
(linear_out): Linear(in_features=1000, out_features=500, bias=False)
)
)
(generator): Sequential(
(0): Linear(in_features=500, out_features=35820, bias=True)
(1): Cast()
(2): LogSoftmax()
)
)
[2019-03-26 14:55:50,304 INFO] encoder: 16506500
[2019-03-26 14:55:50,304 INFO] decoder: 41613820
[2019-03-26 14:55:50,304 INFO] * number of parameters: 58120320
[2019-03-26 14:55:50,306 INFO] Starting training on GPU: [0]
[2019-03-26 14:55:50,306 INFO] Start training loop and validate every 10000 steps…
[2019-03-26 14:55:50,378 INFO] Loading dataset from data/demo.train.0.pt, number of examples: 10000
Traceback (most recent call last):
File "train.py", line 109, in <module>
main(opt)
File "train.py", line 39, in main
single_main(opt, 0)
File "/home/sebastjan/projekti/opennmt/OpenNMT-py/onmt/train_single.py", line 116, in main
valid_steps=opt.valid_steps)
File "/home/sebastjan/projekti/opennmt/OpenNMT-py/onmt/trainer.py", line 209, in train
report_stats)
File "/home/sebastjan/projekti/opennmt/OpenNMT-py/onmt/trainer.py", line 329, in _gradient_accumulation
trunc_size=trunc_size)
File "/home/sebastjan/projekti/opennmt/OpenNMT-py/onmt/utils/loss.py", line 159, in __call__
loss.div(float(normalization)).backward()
File "/home/sebastjan/.local/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/sebastjan/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 113.75 MiB (GPU 0; 1.95 GiB total capacity; 625.72 MiB already allocated; 57.00 MiB free; 23.28 MiB cached)

EDIT
Both verification tests under https://pytorch.org/get-started/locally/#linux-verification also pass.
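
For reference, the memory PyTorch sees on this card can be checked with a quick one-liner like the following (just an illustration, not part of the quickstart):

python3 -c "import torch; p = torch.cuda.get_device_properties(0); print(p.name, p.total_memory // 2**20, 'MiB')"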

Thank you.
Sebastjan

Hi,

2GB of GPU memory is usually not enough, even for the quickstart training. If you still want to proceed, you should reduce some options like the batch size and/or the model size.
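
For example, something along these lines may fit in 2GB (the flag values here are only an illustration, not tuned settings):

python3 train.py -data data/demo -save_model demo-model -gpu_ranks 0 -batch_size 16 -rnn_size 256 -word_vec_size 256 -layers 1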

Thank you, we have now moved to a machine with more resources and it works fine 🙂

Best
Sebastjan

I have a 4 GB GPU, but this problem also occurs for me.

Note: I have a 70k-sentence parallel corpus for training.

Thanks in advance.