GPU is not used

dquan · October 16, 2018, 12:26am

Hey there guillaumekln, I’m trying to run a model on a GPU, and I saw from this post that to do so I have to pass the -gpuid option with any ID that’s not 0. I don’t get any error when I run the model with -gpuid set to 1 or 2, but when I check with the nvidia-smi command, my GPU isn’t being used. I’m also using the PyTorch version. Any pointers?

Thanks

guillaumekln · October 16, 2018, 7:40am

Hi,

What is the command line you used?

dquan · October 16, 2018, 3:28pm

This is the command line that I used.

python train.py -data cmn-eng/demo -save_model bidir_2layer_500u -src_word_vec_size 500 -tgt_word_vec_size 500 -encoder_type brnn -decoder_type rnn -rnn_size 500 -enc_layers 2 -dec_layers 2 -rnn_type LSTM -global_attention general -batch_size 64 -optim adam -adam_beta1 .9 -adam_beta2 .999 -dropout .4 -learning_rate .001 -report_every 200 -train_steps 20000 -gpuid 1

I did see from the documentation that -gpuid is deprecated and that I should use -world_size and -gpu_ranks instead. This is my first time using a GPU to train a model, so I’m not entirely sure what these two options do.

dquan · October 16, 2018, 9:04pm

guillaumekln,

I was able to get it to work. Nonetheless, thank you for the fast response.

Best,
Dillon

guillaumekln · October 17, 2018, 6:38am

Can you share what you found and how you fixed it for future users? Thanks.

dquan · October 18, 2018, 9:23pm

Yeah sure. The big thing that I needed to figure out was what world_size and gpu_ranks meant. I rented an AWS p2.xlarge instance, which provides 1 GPU, and wanted to train an NMT model on a dataset of about 10 million entries. According to this post here, to use my GPU I first need to choose which GPU to use by setting the environment variable with export CUDA_VISIBLE_DEVICES. Once that is done, in the command line that trains the model, world_size corresponds to the number of GPUs to use and gpu_ranks refers to the order in which you want them to be used. Since I had 1 GPU, I used -world_size 1 -gpu_ranks 0, because the devices exposed through CUDA_VISIBLE_DEVICES are numbered starting from 0. More can be found at this link:

https://training.acceleware.com/blog/cudavisibledevices-masking-gpus
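As a rough illustration of the mapping described above, here is a minimal sketch that turns a CUDA_VISIBLE_DEVICES value into the matching flags. It is not part of OpenNMT-py: `gpu_flags` is a hypothetical helper, and the space-separated form of `-gpu_ranks` for more than one GPU is an assumption based on the option names mentioned in this thread.

```python
def gpu_flags(visible_devices):
    """Build -world_size / -gpu_ranks arguments (hypothetical helper).

    visible_devices is the string you would export as CUDA_VISIBLE_DEVICES,
    for example "0" or "2,3".
    """
    # Physical device IDs that CUDA is allowed to see.
    devices = [d.strip() for d in visible_devices.split(",") if d.strip()]
    if not devices:
        # No device exposed, so no GPU flags to pass.
        return []
    # Visible devices are renumbered from 0, whatever their physical IDs are.
    ranks = [str(i) for i in range(len(devices))]
    # Assumption: multiple ranks are passed as space-separated values.
    return ["-world_size", str(len(devices)), "-gpu_ranks"] + ranks


print(" ".join(gpu_flags("0")))  # -world_size 1 -gpu_ranks 0
```

For the single-GPU case in this thread, that gives `-world_size 1 -gpu_ranks 0`, which takes the place of `-gpuid 1` in the train.py command.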

vince62s · October 18, 2018, 9:27pm

all in here:

https://github.com/OpenNMT/OpenNMT-py/blob/master/docs/source/FAQ.md

# FAQ

## How do I use Pretrained embeddings (e.g. GloVe)?

Using vocabularies from OpenNMT-py preprocessing outputs, `embeddings_to_torch.py` generates encoder and decoder embeddings initialized with GloVe's values.

The script is a slightly modified version of ylhsieh's one.

Usage:

```
embeddings_to_torch.py [-h] -emb_file EMB_FILE -output_file OUTPUT_FILE -dict_file DICT_FILE [-verbose]

emb_file: GloVe like embedding file i.e. CSV [word] [dim1] ... [dim_d]

output_file: a filename to save the output as PyTorch serialized tensors

dict_file: dict output from OpenNMT-py preprocessing
```

This file has been truncated.