I am using a Titan V for training, and I think there is a problem with the command I am using.
If I use the CPU with the command below:
“python train.py -src_word_vec_size 200 -tgt_word_vec_size 200 -data data/model -save_model sum_eng-model -batch_size 64 -valid_steps 5000 -train_steps 100000 -report_every 50”
training starts on the CPU. It is slow, but it works fine.
But when I try the GPU with the command below:
“python train.py -src_word_vec_size 200 -tgt_word_vec_size 200 -data data/model -save_model sum_eng-model -save_checkpoint_steps 100 -world_size 2 -gpu_ranks 1 -batch_size 32 -valid_steps 1000 -train_steps 100000 -report_every 1”
Here I even reduced the batch size, the checkpoint interval and the reporting interval, but nothing happens; even the model description is not printed. (I have a GTX 1070 as GPU 0, so I am using -gpu_ranks 1 for the Titan.)
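A minimal sketch of a single-GPU launch for the GPU command above, assuming the Titan V is the second GPU as numbered by nvidia-smi. As far as I understand OpenNMT-py, -world_size is the total number of training processes and -gpu_ranks lists the ranks started on this machine, so -world_size 2 with only one rank may leave that process waiting for a second one that never starts, which would match the silent hang. The sketch instead hides the GTX 1070 with CUDA_VISIBLE_DEVICES and runs a single process as rank 0. CUDA_DEVICE_ORDER=PCI_BUS_ID is included because CUDA's default ordering (fastest device first) can differ from nvidia-smi's numbering, in which case index 1 could point at the GTX 1070. The function name train_on_titan is hypothetical; all other options are copied from the command above.

```shell
# Hypothetical helper name: train_on_titan.
# Assumes the Titan V is physical GPU 1 in nvidia-smi's (PCI bus) numbering.
train_on_titan() {
  # Number GPUs in PCI bus order so index 1 matches nvidia-smi, then expose
  # only that GPU; inside the process it shows up as device 0.
  # One training process on one GPU: world_size 1, gpu_ranks 0.
  CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 python train.py \
    -src_word_vec_size 200 -tgt_word_vec_size 200 \
    -data data/model -save_model sum_eng-model \
    -save_checkpoint_steps 100 \
    -world_size 1 -gpu_ranks 0 \
    -batch_size 32 -valid_steps 1000 -train_steps 100000 \
    -report_every 1 "$@"   # any extra options are passed through unchanged
}
```

Run it as train_on_titan from the OpenNMT-py directory; additional options can be appended, e.g. train_on_titan -seed 1.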
Am I using the right command?
Many thanks in advance.
Hi Yasmin, thanks for your reply. Actually I have 2 GPUs and want to start with the 2nd one, so I wrote -gpu_ranks 1.
Do you mean I should add CUDA_VISIBLE_DEVICES=1, like this:
I tried it, but it is the same: nothing happens, and the model description still does not show, even after waiting for 30 minutes.
I checked with the “nvidia-smi” command; it shows both GPUs, and CUDA 9 is also available.
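nvidia-smi listing both GPUs only shows that the driver sees them; it does not confirm that the PyTorch build OpenNMT-py runs on can use them. A minimal check, assuming PyTorch is installed in the same environment as OpenNMT-py: it prints the driver's view and then what PyTorch sees after CUDA_VISIBLE_DEVICES filtering. The helper name check_gpu_visibility is hypothetical, and its optional argument is the physical GPU index to expose (default 1, the Titan V in this setup).

```shell
# Hypothetical helper name: check_gpu_visibility [physical_gpu_index]
check_gpu_visibility() {
  # Driver view: every GPU the NVIDIA driver sees, numbered in PCI bus order.
  nvidia-smi -L
  # Framework view: what PyTorch sees once only the chosen GPU is exposed.
  CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES="${1:-1}" python -c '
import torch
print("cuda available:", torch.cuda.is_available())
print("torch built with CUDA:", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    print("visible device", i, "=", torch.cuda.get_device_name(i))
'
}
```

With the Titan V exposed, this should list exactly one visible device. If "cuda available" is False, the PyTorch installation rather than the train.py options is the more likely culprit.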
Any other solution?
Hi Yasmin,
Is it possible to train a model with an Intel GPU? And if so, what changes should one make to the config file of the REST server API?
Kind regards.