Hello there, when I tried to train the model following the instructions on GitHub, on a GPU machine, with the following command:
python train.py -data data/demo.train.pt -save_model demo-model
I got the following error:
Namespace(attention_type='general', batch_size=64, brnn=False, brnn_merge='concat', context_gate=None, copy_attn=False, coverage_attn=False, curriculum=False, data='data/demo.train.pt', decay_method='', decoder_layer='rnn', dropout=0.3, encoder_layer='rnn', encoder_type='text', epochs=13, experiment_name='', extra_shuffle=False, feat_merge='concat', feat_vec_exponent=0.7, feat_vec_size=20, gpus=[], input_feed=1, lambda_coverage=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, log_server='', max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, position_encoding=False, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, rnn_type='LSTM', save_model='demo-model', seed=-1, share_decoder_embeddings=False, start_checkpoint_at=0, start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', truncated_decoder=0, warmup_steps=4000, word_vec_size=500)
WARNING: You have a CUDA device, should run with -gpus 0
Loading data from 'data/demo.train.pt'
- vocabulary size. source = 24999; target = 35820
- number of training sentences. 10000
- maximum batch size. 64
Building model…
Intializing params - number of parameters: 58121320
('generator.0.weight', 17910000)
('generator.0.bias', 35820)
('encoder: ', 16507500)
('decoder: ', 23668000)
Traceback (most recent call last):
  File "train.py", line 465, in <module>
    main()
  File "train.py", line 461, in main
    trainModel(model, trainData, validData, dataset, optim)
  File "train.py", line 220, in trainModel
    os.mkdir(model_dirname)
OSError: [Errno 2] No such file or directory: ''
I googled and experimented for a long time, but never figured it out. Could you please help me solve this problem? Thank you very much!
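In case it helps diagnose the issue: the traceback suggests train.py derives a directory from the -save_model argument and calls os.mkdir() on it. Since 'demo-model' is a bare filename, os.path.dirname() returns the empty string, and os.mkdir('') raises exactly this OSError. A minimal sketch of what I think is happening and a possible guard (the function name ensure_model_dir is my own, not from the repo):

```python
import os

def ensure_model_dir(save_model):
    """Create the directory part of a save path, if there is one.

    For a bare name like 'demo-model', os.path.dirname() returns '',
    and calling os.mkdir('') raises OSError [Errno 2] -- which looks
    like what train.py hits at line 220.
    """
    model_dirname = os.path.dirname(save_model)
    if model_dirname and not os.path.isdir(model_dirname):
        os.makedirs(model_dirname)
    return model_dirname

# A bare filename has an empty directory component:
print(repr(os.path.dirname("demo-model")))      # → ''
# A path with a directory component works as expected:
print(repr(os.path.dirname("models/demo-model")))  # → 'models'
```

If that is indeed the cause, a workaround without touching the code might be to pass a path that includes a directory, e.g. `-save_model models/demo-model` (creating `models/` first if the script does not) — though I have not verified this against the current repo.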