OpenNMT Forum

Runtime error with PyTorch 1.3 under Windows 10

I think the fix is indicated in the error message, and I am currently testing it.
Here is the screen dump:

(ling) D:\AnacondaWork\OpenNMT-py>python train.py -data data/demo -save_model demo-model -world_size 1 -gpu_ranks 0
[2020-01-02 12:15:28,648 INFO] * src vocab size = 24997
[2020-01-02 12:15:28,663 INFO] * tgt vocab size = 35820
[2020-01-02 12:15:28,663 INFO] Building model…
[2020-01-02 12:15:33,147 INFO] NMTModel(
(encoder): RNNEncoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(24997, 500, padding_idx=1)
)
)
)
(rnn): LSTM(500, 500, num_layers=2, dropout=0.3)
)
(decoder): InputFeedRNNDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(35820, 500, padding_idx=1)
)
)
)
(dropout): Dropout(p=0.3, inplace=False)
(rnn): StackedLSTM(
(dropout): Dropout(p=0.3, inplace=False)
(layers): ModuleList(
(0): LSTMCell(1000, 500)
(1): LSTMCell(500, 500)
)
)
(attn): GlobalAttention(
(linear_in): Linear(in_features=500, out_features=500, bias=False)
(linear_out): Linear(in_features=1000, out_features=500, bias=False)
)
)
(generator): Sequential(
(0): Linear(in_features=500, out_features=35820, bias=True)
(1): Cast()
(2): LogSoftmax()
)
)
[2020-01-02 12:15:33,147 INFO] encoder: 16506500
[2020-01-02 12:15:33,147 INFO] decoder: 41613820
[2020-01-02 12:15:33,147 INFO] * number of parameters: 58120320
[2020-01-02 12:15:33,147 INFO] Starting training on GPU: [0]
[2020-01-02 12:15:33,147 INFO] Start training loop and validate every 10000 steps…
[2020-01-02 12:15:33,147 INFO] Loading dataset from data\demo.train.0.pt
[2020-01-02 12:15:33,288 INFO] number of examples: 10000
Traceback (most recent call last):
File "train.py", line 200, in <module>
main(opt)
File "train.py", line 86, in main
single_main(opt, 0)
File "D:\AnacondaWork\OpenNMT-py\onmt\train_single.py", line 139, in main
valid_steps=opt.valid_steps)
File "D:\AnacondaWork\OpenNMT-py\onmt\trainer.py", line 243, in train
report_stats)
File "D:\AnacondaWork\OpenNMT-py\onmt\trainer.py", line 357, in _gradient_accumulation
outputs, attns = self.model(src, tgt, src_lengths, bptt=bptt)
File "D:\Anaconda3\envs\ling\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "D:\AnacondaWork\OpenNMT-py\onmt\models\model.py", line 46, in forward
memory_lengths=lengths)
File "D:\Anaconda3\envs\ling\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "D:\AnacondaWork\OpenNMT-py\onmt\decoders\decoder.py", line 213, in forward
tgt, memory_bank, memory_lengths=memory_lengths)
File "D:\AnacondaWork\OpenNMT-py\onmt\decoders\decoder.py", line 395, in _run_forward_pass
memory_lengths=memory_lengths)
File "D:\Anaconda3\envs\ling\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "D:\AnacondaWork\OpenNMT-py\onmt\modules\global_attention.py", line 183, in forward
align.masked_fill_(1 - mask, -float('inf'))
File "D:\Anaconda3\envs\ling\lib\site-packages\torch\tensor.py", line 362, in __rsub__
return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.

(ling) D:\AnacondaWork\OpenNMT-py>
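For reference, the change the error message points at seems to be replacing the bool-tensor subtraction `1 - mask` with logical negation. A minimal standalone sketch of the idea, using toy tensors rather than the actual OpenNMT-py code:

```python
import torch

# Toy attention scores and a boolean padding mask (True = valid position).
align = torch.zeros(1, 3)
mask = torch.tensor([[True, False, True]])

# Pre-1.2 idiom, which now fails with the RuntimeError above:
#   align.masked_fill_(1 - mask, -float('inf'))
# Inverting the mask with ~ (or mask.logical_not()) works instead:
align.masked_fill_(~mask, -float('inf'))
print(align)  # only the padded (False) position becomes -inf
```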

Will the development team apply the fix, or should I submit a patch?
Regards,
J.M.

Your version is probably not up to date.
This change was already made a while back for PyTorch 1.2 compatibility (PR #1527).

Thanks.
I downloaded the latest version and the bug is indeed fixed.
I noticed that you changed the "façade" scripts so that each is now simply a call to the main of the "onmt" module with the same name. Will the structure of the package change in the future, or is it quite stable?

The main scripts were indeed moved to allow that (see https://github.com/OpenNMT/OpenNMT-py/pull/1581), but dummy scripts were kept at the root of the repo so that existing scripts calling them don't break.

Apart from this change, which was made to allow proper publishing on pip, the structure should remain quite stable for now.
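The delegation pattern described above can be sketched generically. All names here are made up for illustration (the real entry points moved in PR #1581): a thin top-level script simply forwards to the relocated implementation, so existing callers keep working.

```python
# Stand-in for the relocated implementation (something like an
# onmt.bin.train.main; the exact module path is assumed, not verified here).
def _relocated_main(argv=None):
    return "training started"

# Old public entry point, kept as a pure delegation so callers of the
# original script path are not broken by the move.
def main(argv=None):
    return _relocated_main(argv)

if __name__ == "__main__":
    print(main())  # -> training started
```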