OpenNMT Speech2Text example

Hi all,
I am trying to train the model from the Speech2Text example (http://opennmt.net/OpenNMT-py/speech2text.html#quick-start) in OpenNMT-py. However, when I run it, I get the following error message:

Traceback (most recent call last):
  File "train.py", line 109, in <module>
    main(opt)
  File "train.py", line 39, in main
    single_main(opt, 0)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/train_single.py", line 116, in main
    valid_steps=opt.valid_steps)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/trainer.py", line 209, in train
    report_stats)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/trainer.py", line 318, in _gradient_accumulation
    outputs, attns = self.model(src, tgt, src_lengths, bptt=bptt)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/models/model.py", line 42, in forward
    enc_state, memory_bank, lengths = self.encoder(src, lengths)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/encoders/audio_encoder.py", line 113, in forward
    memory_bank, tmp = rnn(packed_emb)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 182, in forward
    self.num_layers, self.dropout, self.training, self.bidirectional)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #4 'mat1'

What do I need to change there?

Best regards,
Andi

Please provide more info on your setup:
PyTorch version, torchtext version, GPU, etc.
Then post the command line you used.
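
One way to gather the details requested above in a single step is a small script like the following. This is only a sketch: the helper name collect_env_info is made up for illustration, and the script reports a package as "not installed" if it cannot be imported. It relies on torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name(), which are part of PyTorch.

```python
import importlib
import platform


def collect_env_info(packages=("torch", "torchtext")):
    """Return a dict describing the Python / PyTorch setup (illustrative helper)."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            info[name] = "not installed"
            continue
        info[name] = getattr(module, "__version__", "unknown")

    # CUDA details can only be queried when torch itself is importable.
    try:
        import torch
    except ImportError:
        return info
    info["cuda_available"] = torch.cuda.is_available()
    info["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Pasting the printed lines together with the full training command usually gives enough context to reproduce a problem.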

I am using torch 1.0.1.post2 with CUDA 9.2, and I am following the example from http://opennmt.net/OpenNMT-py/speech2text.html. The GPU is a Tesla K80.

The command is
python train.py -model_type audio -enc_rnn_size 512 -dec_rnn_size 512 -audio_enc_pooling 1,1,2,2 -dropout 0 -enc_layers 4 -dec_layers 1 -rnn_type LSTM -data data/speech/demo -save_model demo-model -global_attention mlp -gpu_ranks 0 -batch_size 8 -optim adam -max_grad_norm 100 -learning_rate 0.0003 -learning_rate_decay 0.8 -train_steps 100000

The training output is
[2019-02-20 07:56:05,139 INFO] * tgt vocab size = 31
[2019-02-20 07:56:05,140 INFO] Building model...
[2019-02-20 07:56:07,910 INFO] NMTModel(
  (encoder): AudioEncoder(
    (W): Linear(in_features=512, out_features=512, bias=False)
    (batchnorm_0): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_0): LSTM(161, 512)
    (pool_0): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
    (rnn_1): LSTM(512, 512)
    (pool_1): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_2): LSTM(512, 512)
    (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (rnn_3): LSTM(512, 512)
    (pool_3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (batchnorm_3): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(31, 500, padding_idx=1)
        )
      )
    )
    (dropout): Dropout(p=0.0)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.0)
      (layers): ModuleList(
        (0): LSTMCell(1012, 512)
      )
    )
    (attn): GlobalAttention(
      (linear_context): Linear(in_features=512, out_features=512, bias=False)
      (linear_query): Linear(in_features=512, out_features=512, bias=True)
      (v): Linear(in_features=512, out_features=1, bias=False)
      (linear_out): Linear(in_features=1024, out_features=512, bias=True)
    )
  )
[2019-02-20 07:56:07,911 INFO] encoder: 7952384
[2019-02-20 07:56:07,911 INFO] decoder: 4206763
[2019-02-20 07:56:07,911 INFO] * number of parameters: 12159147
[2019-02-20 07:56:07,912 INFO] Starting training on GPU: [0]
[2019-02-20 07:56:07,912 INFO] Start training loop and validate every 10000 steps...
[2019-02-20 07:56:07,938 INFO] Loading dataset from data/speech/demo.train.0.pt, number of examples: 300
Traceback (most recent call last):
  File "train.py", line 109, in <module>
    main(opt)
  File "train.py", line 39, in main
    single_main(opt, 0)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/train_single.py", line 116, in main
    valid_steps=opt.valid_steps)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/trainer.py", line 209, in train
    report_stats)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/trainer.py", line 318, in _gradient_accumulation
    outputs, attns = self.model(src, tgt, src_lengths, bptt=bptt)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/models/model.py", line 42, in forward
    enc_state, memory_bank, lengths = self.encoder(src, lengths)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/userdata/andi/GitClone/OpenNMT-py/onmt/encoders/audio_encoder.py", line 113, in forward
    memory_bank, tmp = rnn(packed_emb)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 182, in forward
    self.num_layers, self.dropout, self.training, self.bidirectional)
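
The RuntimeError reported in the first post ("Expected object of backend CUDA but got backend CPU") generally means that one operand of an operation lives on the GPU while another is still on the CPU. A possible way to narrow this down, sketched below, is to group the model's parameters and the batch tensors by device type just before the forward pass. The helper name group_by_device is hypothetical and not part of OpenNMT-py; it only assumes that each object has a .device.type attribute, as PyTorch tensors do. This does not identify the actual cause in this thread, it only shows which side is on the CPU.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Group names by the device type ('cpu', 'cuda', ...) of each tensor.

    named_tensors is any iterable of (name, tensor) pairs, for example the
    result of model.named_parameters() in PyTorch.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[tensor.device.type].append(name)
    return dict(groups)


# Hypothetical usage, placed just before the self.model(...) call seen in the
# traceback (the variable names src and tgt are taken from that line):
#   print(group_by_device(self.model.named_parameters()))
#   print(group_by_device([("src", src), ("tgt", tgt)]))
```

If the parameters show up under "cuda" while src shows up under "cpu" (or the reverse), the mismatch is between the model and the batch, and the next thing to check is where the batch is supposed to be moved to the GPU.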