I am training a speech-to-text model with OpenNMT-py.
I extracted MFCC features during preprocessing, but I am unable to start training. This is the command I ran:
python3 train.py -model_type audio -enc_rnn_size 1024 -dec_rnn_size 1024 -audio_enc_pooling 1,1,1,2,2,2 -dropout 0.1 -enc_layers 6 -dec_layers 4 -rnn_type LSTM -data data/speech/demofiles-vctk-mfcc/demo -save_model models/exp-vctk-mfcc/demo-model-vctk-mfcc -global_attention mlp -batch_size 6 -optim sgd -max_grad_norm 100 -decay_method noam -train_steps 10000 -encoder_type brnn -decoder_type rnn -bridge -window_size 0.025 -image_channel_size 1
After running this, the model is built and the dataset is loaded, but training crashes on the first optimizer step:
[2019-10-29 14:42:49,730 INFO] * tgt vocab size = 9386
[2019-10-29 14:42:49,732 INFO] Building model…
/home/amit/.local/lib/python3.5/site-packages/torch/nn/modules/rnn.py:51: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1
"num_layers={}".format(dropout, num_layers))
[2019-10-29 14:42:50,667 INFO] NMTModel(
(encoder): AudioEncoder(
(dropout): Dropout(p=0.1, inplace=False)
(W): Linear(in_features=1024, out_features=1024, bias=False)
(batchnorm_0): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rnn_0): LSTM(26, 512, dropout=0.1, bidirectional=True)
(pool_0): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
(rnn_1): LSTM(1024, 512, dropout=0.1, bidirectional=True)
(pool_1): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
(batchnorm_1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rnn_2): LSTM(1024, 512, dropout=0.1, bidirectional=True)
(pool_2): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
(batchnorm_2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rnn_3): LSTM(1024, 512, dropout=0.1, bidirectional=True)
(pool_3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(batchnorm_3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rnn_4): LSTM(1024, 512, dropout=0.1, bidirectional=True)
(pool_4): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(batchnorm_4): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(rnn_5): LSTM(1024, 512, dropout=0.1, bidirectional=True)
(pool_5): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(batchnorm_5): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(decoder): InputFeedRNNDecoder(
(embeddings): Embeddings(
(make_embedding): Sequential(
(emb_luts): Elementwise(
(0): Embedding(9386, 500, padding_idx=1)
)
)
)
(dropout): Dropout(p=0.1, inplace=False)
(rnn): StackedLSTM(
(dropout): Dropout(p=0.1, inplace=False)
(layers): ModuleList(
(0): LSTMCell(1524, 1024)
(1): LSTMCell(1024, 1024)
(2): LSTMCell(1024, 1024)
(3): LSTMCell(1024, 1024)
)
)
(attn): GlobalAttention(
(linear_context): Linear(in_features=1024, out_features=1024, bias=False)
(linear_query): Linear(in_features=1024, out_features=1024, bias=True)
(v): Linear(in_features=1024, out_features=1, bias=False)
(linear_out): Linear(in_features=2048, out_features=1024, bias=True)
)
)
(generator): Sequential(
(0): Linear(in_features=1024, out_features=9386, bias=True)
(1): Cast()
(2): LogSoftmax()
)
)
[2019-10-29 14:42:50,668 INFO] encoder: 34770944
[2019-10-29 14:42:50,668 INFO] decoder: 54146226
[2019-10-29 14:42:50,668 INFO] * number of parameters: 88917170
[2019-10-29 14:42:50,669 INFO] Starting training on CPU, could be very slow
[2019-10-29 14:42:50,669 INFO] Start training loop and validate every 10000 steps…
[2019-10-29 14:42:50,669 INFO] Loading dataset from data/speech/demofiles-vctk-mfcc/demo.train.0.pt
[2019-10-29 14:42:50,923 INFO] number of examples: 5000
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    main(opt)
  File "train.py", line 88, in main
    single_main(opt, -1)
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/train_single.py", line 143, in main
    valid_steps=opt.valid_steps)
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/trainer.py", line 243, in train
    report_stats)
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/trainer.py", line 392, in _gradient_accumulation
    self.optim.step()
  File "/home/amit/Desktop/amit/OpenNMT-py/onmt/utils/optimizers.py", line 360, in step
    self._optimizer.step()
  File "/home/amit/.local/lib/python3.5/site-packages/torch/optim/sgd.py", line 106, in step
    p.data.add_(-group['lr'], d_p)
RuntimeError: value cannot be converted to type float without overflow: (-2.42042e-22,3.95285e-06)
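
A possible reading of the error, offered as a guess rather than a confirmed diagnosis: the pair `(-2.42042e-22,3.95285e-06)` looks like a complex number (real part, imaginary part), which suggests the learning rate itself became complex. The sketch below assumes that `-decay_method noam` scales the rate by `model_size ** -0.5`, that `model_size` is taken from `-rnn_size`, and that `-rnn_size` stays at a default of -1 when only `-enc_rnn_size` and `-dec_rnn_size` are given. The function `noam_lr` and the `warmup_steps=4000` value are illustrative assumptions, not code copied from OpenNMT-py.

```python
def noam_lr(step, warmup_steps, model_size, base_lr=1.0):
    """Noam-style schedule (hypothetical helper, written for illustration)."""
    # In Python 3, a negative base raised to a fractional power is complex.
    scale = model_size ** -0.5
    return base_lr * scale * min(step ** -0.5, step * warmup_steps ** -1.5)


# Assumed situation: model_size left at -1 because -rnn_size was not passed.
lr_bad = noam_lr(step=1, warmup_steps=4000, model_size=-1)

# Same schedule with an explicit positive size, e.g. 1024.
lr_ok = noam_lr(step=1, warmup_steps=4000, model_size=1024)

print(type(lr_bad).__name__, -lr_bad)  # negated, as in add_(-group['lr'], d_p)
print(type(lr_ok).__name__, lr_ok)
```

If these assumptions hold, the negated complex rate has an imaginary part of about 3.95e-06, which matches the value in the error message. Passing a single positive `-rnn_size 1024` instead of the separate encoder and decoder sizes, or choosing a different `-decay_method`, would be worth trying.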