Hi everyone,
I’ve been trying to train an English-Spanish model on the TED IWSLT 2016 dataset. My training and development sets consist of 218845 and 873 samples, respectively. I set it to train for ~30 epochs with batch size 64 (102583 steps) using the following command:
python train.py -data data/$dataset -save_model models/$model -coverage_attn -word_vec_size 512 -layers 3 -rnn_size 512 -rnn_type LSTM -encoder_type brnn -batch_size 64 -dropout 0.25 -input_feed 1 -global_attention mlp -optim adam -learning_rate 0.0001 -gpu_ranks 0 -train_steps 102583
I’ve tokenized the dataset using BPE with 20k merge operations and a joint vocabulary. OpenNMT reports vocabulary sizes of 17138 for the source (English) and 21763 for the target (Spanish).
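For reference, my data preparation was roughly as follows (assuming the standard subword-nmt and preprocess.py workflow; the file names here are just placeholders):

# learn joint BPE codes on the concatenated training sides (20k merges)
cat train.en train.es | subword-nmt learn-bpe -s 20000 > bpe.codes
# apply the same codes to both languages, for train and dev
subword-nmt apply-bpe -c bpe.codes < train.en > train.bpe.en
subword-nmt apply-bpe -c bpe.codes < train.es > train.bpe.es
subword-nmt apply-bpe -c bpe.codes < dev.en > dev.bpe.en
subword-nmt apply-bpe -c bpe.codes < dev.es > dev.bpe.es
# build the OpenNMT data files referenced as data/$dataset in the training command above
python preprocess.py -train_src train.bpe.en -train_tgt train.bpe.es -valid_src dev.bpe.en -valid_tgt dev.bpe.es -save_data data/$dataset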
During training, validation accuracy increases from 81.297 at the first epoch to 83.38 at the last, and validation perplexity decreases from 1.81916 to 1.7422. Here’s the full training log: https://pastebin.com/cUfkgsLh
At inference time, however, I can’t get any meaningful results: the model only predicts a few characters, as in the output below (the translate command I’m using is included after it):
SENT 1: ['thank', 'you', 'so', 'much', ',', 'chris', '.']
PRED 1: l a
PRED SCORE: -0.5345
SENT 2: ['i', 'have', 'been', 'blown', 'away', 'by', 'this', 'conference', '.']
PRED 2: l a
PRED SCORE: -0.3953
SENT 3: ['i', 'flew', 'on', 'air', 'force', 'two', 'for', 'eight', 'years', '.']
PRED 3: l a
PRED SCORE: -0.1203
SENT 4: ['now', 'i', 'have', 'to', 'take', 'off', 'my', 'shoes', 'or', 'boots', 'to', 'get', 'on', 'an', 'airplane', '!']
PRED 4: @ @
PRED SCORE: -0.2087
SENT 5: ['i', "'ll", 'tell', 'you', 'one', 'quick', 'story', 'to', 'illustrate', 'what', 'that', "'s", 'been', 'like', 'for', 'me', '.']
PRED 5: l a
PRED SCORE: -0.2856
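For completeness, the inference step is the standard translate.py invocation, roughly like this (the model and file names are placeholders; the test source is BPE-encoded with the same codes as the training data):

python translate.py -model models/$model_step_102583.pt -src data/test.bpe.en -output pred.bpe.es -gpu 0 -verbose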
I’ve previously obtained OK-ish models with OpenNMT, and I’ve also had it working fine with the pretrained models available on the website. However, I can’t get it to work with the dataset I have right now. I’d really appreciate your help. Thanks.